AlexTISYoung / snipar

Imputation of parental genotypes, inference of sibling IBD segments, family based GWAS, and polygenic score analyses.
MIT License

Runtime estimates for imputation step? #7

Closed ccrobertson closed 4 years ago

ccrobertson commented 4 years ago

Hi there,

I'm running the impute_runner.py step on a toy data set with just 4 sibpairs (no parents) genotyped on an array with ~450k variants.

Attached is my log file. The "create pedigree" and "prepare_data" functions seem to be executed with no problems, but the impute() function runs for a very long time (eventually timed out after 12 hours). There are a few pandas warnings (e.g. "SettingWithCopyWarning"), but it's not clear if these are actually problematic.

I'm wondering if you have an idea of how long I should expect this to take? And if you think the pandas warnings are a problem, that would be great to know as well.

Thanks for the help!
Cassie

Attachment: run_snipar_t1dgc_test.log

MoeenNehzati commented 4 years ago

Hi, sorry for the delay, I missed the issue. Although 450,000 SNPs is a lot, it's just four pairs, so it shouldn't take an hour, let alone more than 12 hours.

The first odd thing is that the package thinks it is running the imputation for chromosomes 1 to 26. Could you show me the command you used? If you specified chromosomes with wildcards, do any of the data files match the wildcard with the numbers 23, 24, 25, or 26 in them?

I don't think the warnings are relevant to the issue you are seeing; you see them because the code does not follow one of the pandas best practices.

Another thing worth trying is running the imputation with the --start 100 --end 200 options. This restricts the imputation to 100 SNPs. That imputation should finish in seconds, so it might tell us what's going on.
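One way to answer the wildcard question is to list which chromosome numbers a file pattern actually picks up. The following is a minimal sketch and not part of snipar: the function names, the `~` placeholder character, and the example pattern `chr_~.bed` are assumptions made for illustration, so substitute whatever pattern was passed on the command line.

```python
import re
from glob import escape, glob


def chromosomes_matching(pattern, wildcard="~"):
    """Map each chromosome number to the files the pattern matches.

    `wildcard` is an assumed placeholder character standing in for the
    chromosome number; adjust it to match the actual command.
    """
    if wildcard not in pattern:
        raise ValueError(f"pattern contains no {wildcard!r} placeholder")
    prefix, _, suffix = pattern.partition(wildcard)
    # Only keep files where the placeholder is filled by digits alone.
    number = re.compile(re.escape(prefix) + r"(\d+)" + re.escape(suffix))
    found = {}
    for path in sorted(glob(escape(prefix) + "*" + escape(suffix))):
        match = number.fullmatch(path)
        if match:
            found.setdefault(int(match.group(1)), []).append(path)
    return found


def unexpected_chromosomes(found, max_autosome=22):
    """Return the matched chromosome numbers above the autosomal range."""
    return sorted(c for c in found if c > max_autosome)
```

Calling `unexpected_chromosomes(chromosomes_matching("chr_~.bed"))` in the data directory returns the matched numbers above 22; an empty list would suggest that the 1 to 26 range is not coming from the files themselves.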

MoeenNehzati commented 3 years ago

Merge IBD branch with master