frankvogt / vcf2gwas

Python API for comprehensive GWAS analysis using GEMMA
GNU General Public License v3.0

Duplicate ID 's00007:1500001' generated by --set-missing-var-ids. #25

Closed: liwang0307 closed this issue 1 year ago.

liwang0307 commented 1 year ago

Hi,

Thank you so much for developing this great tool! It makes our research easier!

I recently tried to run it, but it failed with this error message: Error: Duplicate ID 's00007:1500001' generated by --set-missing-var-ids. It stopped while converting the VCF file to a PLINK bed file: subprocess.CalledProcessError: Command '['plink', '--vcf', '_vcf2gwas_temp_20230408_143339/mod_sub_Flupicolide_sensitivity.part1_snaps_all_v3.vcf.gz', '--make-bed', '--out', '_vcf2gwas_temp_20230408_143339/mod_sub_Flupicolide_sensitivity.part1_snaps_all_v3', '--mind', '1', '--set-missing-var-ids', '@:#', '--allow-extra-chr', '--memory', '64000', '--threads', '16']' returned non-zero exit status 5.
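With the template '@:#', every variant whose ID column is '.' gets the ID CHROM:POS, so two ID-less records at the same chromosome and position (for example two ALT alleles at s00007:1500001 written on separate lines) receive the same ID. The following is a minimal diagnostic sketch, not part of vcf2gwas, that lists the CHROM:POS IDs such a template would generate more than once. It assumes a tab-separated VCF (plain or gzip/bgzip-compressed) and only counts collisions among generated IDs, not clashes with IDs already present in the file.

```python
import gzip
import sys
from collections import defaultdict


def missing_id_duplicates(lines):
    """Return CHROM:POS IDs that an '@:#' template would assign more than once.

    Only records whose ID column is '.' (missing) are counted, because
    --set-missing-var-ids only generates IDs for variants without one.
    """
    counts = defaultdict(int)
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information (##) and header (#CHROM) lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, var_id = fields[0], fields[1], fields[2]
        if var_id == ".":
            counts[f"{chrom}:{pos}"] += 1
    return sorted(vid for vid, n in counts.items() if n > 1)


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF file as text."""
    opener = gzip.open if path.endswith(".gz") else open
    return opener(path, "rt")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_dup_ids.py input.vcf.gz   (script name is hypothetical)
    with open_vcf(sys.argv[1]) as handle:
        for dup in missing_id_duplicates(handle):
            print(dup)
```

Any line it prints corresponds to an ID that plink would report as a duplicate under '@:#'.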

Would you happen to have any recommendations on how to fix this bug? I appreciate your support!

Thanks,

Li

frankvogt commented 1 year ago

The issue should now be fixed in vcf2gwas v0.8.9. Internally, I changed the pipeline to use plink2 instead of plink1 to create the bed files, since plink2 handles duplicate IDs generated for variants with missing IDs better.
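For anyone who cannot upgrade, one possible workaround (an assumption here, not necessarily what vcf2gwas v0.8.9 does internally) is an ID template that also includes the alleles; as far as I know, plink2 templates use '$r' for REF and '$a' for ALT, e.g. '@:#:$r:$a'. The sketch below is a hypothetical re-implementation of just those four template characters, showing why such IDs stay distinct for two variants at the same position while '@:#' does not.

```python
import re


def expand_template(template, chrom, pos, ref, alt):
    """Expand a plink-style variant ID template (hypothetical re-implementation).

    Supports only '@' (chromosome), '#' (position), '$r' (REF) and '$a' (ALT),
    substituted in a single pass so replaced text is never re-scanned.
    """
    mapping = {"@": chrom, "#": str(pos), "$r": ref, "$a": alt}
    return re.sub(r"\$r|\$a|@|#", lambda m: mapping[m.group(0)], template)


if __name__ == "__main__":
    # Two ID-less records at the same position, differing only in ALT.
    records = [("s00007", 1500001, "A", "G"), ("s00007", 1500001, "A", "T")]
    for tpl in ("@:#", "@:#:$r:$a"):
        ids = [expand_template(tpl, *rec) for rec in records]
        status = "unique" if len(set(ids)) == len(ids) else "duplicate"
        print(tpl, ids, status)
```

Note that allele-aware IDs can still collide if the VCF contains truly identical records, so removing exact duplicates beforehand may also be necessary.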