Santy-8128 / Minimac3

Minimac3 is a low-memory, computationally efficient implementation of genotype imputation algorithms, designed to handle very large reference panels with no loss of accuracy.

VCF header and sample order issues #3

Closed by dat4git 7 years ago

dat4git commented 7 years ago

There appear to be at least two problems with the VCF output generated by Minimac3-omp:

  1. [W::vcf_parse] FILTER 'GENOTYPED' is not defined in the header
  2. Different order of sample names in the VCF (incompatible with concat tools)

Any help with these issues would be much appreciated.

Thanks!
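
For the second problem, one way to confirm a mismatch before concatenating is to compare the sample columns of the `#CHROM` header lines of two files. This is a minimal C++ sketch, assuming tab-separated header lines with the nine fixed columns of a genotype VCF. `sampleNames` and `sameSampleOrder` are hypothetical helper names, not part of Minimac3.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split a "#CHROM ..." header line on tabs and return the sample names,
// i.e. every column after the nine fixed ones (#CHROM through FORMAT).
std::vector<std::string> sampleNames(const std::string& chromLine) {
    std::vector<std::string> fields;
    std::istringstream in(chromLine);
    std::string field;
    while (std::getline(in, field, '\t')) {
        fields.push_back(field);
    }
    const std::size_t fixedColumns = 9;  // assumption: FORMAT column is present
    if (fields.size() <= fixedColumns) {
        return {};
    }
    return std::vector<std::string>(fields.begin() + fixedColumns, fields.end());
}

// True only when both headers list the same samples in the same order.
bool sameSampleOrder(const std::string& headerA, const std::string& headerB) {
    return sampleNames(headerA) == sampleNames(headerB);
}
```

If `sameSampleOrder` returns false for two chunks, the sample columns need to be reordered before the files are concatenated.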

Santy-8128 commented 7 years ago

Hi, yes, we have already been notified of these issues. We are working on fixes for them in minimac4, which we will release within 2 months. We don't want to make any more releases of minimac3 and want to make all changes in minimac4 from now on. Thanks.

Santy-8128 commented 7 years ago

Please email me at sayantan@umich.edu for alternative solutions in the meantime.

freeseek commented 7 years ago

It would be enough to add the following at line 428 of file Imputation.cpp: ifprintf(vcfdosepartial,"##FILTER=<ID=GENOTYPED>");

Or alternatively using the following patch file:

@@ -425,6 +425,7 @@
         ifprintf(vcfdosepartial,"##filedate=%d.%d.%d\n",(now->tm_year + 1900),(now->tm_mon + 1) ,now->tm_mday);
         ifprintf(vcfdosepartial,"##source=Minimac3\n");
         ifprintf(vcfdosepartial,"##contig=<ID=%s>\n",rHap.finChromosome.c_str());
+   ifprintf(vcfdosepartial,"##FILTER=<ID=GENOTYPED>");

         if(GT)
                 ifprintf(vcfdosepartial,"##FORMAT=<ID=GT,Number=1,Type=String,Description=\"Genotype\">\n");

Santy-8128 commented 7 years ago

Done. Can you check it?

cariaso commented 7 years ago

The fix is now being seen in the wild, but it produces a syntactically invalid VCF. Both https://github.com/jamescasbon/PyVCF (which promethease.com depends upon) and https://github.com/vcftools/vcftools (another industry-standard parser) reject these two lines

FILTER=

FILTER=

with error messages like Syntax Error in your file: One of the FILTER lines is malformed: ##FILTER=

in files that I've seen from independent sources.
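
For context, the patch earlier in this thread writes `##FILTER=<ID=GENOTYPED>` with no `Description` field and no trailing `\n`. The VCF specification gives the form `##FILTER=<ID=ID,Description="description">`, which is likely what the parsers above expect. A minimal C++ sketch of building such a line follows. `escapeQuoted` and `filterHeaderLine` are hypothetical helper names, and the description text in the usage note is a placeholder, not wording taken from Minimac3.

```cpp
#include <string>

// Escape backslashes and double quotes so the text can sit inside a
// quoted VCF header value.
std::string escapeQuoted(const std::string& text) {
    std::string out;
    for (char c : text) {
        if (c == '\\' || c == '"') {
            out += '\\';
        }
        out += c;
    }
    return out;
}

// Build one FILTER meta-information line with both ID and Description,
// terminated by a newline so the next header line starts on its own line.
std::string filterHeaderLine(const std::string& id, const std::string& description) {
    return "##FILTER=<ID=" + id + ",Description=\"" + escapeQuoted(description) + "\">\n";
}
```

In Imputation.cpp this could be emitted with the existing printf-style call, for example `ifprintf(vcfdosepartial, "%s", filterHeaderLine("GENOTYPED", "Marker was genotyped").c_str());`, again with the description text as an assumption.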