dnanexus-rnd / GLnexus

Scalable gVCF merging and joint variant calling for population sequencing projects
Apache License 2.0

GLnexus CLI generates VCFs missing GATK-compatible INFO fields #275

Open bh007 opened 2 years ago

bh007 commented 2 years ago

In an attempt to joint-call a small cohort of 55 WGS samples, I used

glnexus_cli --config gatk --mem-gbytes 900 /samples/*.g.vcf.gz > out.bcf

The output (converted to VCF with "bcftools view" and compressed with "bgzip") was created successfully and quickly. However, when I compared it with the output of a parallel GATK joint-calling pipeline, I found that many INFO key-value pairs were missing from the GLnexus results, e.g.:

GLnexus:

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT ...
1 10146 1_10146_AC_A AC A 181 . AF=0.009091;AQ=181 GT:DP:AD:SB:GQ:PL:RNC ...

GATK:

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
chr1 10140 . ACCCTAAC A 189.78 . AC=1;AF=0.022;AN=46;AS_BaseQRankSum=0.900;AS_FS=2.632;AS_InbreedingCoeff=-0.0305;AS_MQ=47.10;AS_MQRankSum=3.000;AS_QD=15.83;AS_ReadPosRankSum=2.200;AS_SOR=0.916;BaseQRankSum=1.17;DP=1908;ExcessHet=3.0103;FS=5.756;InbreedingCoeff=-0.0305;MLEAC=1;MLEAF=0.022;MQ=39.19;MQRankSum=2.24;QD=15.82;ReadPosRankSum=1.80;SOR=1.198 GT:AD:DP:GQ:PL
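To list exactly which INFO keys the GATK record carries that the GLnexus record lacks, a small helper can split the INFO column (the 8th field) of each record and compare the key names. This is a minimal sketch using only POSIX `awk`; the function name `missing_info_keys` is hypothetical, and it assumes each argument is one complete VCF data line with its original tab separators (VCF columns are tab-delimited).

```shell
# Hypothetical helper: print the INFO keys present in record $2 but absent
# from record $1. Each argument must be one tab-separated VCF data line.
missing_info_keys() {
  printf '%s\n%s\n' "$1" "$2" | awk -F'\t' '
    # First record: remember every INFO key (the text before "=", or the
    # whole entry for flag fields such as DB).
    NR == 1 {
      n = split($8, entries, ";")
      for (i = 1; i <= n; i++) { split(entries[i], kv, "="); seen[kv[1]] = 1 }
    }
    # Second record: print keys not seen in the first, in their original order.
    NR == 2 {
      n = split($8, entries, ";")
      for (i = 1; i <= n; i++) {
        split(entries[i], kv, "=")
        if (!(kv[1] in seen)) print kv[1]
      }
    }'
}
```

Called as `missing_info_keys "$glnexus_line" "$gatk_line"` on the two records above (with tabs restored), it prints every GATK key except AF, i.e. AC, AN, the AS_* annotations, DP, FS, MQ, QD, SOR and the rest.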

I went through the CLI options and found none that would allow extra metadata to be added during calling. Is this GATK incompatibility by design? If not, is there a way to "turn on" such extra output when generating the VCF?

Thanks.
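GATK's AC and AN are simple genotype counts, and the GLnexus AF of 0.009091 is consistent with one ALT allele among 55 diploid samples (AC/AN = 1/110 ≈ 0.009091). As a hedged sketch of recomputing those two fields from the GT column, the hypothetical `allele_counts` function below counts called alleles across the sample columns of one tab-separated record. It assumes GT is the first FORMAT key (true for both records above), skips no-calls (`.`), and only handles the first ALT allele; it is an illustration, not a GLnexus option.

```shell
# Hypothetical helper: recompute AC, AN and AF for the first ALT allele from
# the GT subfield of every sample column in one tab-separated VCF data line.
# Assumes GT is the first FORMAT key; no-call alleles (".") are not counted.
allele_counts() {
  printf '%s\n' "$1" | awk -F'\t' '{
    ac = 0; an = 0
    for (i = 10; i <= NF; i++) {          # sample columns start at field 10
      split($i, parts, ":")               # parts[1] is the GT value
      n = split(parts[1], alleles, "[/|]")  # handles unphased and phased GTs
      for (j = 1; j <= n; j++) {
        if (alleles[j] == ".") continue   # skip no-calls
        an++
        if (alleles[j] == "1") ac++       # count copies of the first ALT allele
      }
    }
    printf "AC=%d;AN=%d;AF=%.6f\n", ac, an, (an > 0 ? ac / an : 0)
  }'
}
```

Only the genotype-count fields can be recovered this way; the other GATK annotations (AS_*, MQ, QD, the rank-sum tests and so on) cannot be derived from genotypes alone.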

GRGong commented 1 year ago

Same problem, but with Sentieon gVCFs.

zhoudreames commented 1 year ago

I have the same problem: the INFO column only includes AF and AQ. I need more detailed information in the INFO column. How can I add it? Thanks! @bh007 @GRGong @geetduggal