dnanexus-rnd / GLnexus

Scalable gVCF merging and joint variant calling for population sequencing projects
Apache License 2.0

The inconsistent output of GLnexus, labeling no call to '0/0' #263

Closed sujunhao closed 3 years ago

sujunhao commented 3 years ago

Hi,

I found some inconsistent cases in GLnexus output. I ran GLnexus to merge DeepVariant outputs, but a portion of the genotypes that are not called in a single sample are mislabeled as '0/0' in the GLnexus output. The results also seem inconsistent when I merge VCFs that contain only a single position. For example, with the three gVCFs called by DeepVariant, when I merged all three original gVCFs, the result at chr20 position 1258820 shows the genotype of HG004 as 0/0 with an approximate read depth of 0, which is wrong. But when I first extracted the records at chr20 position 1258820 from each gVCF into three single-record gVCFs and then fed those to GLnexus, GLnexus output the genotype of HG004 as "./.", which is right.

I attached the command I used and files (GLnexus_results.zip) for your reference:

Thanks, JH

mlin commented 3 years ago

I appreciate your efforts to pull together the example files; however, I still had some trouble understanding the complete scenario.

HG004_sng.vcf is an empty file (after the header). Naturally GLnexus fills in ./. as there's no data for it to work with.

HG004.g.vcf has this reference band containing position 1258820:

chr20   1258807 .   A   <*> 0   .   END=1258875 GT:GQ:MIN_DP:PL 0/0:1:0:0,3,29

which is where GLnexus sources the 0/0 call from, although its support is poor (as indicated by the GQ).
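To make the band logic concrete, here is a minimal Python sketch (not GLnexus code) that parses the record above and checks whether position 1258820 falls inside the reference band. The helper name `parse_ref_band` and the whitespace-based field splitting are assumptions made for illustration only; a real gVCF is tab-delimited and is better read with a proper VCF library.

```python
def parse_ref_band(line):
    """Parse one single-sample gVCF record into its span and FORMAT fields.

    Hypothetical helper for illustration; splits on any whitespace.
    """
    f = line.split()
    chrom, pos = f[0], int(f[1])
    # INFO is a ';'-separated list of KEY=VALUE pairs (flags without '=' are skipped).
    info = dict(kv.split("=", 1) for kv in f[7].split(";") if "=" in kv)
    # A reference band covers POS..END inclusive; without END it covers POS only.
    end = int(info.get("END", pos))
    # Pair the FORMAT keys with the sample's values.
    sample = dict(zip(f[8].split(":"), f[9].split(":")))
    return {"chrom": chrom, "start": pos, "end": end, **sample}


record = "chr20   1258807 .   A   <*> 0   .   END=1258875 GT:GQ:MIN_DP:PL 0/0:1:0:0,3,29"
band = parse_ref_band(record)

query = 1258820
covered = band["chrom"] == "chr20" and band["start"] <= query <= band["end"]

print(covered, band["GT"], band["GQ"], band["MIN_DP"])
# prints: True 0/0 1 0
```

The record starts at 1258807, not at 1258820, yet its `END` makes it cover 1258820. The sample's `GT` is 0/0 with `GQ` 1 and `MIN_DP` 0, which matches the low-confidence 0/0 described above.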

sujunhao commented 3 years ago

Many thanks for your reply.

I see now that the GLnexus results are correct in my case. I had overlooked the reference band spanning 1258807 to 1258875. It makes sense to me now.

Best, JH