dnanexus-rnd / GLnexus

Scalable gVCF merging and joint variant calling for population sequencing projects
Apache License 2.0

Missing records from GLnexus result #217

Closed: hurleyLi closed this issue 4 years ago

hurleyLi commented 4 years ago

Hi,

I came across an issue where an INDEL variant is present in an individual gVCF file (GT = 0/1) but is missing from the GLnexus output. The job finished successfully, and most of the other variants are fine.

I am using an older version of the tool (v1.1.3-0-g74f4279) because I don't have permission to upgrade it on our cluster. Also, my gVCF files were generated with xAtlas.

I'll try to upgrade the tool and see whether that fixes the problem. In the meantime, I'm wondering whether you've seen this before, or know of a version of the tool that specifically fixed it.

Thanks in advance! Hurley

hurleyLi commented 4 years ago

I noticed that all of the missing records have low genotype quality in the individual gVCF files. Does GLnexus apply a threshold that removes such variants by default? I don't see any option that lets me specify one.
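A minimal sketch of one way to check an observation like this, assuming single-sample gVCF records that carry a `GQ` key in the FORMAT column. The function name `low_gq_records` and the cutoff of 20 are hypothetical and are not GLnexus defaults; the actual filter settings are whatever the GLnexus configuration in use specifies.

```python
def low_gq_records(vcf_lines, max_gq=20):
    """Yield (chrom, pos, ref, alt, gq) for records whose first-sample GQ is below max_gq.

    max_gq is an illustrative cutoff only, not a GLnexus default.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and column header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns on this line
        keys = fields[8].split(":")
        values = fields[9].split(":")
        if "GQ" not in keys:
            continue
        idx = keys.index("GQ")
        if idx >= len(values) or values[idx] in (".", ""):
            continue  # GQ missing for this sample
        gq = float(values[idx])
        if gq < max_gq:
            yield fields[0], int(fields[1]), fields[3], fields[4], gq
```

For a bgzip-compressed gVCF, the lines can be supplied with `gzip.open(path, "rt")`, and the resulting positions compared against the records missing from the merged output.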

mlin commented 4 years ago

Yes, some quality filters are on by default, as discussed here: https://github.com/dnanexus-rnd/GLnexus/wiki/Configuration#unifier-configuration

Usually this is desirable for large projects because without any quality filters, every single false positive variant in every sample triggers an entire spurious row of the output project VCF, causing it to blow up in size.

The exact settings of those configuration options are logged in GLnexus' standard error output, and more recent versions also embed them in the output VCF header.
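As a rough sketch of how one might look for those embedded settings, the snippet below collects header lines that mention GLnexus. It assumes the output has already been converted to text VCF (for example with bcftools) and makes no assumption about the exact header key names, so it simply matches on the string "glnexus". The function name `glnexus_header_lines` is hypothetical.

```python
def glnexus_header_lines(vcf_lines):
    """Return the '##' meta-information lines that mention GLnexus.

    Reading stops at the first line that is not a '##' line, so the
    variant records themselves are never scanned.
    """
    found = []
    for line in vcf_lines:
        if not line.startswith("##"):
            break  # reached the '#CHROM' column header or the records
        if "glnexus" in line.lower():
            found.append(line.rstrip("\n"))
    return found
```

With a gzip-compressed VCF this could be called as `glnexus_header_lines(gzip.open(path, "rt"))`. On older versions that do not embed the configuration, the list will be empty and the standard error log is the place to look.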

More recent versions do also include a weCall_unfiltered configuration preset which turns off the quality filters, but again I'd only advise using that on small-ish projects.

hurleyLi commented 4 years ago

Thanks for the clarification and suggestion @mlin !