dnanexus-rnd / GLnexus

Scalable gVCF merging and joint variant calling for population sequencing projects
Apache License 2.0

How to filter out unknown variants? #211

Closed ghost closed 4 years ago

ghost commented 4 years ago

Hello, is there a built-in way to filter out unknown genotypes?

Thank you.

EDIT: More precisely, consider this record, for instance:

chromosome_6 3581170 chromosome_6_3581170_G_A G A 33 . AF=0.041667;AQ=33;AC=1;AN=2 GT:DP:AD:GQ:PL:RNC ./.:.:.:.:0,0,0:MM ./.:.:.:.:0,0,0:MM ./.:.:.:.:0,0,0:MM ./.:.:.:.:0,0,0:MM ./.:.:.:.:0,0,0:MM ./.:.:.:.:0,0,0:MM ./.:.:.:.:0,0,0:MM ./.:.:.:.:0,0,0:MM ./.:.:.:.:0,0,0:MM ./.:.:.:.:0,0,0:MM ./.:.:.:.:0,0,0:MM 0/1:201:104,97:34:33,0,53:..

It looks like the variant exists only in the last individual. However, since all the others have only missing genotypes (./.), I would regard this as a false positive. Is there a way to get rid of such false positives, or must they be dealt with by the user on a case-by-case basis? (I understand that making this automatic would be a challenge: for example, what should happen if the variant is genotyped in half of the individuals and is "unknown" in the other half?)

ghost commented 4 years ago

Found the solution with bcftools.
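
The thread does not record which bcftools command was used. As a rough illustration of the kind of filter discussed above, here is a minimal Python sketch that drops records whose fraction of missing genotypes exceeds a threshold (bcftools exposes a comparable quantity through its F_MISSING filter expression). The function names `missing_fraction` and `filter_records` and the default threshold of 0.5 are assumptions made for this example, not part of GLnexus or bcftools, and the sketch assumes tab-separated VCF records with a GT entry in the FORMAT column.

```python
from typing import Iterable, Iterator


def missing_fraction(record: str) -> float:
    """Return the fraction of samples whose GT is entirely missing (e.g. ./.)."""
    fields = record.rstrip("\n").split("\t")
    # Column 9 (index 8) is FORMAT; sample columns start at index 9.
    gt_index = fields[8].split(":").index("GT")
    samples = fields[9:]
    if not samples:
        return 0.0
    missing = 0
    for sample in samples:
        genotype = sample.split(":")[gt_index]
        # Treat phased (|) and unphased (/) separators alike.
        alleles = genotype.replace("|", "/").split("/")
        if all(allele == "." for allele in alleles):
            missing += 1
    return missing / len(samples)


def filter_records(lines: Iterable[str], max_missing: float = 0.5) -> Iterator[str]:
    """Yield header lines unchanged and only those records at or below max_missing."""
    for line in lines:
        if line.startswith("#") or missing_fraction(line) <= max_missing:
            yield line
```

For the record quoted in the question, 11 of the 12 samples are ./., so any threshold below 11/12 would remove it. Where to set the threshold remains a per-project judgment call, as the question itself anticipates.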