brentp / somalier

fast sample-swap and relatedness checks on BAMs/CRAMs/VCFs/GVCFs... "like damn that is one smart wine guy"
MIT License

Feature request: option to ignore filters #33

Closed holtjma closed 4 years ago

holtjma commented 4 years ago

Hello, I was testing on some files (gVCFs again) and noticed I wasn't capturing all of the regions in a small custom-made sites file. After tinkering a bit, I ended up creating a gVCF file where all the variants were labeled "PASS" in the FILTER column, and that allowed somalier to correctly extract the missing variants. Requiring "PASS" is an understandable default, but I'm wondering if we can get an option that will allow somalier to extract those variants even if they are not labeled "PASS".

holtjma commented 4 years ago

Relatedly, is there a description of the minimum requirements for a variant to be extracted? All I could find was the AD check defaulting to GT if AD isn't available.

brentp commented 4 years ago

I will document the requirements. Briefly, it requires that the FILTER is one of "PASS", "", ".", or "RefCall". Then, when genotyping, it uses allele balance (AB): HOM-REF is AB < 0.04, HOM-ALT is AB > 0.96, HET is AB from 0.2 to 0.8, and everything else is unknown.
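The rules described in this comment can be sketched roughly as follows. This is an illustrative Python sketch, not somalier's actual code (somalier is not written in Python). The function names, the definition of AB as alt depth divided by total depth, the inclusive HET bounds, and the handling of zero depth are all assumptions.

```python
from typing import Optional

# FILTER values accepted by default, as listed in the comment above.
DEFAULT_FILTERS = {"PASS", "", ".", "RefCall"}


def filter_ok(filter_value: str) -> bool:
    """Return True if the FILTER column holds one of the default accepted values.

    Hypothetical helper: exact string matching is an assumption.
    """
    return filter_value in DEFAULT_FILTERS


def genotype_from_ab(ref_depth: int, alt_depth: int) -> Optional[int]:
    """Classify a site from its allele balance (AB).

    Returns 0 for HOM-REF, 1 for HET, 2 for HOM-ALT, and None for unknown.
    AB is assumed here to be alt_depth / (ref_depth + alt_depth).
    """
    total = ref_depth + alt_depth
    if total == 0:
        return None  # no supporting reads: treated as unknown (assumption)
    ab = alt_depth / total
    if ab < 0.04:
        return 0  # HOM-REF
    if ab > 0.96:
        return 2  # HOM-ALT
    if 0.2 <= ab <= 0.8:  # inclusive bounds are an assumption
        return 1  # HET
    return None  # AB falls in a gap between the ranges: unknown
```

For instance, a site with 90 reference reads and 10 alternate reads has AB = 0.1, which falls between the HOM-REF and HET ranges and so would be classified as unknown.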

what is the FILTER of the variants you'd like to include? I'd rather add more things like "RefCall" than add extra flags.

holtjma commented 4 years ago

To be honest, it's a lot of weird filters that shouldn't pass under normal conditions (things like LowGQX;NoPassedVariantGTs), but due to the way this test captures the sequencing it throws off the filters even when the variant call is good.

Sounds like my best path forward would be to just pre-parse the gVCFs such that somalier will accept the variants (i.e. reduce it down to PASS and a GT field only).

brentp commented 4 years ago

I'll make this an environment variable, something like SOMALIER_ALLOWED_FILTERS, in a future release. For now, yes, you can use some simple awk to pre-process.
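As an alternative to awk, the pre-processing step could look something like the sketch below, which rewrites the FILTER column (the 7th tab-separated column of a VCF record) to PASS when every filter on the record is in a chosen set. The function name `relabel_filters` and the `rescue` parameter are hypothetical, and the sketch assumes uncompressed, tab-delimited VCF text.

```python
from typing import Iterable, Iterator, Set


def relabel_filters(lines: Iterable[str], rescue: Set[str]) -> Iterator[str]:
    """Yield VCF lines, setting FILTER to PASS where all filters are in `rescue`.

    Header lines (starting with '#') pass through unchanged. Records whose
    FILTER contains any name outside `rescue` are left as they are.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # FILTER is column 7; multiple filters are separated by ';'.
        if len(fields) > 6 and all(name in rescue for name in fields[6].split(";")):
            fields[6] = "PASS"
        yield "\t".join(fields) + "\n"
```

One way to use it would be to pass `sys.stdin` as `lines`, write each yielded line to `sys.stdout`, and pipe a decompressed gVCF through the script before running the extraction.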

brentp commented 4 years ago

This is now in master, and I'll make a new release shortly. You can do, e.g.:

export SOMALIER_ALLOWED_FILTERS=LowGQX,NoPassedVariantGTs
somalier extract ...