GATK VariantFiltration on the FORMAT (genotype) field with
--genotype-filter-expression "DP < ${params.min_depth}" --genotype-filter-name "DP_min_depth" only adds an FT field (set to PASS or DP_min_depth) at variant positions where at least one strain fails the filter. See here: https://gatk.broadinstitute.org/hc/en-us/articles/360035891191-VariantFiltration-FT-tag
In other words, at variant positions where all strains pass the filter, no PASS is added. This creates a challenge for bcftools filter -O u --set-GTs . --exclude 'FORMAT/FT != "PASS"' in the hard filter step.
So I added gatk SelectVariants --set-filtered-gt-to-nocall to change the genotype at failed positions to ./., as described in
https://gatk.broadinstitute.org/hc/en-us/articles/360035531012--How-to-Filter-on-genotype-using-VariantFiltration
Note that this step should happen before filtering for high_missing.
ded49b8
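The two GATK steps described above can be sketched as a pair of shell functions. This is a minimal sketch, not the pipeline's actual code: the function names, the `run` dry-run helper, the `MIN_DEPTH` default of 5, and all file names are assumptions made for illustration. Only the GATK tool names and flags come from the discussion.

```shell
# Hypothetical depth threshold; the pipeline takes it from params.min_depth.
MIN_DEPTH="${MIN_DEPTH:-5}"

# Helper (assumption): print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Step 1: tag genotypes whose DP is below the threshold (FT=DP_min_depth).
# Arguments: reference FASTA, input VCF, output VCF.
tag_low_depth() {
    run gatk VariantFiltration \
        -R "$1" -V "$2" \
        --genotype-filter-expression "DP < ${MIN_DEPTH}" \
        --genotype-filter-name "DP_min_depth" \
        -O "$3"
}

# Step 2: set every filtered genotype to no-call (./.), so later steps
# do not need to inspect FORMAT/FT at all.
# Arguments: reference FASTA, input VCF, output VCF.
mask_filtered_genotypes() {
    run gatk SelectVariants \
        -R "$1" -V "$2" \
        --set-filtered-gt-to-nocall \
        -O "$3"
}
```

Calling `tag_low_depth ref.fa in.vcf.gz tagged.vcf.gz` followed by `mask_filtered_genotypes ref.fa tagged.vcf.gz masked.vcf.gz` would run both steps in order; prefixing either call with `DRY_RUN=1` only prints the command.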
bcftools filter with --soft-filter and --mode + appends new filter names to the FILTER field for sites that fail; --mode x resets the FILTER field to PASS for sites that pass the current filter. http://samtools.github.io/bcftools/bcftools.html#filter
Changed --mode +x to --mode + for high_missing and high_heterozygosity.
dbbdcdc
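To see why `+x` was a problem, here is a toy shell model of how the documented `--mode` values update a single site's FILTER value. It is only an emulation written for illustration, not bcftools itself; the function name `soft_filter` and the earlier filter name `QD_low` are hypothetical.

```shell
# Toy model of the FILTER update for one site under bcftools filter --soft-filter.
# usage: soft_filter MODE CURRENT_FILTER NEW_NAME FAILED
#   MODE is "", "+", "x" or "+x"; FAILED is 1 if the site fails the current filter.
soft_filter() {
    mode=$1; current=$2; name=$3; failed=$4
    if [ "$failed" = "1" ]; then
        case "$mode" in
            *+*)
                # "+" appends to existing filters; PASS or "." is replaced.
                case "$current" in
                    .|PASS) echo "$name" ;;
                    *) echo "$current;$name" ;;
                esac ;;
            *)
                # Without "+", the new name replaces whatever was there.
                echo "$name" ;;
        esac
    else
        case "$mode" in
            *x*)
                # "x" resets passing sites to PASS, dropping earlier filters.
                echo "PASS" ;;
            *)
                # Otherwise a passing site keeps its value ("." becomes PASS).
                if [ "$current" = "." ]; then echo "PASS"; else echo "$current"; fi ;;
        esac
    fi
}
```

In this model, `soft_filter "+x" QD_low high_missing 0` yields `PASS`, so a site that failed an earlier filter but passes high_missing loses its earlier annotation, whereas `soft_filter "+" QD_low high_missing 0` keeps `QD_low`.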
I'll merge the pull request to wrap up the intended changes and put the unresolved suggestions on the project to-do list. Thanks for reviewing and for the comments!
Here are the updates:
GATK VariantFiltration on the FORMAT (genotype) field with --genotype-filter-expression "DP < ${params.min_depth}" --genotype-filter-name "DP_min_depth" only adds an FT field (set to PASS or DP_min_depth) at variant positions where at least one strain fails the filter. See here: https://gatk.broadinstitute.org/hc/en-us/articles/360035891191-VariantFiltration-FT-tag In other words, at variant positions where all strains pass the filter, no PASS is added. This creates a challenge for bcftools filter -O u --set-GTs . --exclude 'FORMAT/FT != "PASS"' in the hard filter step. So I added gatk SelectVariants --set-filtered-gt-to-nocall to change the genotype at failed positions to ./., as described in https://gatk.broadinstitute.org/hc/en-us/articles/360035531012--How-to-Filter-on-genotype-using-VariantFiltration Note that this step should happen before filtering for high_missing.
ded49b8
bcftools filter with --soft-filter and --mode + appends new filter names to the FILTER field for sites that fail; --mode x resets the FILTER field to PASS for sites that pass the current filter. http://samtools.github.io/bcftools/bcftools.html#filter Changed --mode +x to --mode + for high_missing and high_heterozygosity.
dbbdcdc
Removed --max-genotype-count 3000 --max-alternate-alleles 100 from the single-sample gatk HaplotypeCaller step. May consider adding these to the gatk GenotypeGVCFs step, but increasing these values will increase computation cost exponentially: https://gatk.broadinstitute.org/hc/en-us/articles/360036348452-GenotypeGVCFs#--max-alternate-alleles
9a99b43
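For a rough sense of scale, the number of candidate genotypes at a site can be computed directly. The sketch below assumes diploid calling, where N alternate alleles plus the reference give (N + 1)(N + 2) / 2 unordered allele pairs; the function name is hypothetical and the calculation ignores how GATK prunes alleles internally.

```shell
# Number of possible diploid genotypes at a site with N alternate alleles.
# There are N + 1 alleles including the reference, so the count of unordered
# pairs (with repetition) is (N + 1) * (N + 2) / 2.
diploid_genotype_count() {
    n_alt=$1
    echo $(( (n_alt + 1) * (n_alt + 2) / 2 ))
}
```

Under this assumption, `diploid_genotype_count 6` gives 28 and `diploid_genotype_count 100` gives 5151, which is above the 3000 genotype cap that was removed.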
Modified outputs associated with making the report:
Added -s- to bcftools stats to write out per-sample stats.
Write out the '%QUAL\t%INFO/QD\t%INFO/SOR\t%INFO/FS\t%FILTER\n' fields of the soft-filtered VCF for the report.
0eda098
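The report outputs above might be produced along these lines. This is a sketch under assumptions: the function names, the `run` dry-run helper, and the file names are hypothetical, and using `bcftools query -f` for the field extraction is an assumption based on the format string, since the comment does not name the subcommand.

```shell
# Helper (assumption): print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Per-sample statistics: "-s-" asks bcftools stats to include all samples.
# Argument: input VCF. Output goes to stdout.
per_sample_stats() {
    run bcftools stats -s- "$1"
}

# One tab-separated line per site with QUAL, QD, SOR, FS and FILTER.
# Argument: soft-filtered VCF. Output goes to stdout.
site_filter_table() {
    run bcftools query \
        -f '%QUAL\t%INFO/QD\t%INFO/SOR\t%INFO/FS\t%FILTER\n' \
        "$1"
}
```

For example, `per_sample_stats soft.vcf.gz > stats.txt` and `site_filter_table soft.vcf.gz > filter_fields.tsv` would write the two report inputs to files.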
Removed the WS245 genome folder, which is no longer used.
da9f6da