AndersenLab / wi-gatk

The new GATK-based pipeline for wild isolate C. elegans strains

Update to wi-gatk #16

Closed danrlu closed 4 years ago

danrlu commented 4 years ago

Here are the updates:

  1. GATK VariantFiltration on the FORMAT (genotype) field with --genotype-filter-expression "DP < ${params.min_depth}" --genotype-filter-name "DP_min_depth" only adds an FT field (PASS or DP_min_depth) at variant positions where at least one strain fails the filter. See here: https://gatk.broadinstitute.org/hc/en-us/articles/360035891191-VariantFiltration-FT-tag In other words, at variant positions where all strains pass the filter, no FT field with PASS is added. This is a problem for bcftools filter -O u --set-GTs . --exclude 'FORMAT/FT != "PASS"' in the hard-filter step. So I added gatk SelectVariants --set-filtered-gt-to-nocall to set the genotypes that fail the filter to ./., as described in https://gatk.broadinstitute.org/hc/en-us/articles/360035531012--How-to-Filter-on-genotype-using-VariantFiltration Note that this step should happen before filtering for high_missing. ded49b8

  2. For the bcftools soft-filter steps: --mode + appends the new filter name to the FILTER field for sites that fail, while --mode x resets the FILTER field to PASS for sites that pass the current filter. http://samtools.github.io/bcftools/bcftools.html#filter Changed --mode +x to --mode + for high_missing and high_heterozygosity. dbbdcdc

  3. Removed --max-genotype-count 3000 --max-alternate-alleles 100 from the single-sample gatk HaplotypeCaller step. We may consider adding these to the gatk GenotypeGVCFs step, but increasing these values will increase the computational cost exponentially. https://gatk.broadinstitute.org/hc/en-us/articles/360036348452-GenotypeGVCFs#--max-alternate-alleles 9a99b43

  4. Modified the outputs used for making the report:

    • No longer write out annotated.vcf, which I added earlier for troubleshooting purposes.
    • Added option -s- to bcftools stats to write out per-sample stats.
    • Write out the '%QUAL\t%INFO/QD\t%INFO/SOR\t%INFO/FS\t%FILTER\n' fields of the soft-filtered VCF for the report. 0eda098
  5. Removed the WS245 genome folder, which is no longer used. da9f6da
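
To illustrate item 1, here is a minimal Python sketch of the intended outcome of the genotype-level depth filter followed by --set-filtered-gt-to-nocall. It is a toy model on plain dictionaries, not GATK code: the function names, the min_depth value, and the per-strain records are all made up for illustration.

```python
def mask_low_depth(genotypes, min_depth):
    """Toy model: set GT to ./. for strains whose DP is below min_depth.

    genotypes maps a strain name to a dict with "GT" and "DP" keys.
    Returns a new mapping; the input is left unchanged.
    """
    masked = {}
    for strain, call in genotypes.items():
        if call["DP"] < min_depth:
            masked[strain] = {"GT": "./.", "DP": call["DP"]}
        else:
            masked[strain] = dict(call)
    return masked


def missing_fraction(genotypes):
    """Fraction of strains with a no-call genotype at this site."""
    n_missing = sum(1 for call in genotypes.values() if call["GT"] == "./.")
    return n_missing / len(genotypes)


# Hypothetical site with four strains (values invented for the example).
site = {
    "N2": {"GT": "0/0", "DP": 30},
    "CB4856": {"GT": "1/1", "DP": 3},
    "JU258": {"GT": "1/1", "DP": 12},
    "ED3017": {"GT": "0/0", "DP": 4},
}

masked = mask_low_depth(site, min_depth=5)
print(missing_fraction(site))    # missingness before masking
print(missing_fraction(masked))  # missingness after masking
```

In this model the site has no missing genotypes before masking and half of them missing afterwards, which is why the no-call step has to run before the high_missing filter is evaluated.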
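
The --mode change in item 2 can be sketched in the same way. The function below is a simplified Python model of how the FILTER column is updated, based on my reading of the bcftools filter documentation; it is not bcftools code, and the function name and arguments are hypothetical.

```python
def apply_soft_filter(current, fails, name, mode=""):
    """Toy model of the FILTER update under bcftools filter --mode.

    current: existing FILTER value, e.g. "PASS", ".", or "high_missing"
    fails:   True if the site fails the filter being applied
    name:    the soft-filter name to record for failing sites
    mode:    "", "+", "x", or "+x"
    """
    if fails:
        if "+" in mode and current not in ("PASS", "."):
            return current + ";" + name  # append to earlier filters
        return name                      # replace, or first filter recorded
    if "x" in mode:
        return "PASS"                    # reset sites that pass this filter
    return current                       # leave passing sites untouched


# A site already flagged high_missing that passes the heterozygosity filter.
print(apply_soft_filter("high_missing", False, "high_heterozygosity", "+x"))
print(apply_soft_filter("high_missing", False, "high_heterozygosity", "+"))

# A site already flagged high_missing that also fails the heterozygosity filter.
print(apply_soft_filter("high_missing", True, "high_heterozygosity", "+"))
```

Under this model, +x would reset a site already flagged high_missing back to PASS as soon as it passes the next filter, whereas + keeps the earlier flag and only appends new ones. That is the reason for dropping x.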

danrlu commented 4 years ago

I'll merge the pull request to wrap up the intended changes and add the unresolved suggestions to the project to-do list. Thanks for the review and the comments!!