artic-network / fieldbioinformatics

The ARTIC field bioinformatics pipeline
MIT License

rebase hetzy mod to 1.2.1 upstream #77

Open macieksk opened 3 years ago

This is an adjusted, rebased version of pull request #58.

The original artic_vcf_filter --medaka (used in the ARTIC Nanopore Medaka pipeline) filters out heterozygotic variants completely. This drops otherwise good mosaic variants present in sequenced virus samples. For example, a genuine variant present in only 70% of reads used to be filtered out. This patch adds options for more precise control over heterozygotic variant filtering, with moderately permissive defaults that should still filter out nanopore homopolymer false positives. The old behavior can be restored with `--hetmf Inf`.

usage: artic_vcf_filter [-h] [--nanopolish] [--medaka]
                        [--no-frameshifts]
                        [--heterozygotic-min-fraction HETMF]
                        [--heterozygotic-min-reads HETMR]
                        inputvcf output_pass_vcf output_fail_vcf

positional arguments:
  inputvcf
  output_pass_vcf
  output_fail_vcf

optional arguments:
  -h, --help            show this help message and exit
  --nanopolish
  --medaka
  --no-frameshifts
  --heterozygotic-min-fraction HETMF, --hetmf HETMF
                        minimal fraction of alternate allele reads for a
                        heterozygotic variant to be accepted (for medaka filter) (default: 0.5)
  --heterozygotic-min-reads HETMR, --hetmr HETMR
                        minimal number of alternate allele reads for a
                        heterozygotic variant to be accepted (for medaka filter) (default: 12)
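
As a rough illustration of the interface above, the two new options could be declared with `argparse` along the following lines. This is only a sketch reconstructed from the usage text, not the code in this pull request: the function name `build_parser` and the `dest` names `hetmf`/`hetmr` are assumptions. It also shows why `--hetmf Inf` is a valid value: Python's `float` accepts the string `Inf`.

```python
import argparse


def build_parser():
    """Hypothetical reconstruction of the CLI from the usage text above."""
    parser = argparse.ArgumentParser(
        prog="artic_vcf_filter",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument("--nanopolish", action="store_true")
    parser.add_argument("--medaka", action="store_true")
    parser.add_argument("--no-frameshifts", action="store_true")
    parser.add_argument(
        "--heterozygotic-min-fraction", "--hetmf",
        dest="hetmf",  # assumed attribute name
        type=float,    # float("Inf") parses, so "--hetmf Inf" is accepted
        default=0.5,
        help="minimal fraction of alternate allele reads for a "
             "heterozygotic variant to be accepted (for medaka filter)",
    )
    parser.add_argument(
        "--heterozygotic-min-reads", "--hetmr",
        dest="hetmr",  # assumed attribute name
        type=int,
        default=12,
        help="minimal number of alternate allele reads for a "
             "heterozygotic variant to be accepted (for medaka filter)",
    )
    parser.add_argument("inputvcf")
    parser.add_argument("output_pass_vcf")
    parser.add_argument("output_fail_vcf")
    return parser


# Example: request the old "reject every heterozygotic variant" behavior.
args = build_parser().parse_args(
    ["--medaka", "--hetmf", "Inf", "in.vcf", "pass.vcf", "fail.vcf"]
)
```

The file names in the example call are placeholders.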

An example of a heterozygotic variant accepted with the default parameters:
MN908947.3 24872 . G T 500.0 PASS DP=400;AC=120,227;AM=53;MC=0;MF=0.0;MB=0.0;AQ=11.48;GM=1;PH=6.02,6.02,6.02,6.02;SC=None; GT:GQ:PS:UG:UQ 0/1:147.24:.:0/1:147.24

An example of a homopolymer false positive that is filtered out:
MN908947.3 10527 . C CT 96.06 PASS DP=398;AC=130,59;AM=209;MC=0;MF=0.0;MB=0.0;AQ=7.4;GM=1;PH=6.02,6.02,6.02,6.02;SC=None; GT:GQ:PS:UG:UQ 0/1:96.06:.:0/1:96.06
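
To make the two examples concrete, here is a minimal sketch of how such a threshold check might be applied to the INFO column. It rests on assumptions that are not confirmed by this pull request: that `AC` holds the reference and alternate read counts in that order, that the fraction is taken as alternate reads over `DP`, and that both thresholds are inclusive. The actual patch may compute the fraction differently (for example over reference plus alternate reads only). The helper names `parse_info` and `accept_heterozygous` are hypothetical.

```python
def parse_info(info):
    """Turn a VCF INFO string such as 'DP=400;AC=120,227' into a dict."""
    fields = {}
    for item in info.split(";"):
        item = item.strip()
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key.strip()] = value.strip()
    return fields


def accept_heterozygous(info, hetmf=0.5, hetmr=12):
    """Return True if a heterozygotic call passes both thresholds.

    Assumption: AC is 'ref_reads,alt_reads' and the fraction is alt / DP.
    """
    fields = parse_info(info)
    depth = int(fields["DP"])
    ref_reads, alt_reads = (int(x) for x in fields["AC"].split(","))
    fraction = alt_reads / depth if depth else 0.0
    # Both conditions must hold; with hetmf=inf nothing can pass.
    return alt_reads >= hetmr and fraction >= hetmf


# INFO columns copied from the two example records above.
snv_info = "DP=400;AC=120,227;AM=53;MC=0;MF=0.0;MB=0.0;AQ=11.48;GM=1"
indel_info = "DP=398;AC=130,59;AM=209;MC=0;MF=0.0;MB=0.0;AQ=7.4;GM=1"

snv_ok = accept_heterozygous(snv_info)      # 227 alt reads out of 400
indel_ok = accept_heterozygous(indel_info)  # 59 alt reads out of 398
```

Under these assumptions the G>T call at position 24872 passes (227/400 is above 0.5) and the C>CT insertion at position 10527 fails (59/398 is below 0.5), matching the outcomes described above. Passing `hetmf=float("inf")` rejects every heterozygotic call, which corresponds to the old behavior.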