This is an adjusted, rebased version of pull request #58.
The original `artic_vcf_filter --medaka` (used in the Artic Nanopore Medaka pipeline) filters out heterozygous variants completely. This causes otherwise good mosaic variants present in sequenced virus samples to be omitted. For example, a genuine variant present in only 70% of reads was previously filtered out. This patch adds options for finer control over the filtering of heterozygous variants. The defaults are moderately permissive but should still filter out nanopore homopolymer false positives.
The old behavior can be restored with `--hetmf Inf`.

    usage: artic_vcf_filter [-h] [--nanopolish] [--medaka]
                            [--no-frameshifts]
                            [--heterozygotic-min-fraction HETMF]
                            [--heterozygotic-min-reads HETMR]
                            inputvcf output_pass_vcf output_fail_vcf

    positional arguments:
      inputvcf
      output_pass_vcf
      output_fail_vcf

    optional arguments:
      -h, --help            show this help message and exit
      --nanopolish
      --medaka
      --no-frameshifts
      --heterozygotic-min-fraction HETMF, --hetmf HETMF
                            minimal fraction of alternate allele reads for a
                            heterozygotic variant to be accepted (for medaka
                            filter) (default: 0.5)
      --heterozygotic-min-reads HETMR, --hetmr HETMR
                            minimal number of alternate allele reads for a
                            heterozygotic variant to be accepted (for medaka
                            filter) (default: 12)

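For readers who want to see how the two new options fit together on the command line, here is a minimal `argparse` sketch that mirrors the help text above. It is an illustration only, not the code from the patch: the `build_parser` helper and the `hetmf`/`hetmr` destination names are assumptions. It also shows why `--hetmf Inf` works: Python's `float` accepts `Inf`, and no read fraction can reach an infinite threshold.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented options (not the patch's code)."""
    parser = argparse.ArgumentParser(
        prog="artic_vcf_filter",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument("--nanopolish", action="store_true")
    parser.add_argument("--medaka", action="store_true")
    parser.add_argument("--no-frameshifts", action="store_true")
    # type=float accepts "Inf", which makes the fraction threshold unreachable.
    parser.add_argument(
        "--heterozygotic-min-fraction", "--hetmf",
        dest="hetmf", type=float, default=0.5,
        help="minimal fraction of alternate allele reads for a "
             "heterozygotic variant to be accepted (for medaka filter)",
    )
    parser.add_argument(
        "--heterozygotic-min-reads", "--hetmr",
        dest="hetmr", type=int, default=12,
        help="minimal number of alternate allele reads for a "
             "heterozygotic variant to be accepted (for medaka filter)",
    )
    parser.add_argument("inputvcf")
    parser.add_argument("output_pass_vcf")
    parser.add_argument("output_fail_vcf")
    return parser


if __name__ == "__main__":
    # Example: request the old behavior (reject all heterozygous variants).
    args = build_parser().parse_args(
        ["--medaka", "--hetmf", "Inf", "in.vcf", "pass.vcf", "fail.vcf"]
    )
    print(args.hetmf, args.hetmr)
```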
An example of a heterozygous variant accepted with the default parameters:
> MN908947.3 24872 . G T 500.0 PASS DP=400;AC=120,227;AM=53;MC=0;MF=0.0;MB=0.0;AQ=11.48;GM=1;PH=6.02,6.02,6.02,6.02;SC=None; GT:GQ:PS:UG:UQ 0/1:147.24:.:0/1:147.24
An example of a filtered-out homopolymer false positive:
> MN908947.3 10527 . C CT 96.06 PASS DP=398;AC=130,59;AM=209;MC=0;MF=0.0;MB=0.0;AQ=7.4;GM=1;PH=6.02,6.02,6.02,6.02;SC=None; GT:GQ:PS:UG:UQ 0/1:96.06:.:0/1:96.06
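The two records above can be checked against the default thresholds with a short sketch. This is not the patch's implementation. It assumes that `AC` holds the reference and alternate read counts in that order and that the fraction is taken relative to `DP`; the actual filter may compute the fraction differently. The helper names `parse_info` and `het_passes` are hypothetical.

```python
def parse_info(info):
    """Split a VCF INFO string such as 'DP=400;AC=120,227;' into a dict."""
    fields = {}
    for item in info.strip().strip(";").split(";"):
        key, _, value = item.partition("=")
        fields[key.strip()] = value.strip()
    return fields


def het_passes(info, min_fraction=0.5, min_reads=12):
    """Return True if a heterozygous call meets both thresholds.

    Assumption: AC = "<ref reads>,<alt reads>" and the fraction is alt / DP.
    """
    fields = parse_info(info)
    depth = int(fields["DP"])
    alt_reads = int(fields["AC"].split(",")[1])
    if depth == 0:
        return False
    return alt_reads >= min_reads and alt_reads / depth >= min_fraction


accepted = "DP=400;AC=120,227;AM=53;MC=0;MF=0.0;MB=0.0;AQ=11.48;GM=1;PH=6.02,6.02,6.02,6.02;SC=None;"
rejected = "DP=398;AC=130,59;AM=209;MC=0;MF=0.0;MB=0.0;AQ=7.4;GM=1;PH=6.02,6.02,6.02,6.02;SC=None;"

print(het_passes(accepted))  # 227 alt reads out of 400
print(het_passes(rejected))  # 59 alt reads out of 398
print(het_passes(accepted, min_fraction=float("inf")))  # old behavior
```

Under these assumptions the first record passes on both criteria, while the second has enough alternate reads but falls well below the 0.5 fraction. With an infinite fraction threshold neither record passes, which corresponds to the old behavior.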