Closed macieksk closed 3 years ago
Thanks for this PR. It looks interesting but we will need to give it a thorough review and test, so please bear with us.
@macieksk can you please simplify this Pull Request? It is full of unnecessary changes, such as code formatting the whole repository.
I am interested in this change and would be glad to contribute with the review.
@maricatovictor Hi, I'm not sure what happened to this pull request. It used to be relatively simple, at least as far as I remember. Is it because of recent pipeline updates? Anyway, I'll take a look at this in a few hours and possibly create a new, simpler pull request. Thanks.
@maricatovictor The most recent pull request with this mod is #77. I was only able to test that it runs without errors. The modified filter is implemented for the --medaka option only.
The original `artic_vcf_filter --longshot` (used in the Artic Nanopore Medaka pipeline) filters out heterozygotic variants completely. This causes omissions of otherwise good mosaic variants present in sequenced virus samples. For example, a proper variant present in only 70% of reads used to be filtered out. This patch adds options for more precise control of heterozygotic variant filtering, with moderately permissive defaults that should filter out nanopore homopolymer false positives. The old behavior can be enabled with `--hetmf Inf`.

An example of a heterozygotic variant accepted with the default parameters:
> MN908947.3 24872 . G T 500.0 PASS DP=400;AC=120,227;AM=53;MC=0;MF=0.0;MB=0.0;AQ=11.48;GM=1;PH=6.02,6.02,6.02,6.02;SC=None; GT:GQ:PS:UG:UQ 0/1:147.24:.:0/1:147.24
An example of a filtered-out homopolymer false positive:
> MN908947.3 10527 . C CT 96.06 PASS DP=398;AC=130,59;AM=209;MC=0;MF=0.0;MB=0.0;AQ=7.4;GM=1;PH=6.02,6.02,6.02,6.02;SC=None; GT:GQ:PS:UG:UQ 0/1:96.06:.:0/1:96.06
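
To make the difference between the two example records above easier to see, here is a minimal Python sketch that compares their allele counts. It is not the filter implemented in this patch. It assumes that `AC` holds the reference and alternate observation counts in that order, and the helper names (`parse_info`, `alt_fraction`, `keep_het`) and the `min_alt_fraction` threshold of 0.5 are hypothetical, chosen only for illustration.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO string such as 'DP=400;AC=120,227;' into a dict."""
    fields = {}
    for item in info.strip().strip(";").split(";"):
        key, _, value = item.partition("=")
        fields[key.strip()] = value.strip()
    return fields


def alt_fraction(info: str) -> float:
    """Fraction of unambiguous allele observations that support ALT.

    Assumption: AC is 'ref_count,alt_count' (not confirmed by this PR).
    """
    ref_count, alt_count = (int(x) for x in parse_info(info)["AC"].split(","))
    return alt_count / (ref_count + alt_count)


def keep_het(info: str, min_alt_fraction: float = 0.5) -> bool:
    """Hypothetical rule: keep a 0/1 call if ALT support is high enough."""
    return alt_fraction(info) >= min_alt_fraction


# INFO columns copied from the two example records in the description.
accepted = ("DP=400;AC=120,227;AM=53;MC=0;MF=0.0;MB=0.0;AQ=11.48;GM=1;"
            "PH=6.02,6.02,6.02,6.02;SC=None;")
rejected = ("DP=398;AC=130,59;AM=209;MC=0;MF=0.0;MB=0.0;AQ=7.4;GM=1;"
            "PH=6.02,6.02,6.02,6.02;SC=None;")

for name, info in [("24872 G>T", accepted), ("10527 C>CT", rejected)]:
    print(f"{name}: alt fraction {alt_fraction(info):.2f}, keep={keep_het(info)}")
```

Under these assumptions the SNV at 24872 has roughly two thirds of its unambiguous observations supporting the alternate allele, while the homopolymer insertion at 10527 has under one third and a much larger ambiguous count (`AM=209`). The actual criteria and defaults used by the patch may differ.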