artic-network / fieldbioinformatics

The ARTIC field bioinformatics pipeline
MIT License

Variant in amplicon overlapping region #94

Open hannahgoldswain opened 2 years ago

hannahgoldswain commented 2 years ago

I have encountered a mutation that consistently arises in an amplicon overlap region. However, one amplicon has coverage exceeding 300x, whereas the other has coverage below 20x throughout my dataset. In the vcfreport.txt file, the mutation is recorded as ‘located within an amplicon overlap region; nothing seen at position yet, holding var’. Then, when the next variant in an overlap region is encountered, the message says that the new var is being held and the old var is being dropped. As a result, this variant does not pass checks, despite having over 300x coverage and being visibly present in the BAM file. Is there a way to explain this, or to compensate for the low coverage in one amplicon when the other has ample coverage?

I also found that, in other samples where the variant passes checks, vcfreport.txt reports ‘multiple copies of var found at pos X in overlap region, keeping all copies’. What threshold needs to be met for the variant to be kept?

Thanks for your help!
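The behaviour described above can be sketched in a few lines of Python. This is a minimal illustration reconstructed only from the log messages quoted in this comment, not the pipeline's actual code. The `Var` type, the `filter_overlap_vars` function, the overlap coordinates, and the rule that a variant still held at the end of the file is dropped are all assumptions made for the example.

```python
from typing import Iterable, List, NamedTuple, Optional, Tuple


class Var(NamedTuple):
    pos: int   # 1-based reference position
    alt: str   # alternate allele
    pool: str  # amplicon/pool the call came from (hypothetical field)


def filter_overlap_vars(
    variants: Iterable[Var],
    overlaps: List[Tuple[int, int]],
) -> Tuple[List[Var], List[str]]:
    """Hold-and-compare pass over position-sorted variants.

    `overlaps` is a list of inclusive (start, end) amplicon overlap regions.
    Returns the variants that pass and the log messages produced.
    """
    kept: List[Var] = []
    log: List[str] = []
    held: Optional[Var] = None

    def in_overlap(pos: int) -> bool:
        return any(start <= pos <= end for start, end in overlaps)

    for var in variants:
        if not in_overlap(var.pos):
            # Outside any overlap region: nothing to cross-check.
            kept.append(var)
            continue
        if held is None:
            held = var
            log.append(f"pos {var.pos}: nothing seen at position yet, holding var")
        elif (held.pos, held.alt) == (var.pos, var.alt):
            # Both amplicons reported the same variant: keep every copy.
            kept.extend([held, var])
            held = None
            log.append(f"pos {var.pos}: multiple copies of var found, keeping all copies")
        else:
            # A different overlap variant arrived before a second copy did.
            log.append(f"pos {var.pos}: holding new var, dropping old var at pos {held.pos}")
            held = var

    if held is not None:
        # Assumption: an unmatched variant left over at the end is dropped too.
        log.append(f"pos {held.pos}: no second copy seen, dropping held var")

    return kept, log


# Toy input: the variant at 400 is called by only one amplicon.
overlaps = [(240, 260), (390, 410), (600, 620)]
variants = [
    Var(100, "T", "1"),
    Var(250, "G", "1"),
    Var(250, "G", "2"),
    Var(400, "A", "1"),
    Var(610, "C", "2"),
]
kept, log = filter_overlap_vars(variants, overlaps)
print([v.pos for v in kept])  # [100, 250, 250]
```

Under these assumptions there is no depth threshold as such: a variant in an overlap region survives only if a second copy is reported from the other amplicon, so a call backed by 300x in one amplicon is still dropped when the low-coverage amplicon produces no matching call.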

shimbalama commented 2 years ago

I think this might be the same issue as the one I raised in "SNP filtering/masking/soft clipping with medaka" (#92). Unfortunately, I think the devs here are a bit overwhelmed and have become unresponsive.

ItokawaK commented 2 years ago

This is obviously caused by the changes below to vcf_filter.py in v1.3.0-dev.

https://github.com/artic-network/fieldbioinformatics/commit/2ffd9ae94c7577c5abb64b26c9d212e39a489298#diff-356e807121a2f2a33814ad0994b7880291a0f9fbe3dbf6a3363cb08943e49fdfL88-L94

I am not sure why they stopped ignoring variants with low DP, but I assume it is probably related to the deprecation of Longshot.