artic-network / fieldbioinformatics

The ARTIC field bioinformatics pipeline
MIT License

Issue assigning variants occurring within an amplicon overlap region #106

Open george-githinji opened 2 years ago

george-githinji commented 2 years ago

I am running into an issue with variant assignment on Omicron datasets: the ARTIC pipeline reverts variants to the reference despite sufficient read support for them in the BAM file. The variants occur within amplicon overlap regions. Looking at the VCF report, I see the following:

[12:27:11] [artic-tools::check_vcf] variant at pos 23599: T->G
[12:27:11] [artic-tools::check_vcf] located within an amplicon overlap region
[12:27:11] [artic-tools::check_vcf] var pos does not match with that of previously identified overlap var, holding new var (and dropping held var at 23013)
[12:27:11] [artic-tools::check_vcf] variant at pos 23604: C->A
[12:27:11] [artic-tools::check_vcf] located within an amplicon overlap region
[12:27:11] [artic-tools::check_vcf] var pos does not match with that of previously identified overlap var, holding new var (and dropping held var at 23599)

Is this a known issue, and what is the best way to address it?
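For anyone triaging similar runs, the held and dropped positions can be pulled out of a check_vcf log with a short script. This is only a sketch: the regular expressions are based solely on the log wording quoted in this comment (other artic-tools versions may phrase it differently), and `summarize_check_vcf_log` is a hypothetical helper, not part of the ARTIC pipeline.

```python
import re

# Patterns derived only from the log lines quoted in this issue (assumption:
# the wording is stable; adjust if your artic-tools version logs differently).
VARIANT_RE = re.compile(r"variant at pos (\d+): ([ACGTN]+)->([ACGTN]+)")
DROPPED_RE = re.compile(r"dropping held var at (\d+)")


def summarize_check_vcf_log(log_text):
    """Return (variants, dropped).

    variants maps position -> (ref, alt) for every variant the log reports;
    dropped lists the positions of held variants that were discarded.
    """
    variants = {
        int(m.group(1)): (m.group(2), m.group(3))
        for m in VARIANT_RE.finditer(log_text)
    }
    dropped = [int(m.group(1)) for m in DROPPED_RE.finditer(log_text)]
    return variants, dropped


log = """\
[12:27:11] [artic-tools::check_vcf] variant at pos 23599: T->G
[12:27:11] [artic-tools::check_vcf] located within an amplicon overlap region
[12:27:11] [artic-tools::check_vcf] var pos does not match with that of previously identified overlap var, holding new var (and dropping held var at 23013)
[12:27:11] [artic-tools::check_vcf] variant at pos 23604: C->A
[12:27:11] [artic-tools::check_vcf] located within an amplicon overlap region
[12:27:11] [artic-tools::check_vcf] var pos does not match with that of previously identified overlap var, holding new var (and dropping held var at 23599)
"""

variants, dropped = summarize_check_vcf_log(log)
for pos in dropped:
    # A position that was reported as a variant and later dropped is a
    # candidate for an unwanted reversion to the reference.
    note = "reported in this log" if pos in variants else "reported earlier"
    print(f"dropped held variant at {pos} ({note})")
```

Comparing the dropped positions against read support in the BAM file is then a manual step; the script only surfaces which sites to check.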

hsnguyen commented 2 years ago

This issue may introduce incorrect reversions to the reference allele. Because the --strict option is useful for working around some other bugs, I think it's best to mask these sites (treating them as assumed contamination) instead of padding them with reference alleles.
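To make the masking idea concrete, here is a minimal sketch of what masking a site means at the sequence level: the base at each listed position is replaced with N rather than with the reference allele. `mask_positions` is a hypothetical helper written for illustration, not the pipeline's own masking step, and it assumes 1-based positions as used in VCF POS.

```python
def mask_positions(sequence, positions, mask_char="N"):
    """Return a copy of sequence with the given 1-based positions masked.

    Hypothetical helper for illustration only; positions follow the
    1-based convention of the VCF POS column.
    """
    bases = list(sequence)
    for pos in positions:
        if not 1 <= pos <= len(bases):
            raise ValueError(f"position {pos} is outside the sequence")
        bases[pos - 1] = mask_char  # convert 1-based position to 0-based index
    return "".join(bases)


# Toy sequence, not real SARS-CoV-2 data.
consensus = "ACGTACGTAC"
masked = mask_positions(consensus, [3, 8])
print(masked)  # ACNTACGNAC
```

A masked site is reported as unknown, so downstream lineage assignment sees missing data there instead of a possibly wrong reference call.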

george-githinji commented 2 years ago

Thanks for the comment. Which other bugs would be introduced by disabling --strict mode?

hsnguyen commented 2 years ago

I don't remember the details, but it was something related to conflicts between the failed and passed VCFs that make bcftools fail. Maybe this one; I think artic-tools::check_vcf can somehow help overcome the issue?