artic-network / fieldbioinformatics

The ARTIC field bioinformatics pipeline
MIT License

Artic v5 mismatched primer names in bedfile causes "dropped" amplicons #126

Closed Sam-Sims closed 10 months ago

Sam-Sims commented 10 months ago

Hello!

I believe there is an issue when using artic minion with the bed file located in the other repository (https://github.com/artic-network/artic-ncov2019/blob/master/primer_schemes/nCoV-2019/V5.3.2/SARS-CoV-2.scheme.bed). I have created the issue here because it relates to the filtering carried out as part of artic minion.

The bed file in that repository has inconsistent names for the following primer pairs: SARS-CoV-2_3, SARS-CoV-2_31, SARS-CoV-2_62, SARS-CoV-2_89, SARS-CoV-2_96

Within each of these pairs, the suffix of the LEFT primer does not match the suffix of the RIGHT primer:
SARS-CoV-2_400_3_LEFT_1 and SARS-CoV-2_400_3_RIGHT_0
SARS-CoV-2_400_31_LEFT_1 and SARS-CoV-2_400_31_RIGHT_0
and so on for the remaining pairs.

The other pairs have matching suffixes, e.g. SARS-CoV-2_400_4_LEFT_0 and SARS-CoV-2_400_4_RIGHT_0

Because the primer names do not match, the align_trim.py script flags the reads for these amplicons as not correctly paired and skips them. The check is at line 200:

correctly_paired = p1[2]['Primer_ID'].replace('_LEFT', '') == p2[2]['Primer_ID'].replace('_RIGHT', '')

When _LEFT and _RIGHT are removed and the remaining names are compared, they differ (for example SARS-CoV-2_400_3_1 versus SARS-CoV-2_400_3_0), so the check fails.
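To illustrate, here is a minimal sketch that applies the same string comparison to two of the primer pairs mentioned above. The helper name correctly_paired is only for illustration, and it takes plain strings rather than the p1/p2 structures used in align_trim.py.

```python
def correctly_paired(left_id: str, right_id: str) -> bool:
    # Same comparison as the quoted line, applied directly to primer names.
    return left_id.replace('_LEFT', '') == right_id.replace('_RIGHT', '')


# Matching suffixes: both names reduce to SARS-CoV-2_400_4_0
print(correctly_paired('SARS-CoV-2_400_4_LEFT_0', 'SARS-CoV-2_400_4_RIGHT_0'))  # True

# Mismatched suffixes: SARS-CoV-2_400_3_1 vs SARS-CoV-2_400_3_0
print(correctly_paired('SARS-CoV-2_400_3_LEFT_1', 'SARS-CoV-2_400_3_RIGHT_0'))  # False
```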

This can be verified by checking the alignreport.er and alignreport.txt files. When using the bed file from artic-network/artic-ncov2019 you will find all the reads belonging to the above amplicons skipped, and none present in the alignreport.txt file.

However, when using the bed file from artic-network/primer-schemes, the reads are included as normal.

This gives the appearance of those amplicons being "dropped" as they now have 0 coverage when looking at the primertrimmed.rg.sorted.bam file.

This also has implications for pipelines that pull the primer scheme from artic-network/artic-ncov2019 rather than artic-network/primer-schemes.
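A pipeline could check a scheme for this problem before running. The following is a rough sketch, not part of the artic tooling: the function name mismatched_amplicons is made up, it assumes a tab-separated bed file with the primer name in the fourth column, and the coordinates in the example lines are placeholders rather than values from the real scheme. It reports any amplicon whose LEFT and RIGHT names do not reduce to the same string once _LEFT and _RIGHT are removed.

```python
from collections import defaultdict


def mismatched_amplicons(bed_lines):
    """Return amplicon prefixes whose LEFT/RIGHT primer names cannot pair up."""
    left = defaultdict(set)
    right = defaultdict(set)
    for line in bed_lines:
        fields = line.rstrip('\n').split('\t')
        # Assumption: primer name is in the fourth column of the bed file.
        if line.startswith('#') or len(fields) < 4:
            continue
        name = fields[3]
        if '_LEFT' in name:
            left[name.split('_LEFT')[0]].add(name.replace('_LEFT', ''))
        elif '_RIGHT' in name:
            right[name.split('_RIGHT')[0]].add(name.replace('_RIGHT', ''))
    amplicons = set(left) | set(right)
    return sorted(a for a in amplicons if left.get(a, set()) != right.get(a, set()))


# Placeholder coordinates; only the primer names are taken from this issue.
example = [
    'MN908947.3\t100\t120\tSARS-CoV-2_400_3_LEFT_1\t1\t+',
    'MN908947.3\t480\t500\tSARS-CoV-2_400_3_RIGHT_0\t1\t-',
    'MN908947.3\t400\t420\tSARS-CoV-2_400_4_LEFT_0\t2\t+',
    'MN908947.3\t780\t800\tSARS-CoV-2_400_4_RIGHT_0\t2\t-',
]
print(mismatched_amplicons(example))  # ['SARS-CoV-2_400_3']
```

With a real file, the lines could be passed in as open('SARS-CoV-2.scheme.bed'). Schemes that deliberately include alternate primers may need a looser rule than strict set equality.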

Thanks

BioWilko commented 10 months ago

Hi Sam

You are correct, that bed file doesn't work with this pipeline. We do, however, use the working version from the primer-schemes repo, as should viralrecon, so a PR should be opened there.

Sam