You are correct: that bed file doesn't work with this pipeline. However, we use the working version from the primer-schemes repo, as should viralrecon, so a PR should be opened there.
Hello!
I believe there might be an issue when using the bed file located on the other repository (https://github.com/artic-network/artic-ncov2019/blob/master/primer_schemes/nCoV-2019/V5.3.2/SARS-CoV-2.scheme.bed) and artic minion. (I have created the issue here as it relates to the filtering carried out as part of artic minion.)
The bed file in that repository has different names for the following primer pairs:
SARS-CoV-2_3
SARS-CoV-2_31
SARS-CoV-2_62
SARS-CoV-2_89
SARS-CoV-2_96
The above pairs have a mismatching suffix:
SARS-CoV-2_400_3_LEFT_1 and SARS-CoV-2_400_3_RIGHT_0
SARS-CoV-2_400_31_LEFT_1 and SARS-CoV-2_400_31_RIGHT_0
etc.
Whereas the other pairs match, e.g.:
SARS-CoV-2_400_4_LEFT_0 and SARS-CoV-2_400_4_RIGHT_0
As the primer names are mismatching, in the align_trim.py script the reads for this amplicon are flagged as not correctly paired and so are skipped, namely line 200:
correctly_paired = p1[2]['Primer_ID'].replace('_LEFT', '') == p2[2]['Primer_ID'].replace('_RIGHT', '')
When the _LEFT and _RIGHT are removed and the primer names compared - they mismatch.
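To illustrate the comparison on its own, here is a minimal sketch that applies the same string logic to the primer names listed above. The helper name names_match is hypothetical and only mirrors the quoted line; it is not a function from align_trim.py.

```python
def names_match(left_id: str, right_id: str) -> bool:
    """Hypothetical helper mirroring the quoted comparison:
    strip '_LEFT' / '_RIGHT' and compare what remains."""
    return left_id.replace('_LEFT', '') == right_id.replace('_RIGHT', '')


# Pair with matching suffixes: both reduce to 'SARS-CoV-2_400_4_0'.
print(names_match('SARS-CoV-2_400_4_LEFT_0', 'SARS-CoV-2_400_4_RIGHT_0'))  # True

# Pair with mismatching suffixes: 'SARS-CoV-2_400_3_1' vs 'SARS-CoV-2_400_3_0'.
print(names_match('SARS-CoV-2_400_3_LEFT_1', 'SARS-CoV-2_400_3_RIGHT_0'))  # False
```

The trailing _1 versus _0 survives the replace calls, so the two names for amplicon 3 never compare equal.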
This can be verified by checking the alignreport.er and alignreport.txt files. When using the bed file from artic-network/artic-ncov2019, you will find all the reads belonging to the above amplicons skipped, and none present in the alignreport.txt file.
However, when using the bed file from artic-network/primer-schemes, the reads are included as normal.
This gives the appearance of those amplicons being "dropped", as they now have 0 coverage when looking at the primertrimmed.rg.sorted.bam file.
This also has implications for pipelines that pull the primer version from artic-network/artic-ncov2019 and not artic-network/primer-schemes.
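For anyone wanting to check a bed file before running a pipeline, something like the following rough sketch could flag affected primers. It assumes the primer name is in the fourth whitespace-separated column, and it only mirrors the name comparison quoted above; it is not part of artic. The function name unpaired_primers is hypothetical and the coordinates in the sample lines are placeholders, not values from the real scheme.

```python
import re
from collections import defaultdict


def unpaired_primers(bed_lines):
    """Return primer names whose stripped form has no partner on the other side.

    Assumes column 4 of each line holds the primer name.
    """
    by_side = {"LEFT": defaultdict(list), "RIGHT": defaultdict(list)}
    for line in bed_lines:
        fields = line.split()
        if line.startswith("#") or len(fields) < 4:
            continue  # skip comments and short or blank lines
        name = fields[3]
        match = re.search(r"_(LEFT|RIGHT)", name)
        if match is None:
            continue  # not a LEFT/RIGHT primer name
        side = match.group(1)
        # Same stripping as the quoted comparison.
        by_side[side][name.replace("_" + side, "")].append(name)

    lefts, rights = by_side["LEFT"], by_side["RIGHT"]
    unmatched = []
    # Keys present on only one side have no partner to pair with.
    for key in sorted(set(lefts) ^ set(rights)):
        unmatched.extend(lefts.get(key, []) + rights.get(key, []))
    return unmatched


# Placeholder coordinates; only the names come from the report above.
sample = [
    "MN908947.3\t100\t120\tSARS-CoV-2_400_3_LEFT_1\t1\t+",
    "MN908947.3\t480\t500\tSARS-CoV-2_400_3_RIGHT_0\t1\t-",
    "MN908947.3\t400\t420\tSARS-CoV-2_400_4_LEFT_0\t2\t+",
    "MN908947.3\t780\t800\tSARS-CoV-2_400_4_RIGHT_0\t2\t-",
]
print(unpaired_primers(sample))
```

To run it against a real file, pass an open file handle, e.g. unpaired_primers(open("SARS-CoV-2.scheme.bed")). Note that schemes using other naming conventions (such as alternate primers) may be flagged by this simple check even if the pipeline handles them.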
Thanks