artic-network / fieldbioinformatics

The ARTIC field bioinformatics pipeline
MIT License

artic minion fails with midnight primers #102

Closed rzlim08 closed 1 year ago

rzlim08 commented 2 years ago

Hello,

I've run into a bug where the artic minion pipeline fails at artic_make_depth_mask for some samples that have reads mapping near the end of the reference genome.

I did some tracing, and it looks like there is an off-by-one error caused by align_trim creating a BAM file with alignments that extend beyond the end of the reference genome. So far I've only seen this with midnight primers, but I'm unsure whether the primer scheme is the cause.
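A minimal sketch of the kind of failure described above, assuming a depth counter that keeps one slot per reference position. This is not the actual artic_make_depth_mask code; the function names, the `clamp` option, and the (start, end) interval representation are hypothetical and only illustrate why an alignment that overhangs the reference by a single base is enough to raise an error.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[int, int]  # hypothetical (start, end): 0-based, end-exclusive


def depth_per_position(
    ref_length: int,
    alignments: Iterable[Interval],
    clamp: bool = False,
) -> List[int]:
    """Count how many alignments cover each reference position."""
    depths = [0] * ref_length
    for start, end in alignments:
        if clamp:
            # Defensive variant: drop the part that overhangs the reference.
            end = min(end, ref_length)
        for pos in range(start, end):
            # Raises IndexError as soon as pos reaches ref_length.
            depths[pos] += 1
    return depths


def overhanging(ref_length: int, alignments: Iterable[Interval]) -> List[Interval]:
    """Return the alignments whose end lies past the reference end."""
    return [(start, end) for start, end in alignments if end > ref_length]
```

With a toy reference of length 10, the interval (8, 11) overhangs by one base: the unclamped call raises IndexError, while `clamp=True` counts only positions 8 and 9 for that interval.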

To replicate this issue, I've uploaded a sample read, which can be run with:

artic minion --medaka --no-longshot --normalise 1000 --threads 4 --scheme-directory primer_schemes --read-file single_broken.fastq.gz --medaka-model r941_min_high_g360 nCoV-2019/V1200 single_broken single_broken.fastq.gz

Midnight primer files can be found here.

artic version = 1.2.1
Ubuntu 20.04

BioWilko commented 2 years ago

align_trim isn't designed to work with midnight because the reads are fragmented. We recommend using one of the other pipelines designed with midnight in mind.

However, I'm not sure I can replicate the issue you're describing. I used your test data and generated a BAM in which the poly-A read is mapped to position 29837, but it did not cause the pipeline to crash. Also, align_trim hasn't made the BAM longer: the BAM file output by minimap2, single_broken.sorted.bam, looks identical.
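One way to check this independently of the pipeline is to compare each alignment's reference end against the length declared in the SAM header. The sketch below is an assumption-laden illustration, not part of artic: the function names are hypothetical, and it works on SAM text lines (for example the output of `samtools view -h`) rather than reading BAM directly. The end coordinate is computed from POS plus the CIGAR operations that consume the reference (M, D, N, = and X), as defined in the SAM specification.

```python
import re
from typing import Dict, Iterable, List, Tuple

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def reference_end(pos: int, cigar: str) -> int:
    """Last reference position covered (1-based, inclusive)."""
    span = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)
    return pos + span - 1


def reads_past_reference_end(sam_lines: Iterable[str]) -> List[Tuple[str, int, int]]:
    """Return (read name, start, end) for alignments ending past the @SQ LN length."""
    ref_lengths: Dict[str, int] = {}
    offenders: List[Tuple[str, int, int]] = []
    for line in sam_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("@"):
            if fields[0] == "@SQ":
                tags = dict(f.split(":", 1) for f in fields[1:])
                ref_lengths[tags["SN"]] = int(tags["LN"])
            continue
        qname, rname, pos, cigar = fields[0], fields[2], int(fields[3]), fields[5]
        if rname == "*" or cigar == "*":
            continue  # unmapped read, nothing to check
        end = reference_end(pos, cigar)
        if end > ref_lengths[rname]:
            offenders.append((qname, pos, end))
    return offenders
```

Running the same check on the BAM before and after align_trim would show whether trimming introduces any overhanging alignment or whether it is already present in the minimap2 output.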

In either case, midnight is not currently supported by this pipeline, although others have adapted it to work with that primer scheme.