artic-network / artic-ncov2019

ARTIC nanopore protocol for nCoV2019 novel coronavirus
Creative Commons Attribution 4.0 International

Artic minion sometimes fails consensus generation step #19

Closed sthifx closed 4 years ago

sthifx commented 4 years ago

In one of the samples we're testing at CGR, no consensus.fasta is generated when running 'artic minion'

The alignment stages run fine, but the issue appears to arise during variant calling by medaka. The error seems to stem from a single variant at position 9514 bp (A>G), covered by nCoV-2019_31_LEFT - nCoV-2019_31_RIGHT (9204-9226 → 9557-9585).

This variant is called by both runs of Medaka, and both calls are subsequently merged into the single merged.vcf file (though not collapsed by position, so the variant is present twice in the VCF). This would be fine if both copies of the variant passed filtering, but one passes and one fails (a pretty rare event, I suspect!). The failed position is therefore masked in the reference during artic_mask, and when bcftools consensus is then run, the command fails with an error stating that the reference allele in the VCF (A) does not match the reference base (now an N).
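To make the failure mode concrete, here is a minimal Python sketch of how duplicated calls with conflicting filter status could be detected and resolved before masking. It is not the artic code or the fix that was eventually applied: the `Call` tuple, the function names, the contig name `"ref"`, and the extra call at position 100 are all hypothetical, and the policy of dropping the passing copy (so the site is only masked) is just one possible choice.

```python
from collections import defaultdict
from typing import NamedTuple


class Call(NamedTuple):
    """One simplified variant record (hypothetical structure, not artic's)."""
    chrom: str
    pos: int      # 1-based position
    ref: str
    alt: str
    passed: bool  # True if the call passed filtering


def find_conflicts(calls):
    """Return variant keys seen with both a passing and a failing copy."""
    status = defaultdict(set)
    for c in calls:
        status[(c.chrom, c.pos, c.ref, c.alt)].add(c.passed)
    return sorted(k for k, seen in status.items() if seen == {True, False})


def drop_conflicting_passes(calls):
    """Assumed policy: if any copy of a variant fails, discard its passing copy.

    The site is then only masked, so no variant is applied on top of an N.
    """
    conflicts = set(find_conflicts(calls))
    return [
        c for c in calls
        if not (c.passed and (c.chrom, c.pos, c.ref, c.alt) in conflicts)
    ]


# The 9514 A>G call appears twice, once passing and once failing.
# The call at position 100 is made up purely for illustration.
calls = [
    Call("ref", 9514, "A", "G", True),
    Call("ref", 9514, "A", "G", False),
    Call("ref", 100, "C", "T", True),
]

conflicts = find_conflicts(calls)
resolved = drop_conflicting_passes(calls)
print(conflicts)
print(resolved)
```

The opposite policy (keep the passing copy and skip masking at that site) would be equally easy to express; which one is appropriate depends on how conservative the consensus should be.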

I have seen this occur in one other sample too, so it is not an isolated case.

I've attached a small test case illustrating this, in case you need to replicate the issue. The run.sh script shows the artic command to run, and the input test.fastq file is the set of reads mapping within 1 kb either side of the variant causing this problem.

Apologies if this is not a bug but a mistake on my end. I'm still getting to grips with the code (which is really great, especially the very recent update, which sped runs up enormously). Thanks in advance for any help, and hope you're keeping well!

Sam

EDIT: Forgot to mention that the run was with V2 primers. I tried running artic with both V2 and V3 primer schemes, and this error occurred with both. Also, I'm happy to write a workaround if you think one is needed.

nickloman commented 4 years ago

Thank you for the informative report. I think this is an edge case I hadn't considered and probably needs some extra logic in vcf_merge to deal with satisfactorily. I'll get back to you!

Mattstorey commented 4 years ago

I've had the same issue with a couple of samples. 'N' in the pre-consensus reference fasta, but a variant call at the same position in the VCF, causes the final bcftools consensus step to crash. A workaround would be great.
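For anyone wanting to confirm they are hitting the same problem before the consensus step, a rough pre-flight check might look like the sketch below. It is plain Python and assumes the masked reference is already loaded as a string and the passing variants as `(position, REF allele)` pairs; the function name and the toy sequence are hypothetical and are not part of artic or bcftools.

```python
def mismatched_sites(reference, variants):
    """List variants whose REF allele disagrees with the (masked) reference.

    reference: sequence as a string, e.g. read from the masked FASTA
    variants:  iterable of (pos, ref_allele) with 1-based positions
    """
    bad = []
    for pos, ref_allele in variants:
        observed = reference[pos - 1 : pos - 1 + len(ref_allele)]
        if observed.upper() != ref_allele.upper():
            bad.append((pos, ref_allele, observed))
    return bad


# Toy example: position 5 has been masked to N but still carries a call.
masked = "ACGTNACGT"
problems = mismatched_sites(masked, [(2, "C"), (5, "A")])
print(problems)  # [(5, 'A', 'N')]
```

Any site reported here is one where the VCF expects a real base but the reference now holds something else (typically an N), which is the mismatch described above.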

nickloman commented 4 years ago

Thanks folks, now fixed in the master branch.