artic-network / fieldbioinformatics

The ARTIC field bioinformatics pipeline
MIT License

Medaka calls a 5 nt deletion instead of a 6 nt deletion #100

Open MarieLataretu opened 2 years ago

MarieLataretu commented 2 years ago

Hi there,

in our recent Omicron samples (21K, aka BA.1) we found a few with an unexpectedly low number of spike mutations, in particular only one amino acid substitution; see sample16 here:

[image: report_snippet]

(Lineage assignment and mutations by Nextclade.)

I looked into the data and tracked it down to a 5 nt deletion (21766-21770) in the spike gene, which causes a frameshift (S:70-1274).

That's why the substitutions after S:A67V are missing and we see the deletions S:I68- and S:H69- instead of S:H69- and S:V70-, which are normally found in 21K aka BA.1 (Omicron). At the nucleotide level, 'normal' 21K Omicron samples show a 6 nt deletion (21765-21770).
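To make the frame arithmetic explicit, here is a minimal Python sketch. It assumes the spike CDS starts at nucleotide 21563 (1-based) in the MN908947.3 reference; that coordinate and the helper name `describe_deletion` are my assumptions for illustration and are not taken from the pipeline.

```python
# Assumption: first nucleotide of the spike CDS in MN908947.3 (1-based).
S_START = 21563


def describe_deletion(start, end, cds_start=S_START):
    """Summarise a deletion given as inclusive 1-based genome coordinates."""
    length = end - start + 1
    return {
        "length": length,
        # A deletion keeps the reading frame only if its length is a multiple of 3.
        "in_frame": length % 3 == 0,
        # Codon numbers (1-based) that contain the first and last deleted base.
        "first_codon": (start - cds_start) // 3 + 1,
        "last_codon": (end - cds_start) // 3 + 1,
    }


if __name__ == "__main__":
    print("expected 6 nt:", describe_deletion(21765, 21770))
    print("medaka 5 nt:  ", describe_deletion(21766, 21770))
```

Both deletions touch codons 68 to 70, but only the 6 nt one has a length divisible by 3, so the 5 nt call shifts the frame for everything downstream.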

In the genome browser everything looks fine, including the coverage at position 21765, which should be part of the deletion: [image: igv_snapshot_run382_del]

The ARTIC pipeline is called as follows (within poreCov):

artic minion --medaka --medaka-model r941_min_hac_g507 --min-depth 20 --normalise 500 --threads 16 --scheme-directory external_primer_schemes

With nanopolish instead of medaka we see the expected 6 nt deletion.
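For anyone who wants to compare the two callers on their own data, this is a rough sketch of how deleted ranges can be read out of VCF records. It assumes plain-text, tab-separated VCF lines with left-anchored deletions (ALT is a prefix of REF); the function name `list_deletions` and the two example records are illustrative only and are not copied from our output files.

```python
def list_deletions(vcf_lines):
    """Return (first_deleted_pos, last_deleted_pos, length) for each deletion."""
    deletions = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        pos, ref, alt = int(fields[1]), fields[3], fields[4]
        for allele in alt.split(","):
            # Left-anchored deletion: ALT is a shorter prefix of REF.
            if len(ref) > len(allele) and ref.startswith(allele):
                first = pos + len(allele)
                last = pos + len(ref) - 1
                deletions.append((first, last, last - first + 1))
    return deletions


if __name__ == "__main__":
    # Illustrative records only, shaped like the two calls discussed above.
    example = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "MN908947.3\t21764\t.\tATACATG\tA\t.\tPASS\t.",
        "MN908947.3\t21765\t.\tTACATG\tT\t.\tPASS\t.",
    ]
    for first, last, length in list_deletions(example):
        print(f"{first}-{last} ({length} nt)")
```

Running this over the medaka and nanopolish VCFs of the same sample should show directly whether the calls differ by the single base at 21765.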

Primer protocol is ARTIC V4.1.


Updates:


The question now is: can we fix this by adjusting medaka's parameters?

BioWilko commented 2 years ago

Hi Marie

Have you posted in the medaka GitHub about this? We support medaka within the pipeline but do not have any control over its behaviour.

Best

Sam W