Medaka calls a 5 nt deletion instead of a 6 nt deletion

Hi there,

In our recent Omicron samples (21K aka BA.1) we found a few with an unusual number of spike mutations, in particular only one AA substitution; see sample16 here:

(Lineage assignment and mutations by Nextclade.)

So I looked into the data and tracked it down to a 5 nt deletion (21766-21770) in the spike gene, which causes a frameshift (S:70-1274).

That's why the substitutions after S:A67V are missing and we see the deletions S:I68- and S:H69- instead of S:H69- and S:V70-, which are normally found in 21K aka BA.1 (Omicron). At the nucleotide level, 'normal' 21K Omicron samples carry a 6 nt deletion (21765-21770).
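A minimal bash sketch of the frame arithmetic behind the two calls, assuming MN908947.3 (Wuhan-Hu-1) reference coordinates, where the spike CDS starts at position 21563. The helpers `spike_codon` and `classify_deletion` are hypothetical and not part of ARTIC, medaka or Nextclade.

```shell
#!/usr/bin/env bash
# Sketch only: assumes MN908947.3 coordinates with the S CDS starting at 21563.
S_START=21563

# Print the spike codon number that contains a given genome position.
spike_codon() {
  echo $(( ($1 - S_START) / 3 + 1 ))
}

# Print the length, frame effect and spike codon range of a deletion start..end.
classify_deletion() {
  local start=$1 end=$2
  local len=$(( end - start + 1 ))
  local effect="in-frame"
  if (( len % 3 != 0 )); then
    effect="frameshift"
  fi
  echo "${len} nt ${effect}, S codons $(spike_codon "$start")-$(spike_codon "$end")"
}

classify_deletion 21766 21770   # call made by medaka
classify_deletion 21765 21770   # deletion expected in BA.1
```

Neither deletion starts on a codon boundary, so both touch codons 68-70. The 6 nt deletion still removes a net two codons and keeps the frame, which Nextclade reports as S:H69- and S:V70-; the 5 nt deletion shifts the frame from codon 70 onward, consistent with the S:70-1274 frameshift.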

In the genome browser everything looks fine, including the coverage at position 21765, which should be deleted (screenshot: igv_snapshot_run382_del).

The ARTIC pipeline is called as follows (within poreCov):

artic minion --medaka --medaka-model r941_min_hac_g507 --min-depth 20 --normalise 500 --threads 16 --scheme-directory external_primer_schemes

With nanopolish instead of medaka we see the expected 6 nt deletion.

Primer protocol is ARTIC V4.1.

Updates:

- same behavior with medaka 1.5.0 and medaka 1.4.3 (inside ARTIC)
- checked the medaka model
- solved with sup (super-accuracy) basecalling and the respective medaka model

The question now is: can we fix this by adjusting medaka's parameters?

(artic-network / fieldbioinformatics, issue #100)