Consensus sequences - Githubissues

Hi. About your first question: since it can be quite hard to tell common sequence patterns apart from adapters, we added an option to exclude a list of k-mers from the counting phase. This is not perfect, but it will work fine if you need to prevent trimming of a specific sequence. Look for the "forbid_kmer" option in the configuration file.
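To illustrate the idea of excluding k-mers from the counting phase, here is a minimal Python sketch. It is not Porechop_ABI's actual code: the function `count_kmers`, its `forbidden` parameter, and the toy reads are all made up for this example, and the real tool's counting step may work quite differently.

```python
from collections import Counter


def count_kmers(sequences, k, forbidden=()):
    """Count k-mers across sequences, skipping any k-mer listed in `forbidden`.

    Hypothetical helper for illustration only, not part of Porechop_ABI.
    """
    forbidden = set(forbidden)
    counts = Counter()
    for seq in sequences:
        # Slide a window of length k over the sequence.
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer not in forbidden:
                counts[kmer] += 1
    return counts


# Toy reads (made up); "ACGT" is the k-mer we do not want counted.
reads = ["ACGTACGT", "ACGTTTTT"]
counts = count_kmers(reads, k=4, forbidden=["ACGT"])
```

Because a forbidden k-mer never enters the counts, it cannot contribute to an inferred adapter, which is why this kind of filter can protect a specific sequence from being trimmed.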

Now, why would "unknown" adapters appear during ONT sequencing? The answer is quite simple: Oxford Nanopore Technologies does not publicly disclose its adapter sequences, or at least not outside of the ONT community, from what I have seen.

The only known database for ONT adapters when we published our paper was the original Porechop database (adapters.py), curated by Ryan Wick and other contributors. This database has not been maintained since 2018, so any newer adapter is basically unknown.

It seems ONT is doing this on purpose, since recent ONT basecallers (guppy, dorado, and others) are supposed to trim the reads during basecalling. Because these tools are based on neural networks, their trimming step is basically a black box for us. That makes them pretty difficult to trust, and their effectiveness is hard to evaluate without the adapter sequences to compare against.

Our study revealed (at least for guppy) that residual (known) adapter sequences can be found in public datasets processed by ONT basecallers. This is why tools such as Porechop_ABI are needed to clean datasets, or at least for quality control.
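As a rough idea of what such a quality-control check could look like, here is a small Python sketch that looks for a known adapter near the ends of a read. Everything in it is an assumption made for illustration: `find_residual_adapter` is a hypothetical helper, the adapter string is a made-up toy sequence (not a real ONT adapter), and exact matching is a simplification, since real ONT reads contain errors and a practical check would need approximate matching.

```python
def find_residual_adapter(read, adapter, window=50):
    """Report where `adapter` occurs exactly within `window` bases of a read end.

    Returns "start", "end", or None. Hypothetical helper for illustration;
    exact matching will miss adapters that contain sequencing errors.
    """
    if adapter in read[:window]:
        return "start"
    if adapter in read[-window:]:
        return "end"
    return None


# Toy adapter and reads (made up, not real ONT sequences).
toy_adapter = "GATTACAGATTACA"
read_with_start = toy_adapter + "C" * 100
read_with_end = "C" * 100 + toy_adapter
clean_read = "C" * 100

results = [
    find_residual_adapter(r, toy_adapter)
    for r in (read_with_start, read_with_end, clean_read)
]
```

Counting how many reads in a dataset return something other than `None` gives a crude estimate of how much known adapter sequence survived basecalling.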

Disclaimer: I have been out of bioinformatics research for 2 years now, and even though I keep reading papers from time to time, you should take my statements with a grain of salt. ONT may have changed its policy recently (and I may be unaware of this), or maybe their basecallers are perfect now? Who knows? Not me, for sure.

bonsai-team / Porechop_ABI

Consensus sequences #22