DerrickWood / kraken2

The second version of the Kraken taxonomic sequence classification system
MIT License

Questions regarding deciding kmer size and minimizer size in Kraken2 #778

Open humbleflowers opened 12 months ago

humbleflowers commented 12 months ago

Hello,

I am planning to dig deeper into my nanopore shotgun data and figure out how I can improve my analysis.

Currently, with default parameters, the results are not bad, but the classification proportions are not accurate (I am judging this against a ground-truth metagenome sample).

I would like to find the optimal k-mer size, but I have the following concern.

If I increase the k-mer size from the current 35 to, say, 200, will the proportion of classified reads drop, or the results get worse?

I ask because Kraken2 looks for exact k-mer matches, and nanopore reads contain sequencing errors (I am seeing around Q15 on average). I expect that many k-mers containing at least one error, which could otherwise have been matches, would be ignored by Kraken2.

If that's the case, increasing the k-mer size for nanopore data will reduce the number of classifications because of the inherent error rate of nanopore reads. Has anyone here had experience with this, and how could it be worked around?

Is there a way to allow a few mismatches during k-mer matching, say one or two?

Thank you.

jenniferlu717 commented 11 months ago

If the sequencing error rate is higher (as with nanopore), you should decrease the k-mer size, not increase it.
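
A rough back-of-the-envelope sketch of why shorter k-mers help with noisy reads: if errors were independent with a uniform per-base probability p, a k-mer would be error-free with probability (1 - p)^k. The Python below converts a Phred score to p and evaluates that expression. The independence assumption, treating the reported average Q15 as a per-base error rate, and the function names are all simplifications made for illustration; they are not part of Kraken2.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score Q to a per-base error probability."""
    return 10 ** (-q / 10)


def prob_error_free_kmer(q: float, k: int) -> float:
    """Probability that a k-mer contains no errors, ASSUMING independent,
    uniformly distributed per-base errors (a simplification)."""
    p = phred_to_error_prob(q)
    return (1 - p) ** k


if __name__ == "__main__":
    # Q15 is the average quality mentioned in the question; the k values
    # are the default (35), the proposed 200, and a shorter alternative.
    for k in (25, 35, 200):
        print(f"k={k:>3}: P(error-free k-mer) = {prob_error_free_kmer(15, k):.4f}")
```

Under these assumptions, roughly a third of 35-mers at Q15 would be error-free, while almost no 200-mers would be, so exact matching would have very little to work with at k = 200.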

Kraken relies on exact matching to be fast, so it does not allow mismatches as an alignment program would.
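
To illustrate what exact matching implies, here is a toy sketch (not Kraken2's actual implementation, which uses minimizers and a compact hash table): a single substitution in a read breaks every k-mer that overlaps it, so up to k k-mers are lost per error. The random reference sequence, the read length, and the helper names are hypothetical choices made only for this example.

```python
import random


def kmer_set(seq: str, k: int) -> set:
    """All distinct k-mers of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def exact_match_fraction(reference: str, read: str, k: int) -> float:
    """Fraction of the read's k-mers found exactly in the reference."""
    ref_kmers = kmer_set(reference, k)
    read_kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
    hits = sum(1 for km in read_kmers if km in ref_kmers)
    return hits / len(read_kmers)


def substitute(seq: str, pos: int) -> str:
    """Return seq with the base at pos replaced by a different base."""
    new_base = "A" if seq[pos] != "A" else "C"
    return seq[:pos] + new_base + seq[pos + 1:]


# Hypothetical toy data: a fixed-seed random 100-base "reference" and a
# "read" that is identical except for one substitution in the middle.
rng = random.Random(0)
reference = "".join(rng.choice("ACGT") for _ in range(100))
read = substitute(reference, 50)

if __name__ == "__main__":
    for k in (15, 35):
        frac = exact_match_fraction(reference, read, k)
        print(f"k={k}: {frac:.2f} of read k-mers still match exactly")
```

With one error in the read, the shorter k leaves a larger share of k-mers intact, which is the intuition behind decreasing k for error-prone data.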

ChillarAnand commented 3 months ago

I initially thought of increasing the k-mer size because nanopore reads are long.

@jenniferlu717 You are right. The k-mer size should be decreased.