DerrickWood / kraken2

The second version of the Kraken taxonomic sequence classification system
MIT License

Sub-species classification #223

Closed ml3958 closed 4 years ago

ml3958 commented 4 years ago

Hi Kraken2 developers, thanks for developing this amazing tool.

I am working with a bacterial species that has a high degree of intra-species genomic variation (SNPs spread across the core genome, and heterogeneous accessory genes). In this case, I think metagenomic samples sequenced with Illumina reads are sufficient to get down to the sub-species/strain level.

I was wondering whether Kraken2 supports that. More specifically:

  1. For the database, I plan to add my ~300 subspecies genomes and rebuild the Kraken2 database. Do I need to do anything to specify the taxonomy information of the ~300 subspecies, for example using a phylogenetic tree?
  2. For the classification, would it be possible to get subspecies-level estimates? (I see the default taxonomy ranks only go down to [S]pecies.)

Any advice would be highly appreciated!

jenniferlu717 commented 4 years ago

The default taxonomy rank can go below the species level. By default, Kraken2 uses the NCBI taxonomy, which contains below-species ranks; in the Kraken2 report these show up as "S1", "S2", etc.
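
To illustrate how the below-species rank codes can be picked out of a report, here is a minimal Python sketch. It assumes the standard six-column, tab-separated Kraken2 report (percentage, clade reads, direct reads, rank code, taxonomy ID, indented name); the function name `subspecies_rows` is hypothetical, and reports generated with extra columns would need the unpacking adjusted.

```python
def subspecies_rows(report_lines):
    """Return (rank, taxid, name, clade_reads) for rows ranked below species.

    Assumes the standard 6-column tab-separated Kraken2 report layout.
    """
    rows = []
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 6:
            continue  # skip blank lines or lines in an unexpected layout
        _pct, clade_reads, _direct_reads, rank, taxid, name = fields
        # Below-species ranks are written as "S" plus a depth: S1, S2, ...
        if rank.startswith("S") and rank[1:].isdigit():
            rows.append((rank, int(taxid), name.strip(), int(clade_reads)))
    return rows
```

Passing the lines of a report file (for example `open("sample.kreport")`, a hypothetical filename) returns only the S1, S2, ... rows, which is a quick way to check whether any reads were assigned below the species level.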

It will depend on whether the subspecies you add to the database are linked to a taxonomy ID in NCBI. If you have a mapping of the sequence IDs to the taxonomy IDs, appending those to any seqid2taxid.map file before finishing the rest of the build will give you what you want.
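
The appending step described above could be sketched in Python as follows. This assumes the map file holds one `sequence ID<TAB>taxonomy ID` pair per line; the function name `append_seqid_mappings` and the example IDs are hypothetical, and where the map file lives depends on how your database directory is laid out.

```python
from pathlib import Path


def append_seqid_mappings(map_path, seqid_to_taxid):
    """Append 'seqid<TAB>taxid' lines to a map file.

    Sequence IDs already present in the file are skipped.
    Returns the number of lines added.
    """
    map_path = Path(map_path)
    text = map_path.read_text() if map_path.exists() else ""
    # Collect the sequence IDs that are already mapped
    existing = {line.split("\t")[0] for line in text.splitlines() if line.strip()}

    added = 0
    with map_path.open("a") as handle:
        # Make sure new entries start on their own line
        if text and not text.endswith("\n"):
            handle.write("\n")
        for seqid, taxid in seqid_to_taxid.items():
            if seqid in existing:
                continue
            handle.write(f"{seqid}\t{int(taxid)}\n")
            added += 1
    return added
```

For example, `append_seqid_mappings("seqid2taxid.map", {"contig_001": 562})` would add one line mapping the hypothetical sequence `contig_001` to taxonomy ID 562. This should be run before the final build step so that the new sequences are assigned to the intended taxa.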

ml3958 commented 4 years ago

@jenniferlu717 thanks for the reply! Unfortunately, the subspecies I am adding to the database are not linked to unique NCBI taxonomy IDs.

Is there an easy way to add them to the taxonomy map as custom entries? I am adding ~300 genomes, and I already know their phylogenetic relationships. Or would KrakenUniq be a better tool for this purpose?