translated search from unclassified reads only

DerrickWood / kraken2

The second version of the Kraken taxonomic sequence classification system

MIT License


translated search from unclassified reads only #559

Open jorondo1 opened 2 years ago

jorondo1 commented 2 years ago

Hi, does a translated search run instead of the nucleotide (nt) search, or does it complement it? In other words, does Kraken first classify reads by nucleotide matching and then run a translated search on the unclassified reads, or does it only run a translated search on all reads?

If the two approaches are not complementary, could the --unclassified-out reads from a nucleotide search be fed into a translated search, and the newly classified results "added" to the nucleotide results to create a report of all taxa identified by either the nt or the translated search? The HUMAnN pipeline uses that approach (Bowtie2 first, then DIAMOND on unclassified reads) and I'm hoping to achieve something similar with Kraken.

I'm quite new to this so maybe I'm missing pieces, but I hope someone can help.

Cheers,

fconstancias commented 2 years ago

Hi @jorondo1, it seems you are looking for struo2

jorondo1 commented 2 years ago

Hi @fconstancias,

I'm assuming the answer to my first question is no. However, I'm not sure how struo2 addresses what I'm trying to do?

I want to know whether it's possible to use Kraken first for a nucleotide search, then feed the unclassified reads into a translated search, and then use both results for the taxonomic classification (and feed them into Bracken).

This is basically what HUMAnN does, but we have reasons to believe that, given the nature of our samples, Kraken will perform better at taxonomy assignment.

Thanks in advance

jorondo1 commented 2 years ago

I found out that KrakenTools can combine kreports. I think we could simply run Kraken once with the nt db, then once with the protein db on all unclassified reads, then combine the two reports.
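To illustrate what "combining" two reports amounts to for this two-pass setup, here is a minimal Python sketch. It is not the KrakenTools implementation. It assumes the standard six-column, tab-separated Kraken2 report (percentage, clade reads, direct reads, rank code, taxid, name), and the function names `direct_reads` and `combine_two_pass` are hypothetical. One deliberate deviation from plain summing, labeled as an assumption: the reads unclassified in the first pass are exactly the input to the second pass, so the sketch keeps only the second pass's unclassified count (taxid 0) to avoid counting those reads twice.

```python
def direct_reads(report_text):
    """Map taxid -> reads assigned directly to that taxon (report column 3)."""
    counts = {}
    for line in report_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        taxid = int(fields[4])
        counts[taxid] = counts.get(taxid, 0) + int(fields[2])
    return counts


def combine_two_pass(nt_report, prot_report):
    """Sum direct read counts per taxid across the nt and protein passes."""
    combined = direct_reads(nt_report)
    # Assumption: pass 2 was run only on pass 1's unclassified reads, so
    # pass 1's unclassified count (taxid 0) is dropped rather than summed.
    combined.pop(0, None)
    for taxid, n in direct_reads(prot_report).items():
        combined[taxid] = combined.get(taxid, 0) + n
    return combined
```

Clade-level counts and percentages would still need to be recomputed from the taxonomy, which a dedicated tool handles; this sketch only shows the per-taxon bookkeeping.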

Rohit-Satyam commented 2 years ago

Okay. So when @fconstancias said you can use Struo2, he meant using the Struo2 custom databases, which are also available for Kraken2 and Bracken here. I am testing whether this reduces the number of unclassified reads in my dataset, since 74.77% of my reads are currently unclassified with the PlusPF Kraken2 RefSeq indexes.

ChillarAnand commented 3 months ago

@Rohit-Satyam Were you able to test it out?

hermidalc commented 3 months ago

This works with Kraken2; you just have to set it up yourself. You build two separate DBs from the same library types (e.g. bacteria, viral, etc.), one for nucleotide and one for protein. Then you do a first-pass classification of the host-filtered reads against the nucleotide database and save the unclassified reads, do a second-pass classification of those unclassified reads against the protein database, and finally combine (by summing) the two reports using KrakenTools combine_kreports.py. I have this working as part of a larger workflow.
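The steps described above could be laid out roughly as follows. This is a sketch under stated assumptions, not the commenter's actual workflow: it assumes single-end, host-filtered reads in one FASTQ file, that both databases are already built (the protein one with `kraken2-build --protein`), and that `kraken2` and `combine_kreports.py` are on the PATH. The helper name `two_pass_commands` and all file names are hypothetical. The function only builds the command lines, so they can be inspected before anything is run.

```python
def two_pass_commands(nt_db, prot_db, reads, prefix, threads=8):
    """Build argv lists for: nt pass, translated pass, report combination."""
    unclassified = f"{prefix}.unclassified.fq"  # hypothetical file name
    nt_report = f"{prefix}.nt.kreport"
    prot_report = f"{prefix}.prot.kreport"

    # Pass 1: nucleotide search; unclassified reads are written out.
    pass1 = ["kraken2", "--db", nt_db, "--threads", str(threads),
             "--unclassified-out", unclassified,
             "--report", nt_report,
             "--output", f"{prefix}.nt.kraken",
             reads]

    # Pass 2: the same kraken2 command pointed at the protein database,
    # run only on the reads left unclassified by pass 1.
    pass2 = ["kraken2", "--db", prot_db, "--threads", str(threads),
             "--report", prot_report,
             "--output", f"{prefix}.prot.kraken",
             unclassified]

    # Combine the two reports with KrakenTools.
    combine = ["combine_kreports.py", "-r", nt_report, prot_report,
               "-o", f"{prefix}.combined.kreport"]
    return [pass1, pass2, combine]
```

Each list can be passed to `subprocess.run(cmd, check=True)` in order, or printed with `shlex.join(cmd)` to review the commands first. Paired-end input would need `--paired` and a `#` placeholder in the `--unclassified-out` file name, which this sketch leaves out.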