bioinformatics-centre / kaiju

Fast taxonomic classification of metagenomic sequencing reads using a protein reference database
http://kaiju.binf.ku.dk
GNU General Public License v3.0

functional analysis with M5nr #64

Closed rizkg closed 6 years ago

rizkg commented 6 years ago

Hi,

First, thank you for this great software!

This is not an issue, but a question regarding usage.

I was wondering whether it would be possible to use Kaiju for functional analysis. For example, if I build an index on the M5nr database (https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-13-141) and then use kaijux to report which sequence each read is assigned to, it seems that this would work well and be much faster than traditional alignment approaches. However, I could not find any reference to, or anyone using, Kaiju for functional analysis. Is there a reason for that, or is it just that no one has tried it yet (or maybe I didn't find the proper reference)?

pmenzel commented 6 years ago

Hi,

Yes, that is certainly doable and a good use case for kaijux. Note that, as in regular kaiju, you will get the list of names of all sequences that share the best possible match. Building a custom database is probably a bit too complicated for most people, which is likely why it has not been used this way.

Since v1.5, makeDB.sh also writes the GenBank IDs to the kaiju database, so with option -v the kaiju output now contains these IDs together with the taxon IDs. From that, you could also do a functional annotation using a lookup table that maps GenBank IDs to GO IDs or similar.
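
A minimal Python sketch of such a lookup could look like the following. It is only an illustration under several assumptions: the kaiju output is tab-separated, the IDs of the best-matching sequences sit in the sixth column of the -v output as a comma-separated list (adjust `acc_col` if your version differs), and the lookup table is a hypothetical two-column TSV of sequence ID and function ID that you would have to build yourself. The function names and the sample data are made up for the example.

```python
import csv
import io
from collections import Counter


def load_lookup(handle):
    """Read a two-column TSV (sequence ID, function ID) into a dict of sets.

    The file layout is an assumption for this sketch, not a kaiju format.
    """
    lookup = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        lookup.setdefault(row[0], set()).add(row[1])
    return lookup


def count_functions(kaiju_handle, lookup, acc_col=5):
    """Count classified reads per function ID.

    acc_col is the 0-based column holding the comma-separated sequence IDs
    (assumed here to be the sixth column of the verbose output).
    Each read is counted at most once per function, even if several of its
    best-matching sequences map to the same function.
    """
    counts = Counter()
    for line in kaiju_handle:
        fields = line.rstrip("\n").split("\t")
        # Only classified reads ("C") with the verbose columns are usable.
        if fields[0] != "C" or len(fields) <= acc_col:
            continue
        functions = set()
        for acc in fields[acc_col].split(","):
            if acc:  # the list may end with a trailing comma
                functions |= lookup.get(acc, set())
        counts.update(functions)
    return counts


# Made-up sample data, for illustration only.
lookup_tsv = io.StringIO(
    "ACC1\tGO:0000001\n"
    "ACC2\tGO:0000002\n"
    "ACC3\tGO:0000001\n"
)
kaiju_out = io.StringIO(
    "C\tread1\t562\t30\t562,\tACC1,ACC3,\tMKV\n"
    "U\tread2\t0\n"
    "C\tread3\t1280\t25\t1280,\tACC2,\tLLK\n"
    "C\tread4\t562\t28\t562,1280,\tACC1,ACC2,\tGGA\n"
)

function_counts = count_functions(kaiju_out, load_lookup(lookup_tsv))
print(dict(function_counts))
```

Because several database sequences can share the best match, a read may receive more than one function ID; whether to count all of them, as done here, or to keep only reads with a single unambiguous function is a design choice that depends on the analysis.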

rizkg commented 6 years ago

Great, thank you.