Closed TimSkvortsov closed 6 years ago
Hi @TimofeySkvortsov, I think it is a good idea. Can you provide some information on Kaiju's report format?
The output format is described in Kaiju's manual: https://github.com/bioinformatics-centre/kaiju#output-format
Kaiju can generate two report types, default and verbose. Both are somewhat similar to Kraken's output format.
Kaiju's default report has three columns separated by tabs:
C D00420:130:H2TWLBCXY:1:1101:8346:2236 1333523
C D00420:130:H2TWLBCXY:1:1101:11269:2189 756883
C D00420:130:H2TWLBCXY:1:1101:14118:2184 186196
C D00420:130:H2TWLBCXY:1:1101:14463:2150 86177
U D00420:130:H2TWLBCXY:1:1101:14318:2153 0
C D00420:130:H2TWLBCXY:1:1101:14663:2169 2157
C D00420:130:H2TWLBCXY:1:1101:16736:2123 131567
C D00420:130:H2TWLBCXY:1:1101:18392:2654 86177
U D00420:130:H2TWLBCXY:1:1101:20170:2563 0
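As a rough illustration of how the three-column report can be consumed, here is a minimal Python sketch that tallies classified and unclassified reads and counts reads per taxon ID. It assumes the fields are tab-separated, as stated above; the function name `summarize_default_output` is hypothetical and not part of Kaiju or Pavian.

```python
from collections import Counter


def summarize_default_output(lines):
    """Count reads per status (C/U) and classified reads per taxon ID."""
    status_counts = Counter()
    taxon_counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Only the first three columns are needed: status, read name, taxon ID.
        status, _read_name, taxon_id = line.split("\t")[:3]
        status_counts[status] += 1
        if status == "C":
            taxon_counts[int(taxon_id)] += 1
    return status_counts, taxon_counts


# Three lines taken from the sample above, with tabs restored (assumption).
sample = [
    "C\tD00420:130:H2TWLBCXY:1:1101:14463:2150\t86177",
    "U\tD00420:130:H2TWLBCXY:1:1101:14318:2153\t0",
    "C\tD00420:130:H2TWLBCXY:1:1101:18392:2654\t86177",
]
status_counts, taxon_counts = summarize_default_output(sample)
print(status_counts["C"], status_counts["U"], taxon_counts[86177])
```

Because only the first three columns are read, the same function should also accept verbose output.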
Kaiju's verbose report has seven columns separated by tabs; the first three are the same as in the default report:
C D00420:130:H2TWLBCXY:1:1101:1400:2243 1644061 11 44470,222984,253107,406552,1227497, AFO59089.1,ELY59679.1,WP_007107841.1,WP_007258503.1,WP_083909286.1, HSDDFSRRTYE,
U D00420:130:H2TWLBCXY:1:1101:8960:2181 0
C D00420:130:H2TWLBCXY:1:1101:11269:2189 756883 14 756883, WP_014051658.1, VRFGTESGVRADMQ,VRFGTESGVRADMQ,
C D00420:130:H2TWLBCXY:1:1101:14065:2135 2237 44 2237, WP_004958731.1,WP_008307830.1, MIELLYAISTLVFVVAGLTMVGMAMRAYVQTSRQAMLHLSVGFS,
U D00420:130:H2TWLBCXY:1:1101:14318:2153 0
U D00420:130:H2TWLBCXY:1:1101:14296:2174 0
C D00420:130:H2TWLBCXY:1:1101:17483:2230 1744 11 1744, WP_055345270.1, LRSGRTARRPR,LRSGRTARRPR,
U D00420:130:H2TWLBCXY:1:1101:18119:2213 0
U D00420:130:H2TWLBCXY:1:1101:19659:2188 0
C D00420:130:H2TWLBCXY:1:1101:20071:2103 1194090 31 1194090, WP_073062635.1, GIPPLAGFFSKDEILAFTFNAGFGEFAGSLY,GIPPLAGFFSKDEILAFTFNAGFGEFAGSLY,
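The columns are (as listed in Kaiju's manual):
1. either C or U, indicating whether the read is classified or unclassified
2. name of the read
3. NCBI taxon identifier of the assigned taxon
4. the length or score of the best match used for classification
5. the taxon identifiers of all database sequences with the best match
6. the accession numbers of all database sequences with the best match
7. matching fragment sequence(s)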
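A verbose line could be parsed along the following lines. This is only a sketch in Python under a few assumptions: fields are tab-separated, unclassified reads carry only the first three columns (as in the sample above), the fourth column is an integer, and the comma-separated lists end with a trailing comma. The names `KaijuRead` and `parse_verbose_line` are hypothetical and not part of Kaiju or Pavian.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KaijuRead:
    classified: bool
    read_name: str
    taxon_id: int
    score: int = 0
    match_taxon_ids: List[int] = field(default_factory=list)
    accessions: List[str] = field(default_factory=list)
    fragments: List[str] = field(default_factory=list)


def _split_list(value):
    # Lists end with a trailing comma, so drop the empty last item.
    return [item for item in value.split(",") if item]


def parse_verbose_line(line):
    cols = line.rstrip("\n").split("\t")
    read = KaijuRead(cols[0] == "C", cols[1], int(cols[2]))
    if len(cols) >= 7:  # classified reads carry the four extra columns
        read.score = int(cols[3])
        read.match_taxon_ids = [int(t) for t in _split_list(cols[4])]
        read.accessions = _split_list(cols[5])
        read.fragments = _split_list(cols[6])
    return read


# Two lines from the sample above, with tabs restored (assumption).
classified = parse_verbose_line(
    "C\tD00420:130:H2TWLBCXY:1:1101:17483:2230\t1744\t11\t1744,"
    "\tWP_055345270.1,\tLRSGRTARRPR,LRSGRTARRPR,"
)
unclassified = parse_verbose_line("U\tD00420:130:H2TWLBCXY:1:1101:18119:2213\t0")
print(classified.score, classified.accessions, unclassified.classified)
```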
Hope it helps.
What does the Kaiju classification summary generated by kaijuReport look like? Currently Pavian needs the taxonomy information to be in the result file.
I am especially interested in the output with the -p option, which prints the full taxon path instead of just the taxon name.
It looks something like this:
% reads phylum
-------------------------------------------
49.669548 2600324 cellular organisms; Archaea; Euryarchaeota;
19.288523 1009802 cellular organisms; Bacteria; Proteobacteria;
5.381846 281753 cellular organisms; Bacteria; Terrabacteria group; Actinobacteria;
1.475155 77228 cellular organisms; Bacteria; FCB group; Bacteroidetes/Chlorobi group; Bacteroidetes;
1.220038 63872 cellular organisms; Archaea; DPANN group; Candidatus Nanohaloarchaeota;
1.154883 60461 cellular organisms; Bacteria; Balneolaeota;
1.129727 59144 cellular organisms; Eukaryota; Opisthokonta; Fungi; Dikarya; Ascomycota;
1.114159 58329 cellular organisms; Bacteria; Terrabacteria group; Firmicutes;
0.705105 36914 cellular organisms; Eukaryota; Opisthokonta; Fungi; Dikarya; Basidiomycota;
0.264190 13831 cellular organisms; Eukaryota; Alveolata; Apicomplexa;
0.217678 11396 cellular organisms; Bacteria; Terrabacteria group; Cyanobacteria/Melainabacteria group; Cyanobacteria;
0.160814 8419 cellular organisms; Eukaryota; Viridiplantae; Chlorophyta;
0.126947 6646 cellular organisms; Bacteria; Terrabacteria group; Chloroflexi;
0.124980 6543 cellular organisms; Bacteria; PVC group; Planctomycetes;
0.112144 5871 cellular organisms; Bacteria; Acidobacteria;
######### here I removed several rows #########
0.000019 1 cellular organisms; Eukaryota; Stramenopiles; PX clade; Xanthophyceae;
0.000019 1 cellular organisms; Bacteria; unclassified Bacteria; Bacteria candidate phyla; Patescibacteria group; Parcubacteria group; Candidatus Jacksonbacteria;
-------------------------------------------
0.551168 28855 Viruses
16.250864 850773 cannot be assigned to a phylum
-------------------------------------------
31.413546 2397816 unclassified
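To show how the taxon paths in such a summary could be recovered programmatically, here is a minimal Python sketch. It assumes each data row consists of a percentage, a read count, and a label, separated by whitespace, and that path elements are separated by semicolons as in the sample above. Header, separator, and comment rows are skipped. The function name `parse_kaiju_summary` is hypothetical.

```python
def parse_kaiju_summary(lines):
    """Return (percent, reads, path) tuples for the data rows of a summary."""
    rows = []
    for line in lines:
        # Split into at most three parts so spaces inside the label survive.
        parts = line.strip().split(None, 2)
        if len(parts) != 3:
            continue  # separator lines and blanks
        try:
            percent, reads = float(parts[0]), int(parts[1])
        except ValueError:
            continue  # header or comment rows
        path = [name.strip() for name in parts[2].split(";") if name.strip()]
        rows.append((percent, reads, path))
    return rows


sample = [
    "% reads phylum",
    "-------------------------------------------",
    "49.669548 2600324 cellular organisms; Archaea; Euryarchaeota;",
    "######### here I removed several rows #########",
    "0.551168 28855 Viruses",
    "16.250864 850773 cannot be assigned to a phylum",
]
rows = parse_kaiju_summary(sample)
for percent, reads, path in rows:
    print(reads, " > ".join(path))
```

Rows such as "Viruses" or "cannot be assigned to a phylum" come back as single-element paths, so a caller would still need to decide how to place them in a taxonomy tree.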
Thanks for the info. I'll add it in the next version of Pavian.
Great, thank you very much!
Hi @TimofeySkvortsov, sorry for the late response. So far I have not succeeded in importing the report file itself, but I think the output file can be easily converted into a Kraken-style report. Can you try running kraken-report on the Kaiju output file, with the --db argument pointing to the parent directory of the NCBI taxonomy dump?
I tried this and it works on the raw output from Kaiju.
Thanks for testing, @devindrown! I'll add a section to the README.
Just to clarify: when you say "with the --db argument pointing to the parent directory of the NCBI taxonomy dump", do you mean the Kraken NCBI taxonomy dump?
Because when I point it at the Kaiju database directory, I get this error:
kraken-report: database ("kaijudb/") does not contain necessary file database.kdb
I have the same problem.
Excuse me, have you solved this problem? I ran into the same problem.
I pointed it to a database that I used with KrakenUniq, and it works.
I used:
kraken-report --db path/to/db path/to/kaiju.out > path/to/kaiju.tsv
Output of ls on the KrakenUniq database directory:
database.kdb
database.idx
taxonomy/nodes.dmp
taxonomy/names.dmp
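Before running kraken-report, it may help to check that the directory passed to --db has the expected layout. The Python sketch below tests for the four files from the listing above. Only database.kdb is confirmed as required by the error message in this thread; treating the other three as required is an assumption, and `missing_db_files` is a hypothetical helper name.

```python
from pathlib import Path

# File list taken from the directory listing in this thread. Only
# database.kdb is confirmed as required by the kraken-report error above;
# the remaining entries are an assumption.
REQUIRED_FILES = (
    "database.kdb",
    "database.idx",
    "taxonomy/nodes.dmp",
    "taxonomy/names.dmp",
)


def missing_db_files(db_dir):
    """Return the expected files that are absent from db_dir."""
    root = Path(db_dir)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    import sys

    db_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_db_files(db_dir)
    if missing:
        print("Missing in", db_dir + ":", ", ".join(missing))
    else:
        print(db_dir, "has the expected layout")
```

Pointing this at a Kaiju database directory such as kaijudb/ would be expected to report database.kdb as missing, which matches the error quoted earlier.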
Hi,
I was wondering if it would be possible to add support for reports produced by Kaiju. Pavian is an amazing piece of visualisation software, thank you very much for coding it.