dahak-metagenomics / dahak

benchmarking and containerization of tools for analysis of complex non-clinical metagenomes.
https://dahak-metagenomics.github.io/dahak
BSD 3-Clause "New" or "Revised" License

Limiting kaiju to genus-level output #69

Closed kternus closed 6 years ago

kternus commented 6 years ago

Expected behavior

Kaiju can generate output that includes taxonomic classifications below the genus level.

Actual behavior and steps to reproduce the behavior

The current kaiju steps in the taxonomic classification workflow will report genus-level results, along with the number of reads assigned to each genus and krona plot visualizations.

Others may have better thoughts on how to do this, but here are examples of kaiju commands that we've run in the past to generate output below the genus level:

Run kaiju: kaiju -v -x -t /data/kaijudb/nodes.dmp -f /data/kaijudb/kaiju_db.fmi -i BMI_bmi_reads.fasta -o BMI_reads.out

Add taxon names: addTaxonNames -t /data/kaijudb/nodes.dmp -n names.dmp -u -p -i BMI_reads.out -o BMI_reads.names.out

Generate summary kaiju report: kaijuReport -t /data/kaijudb/nodes.dmp -n names.dmp -p -i BMI.out -r species -o BMI_reads.out.summary
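The three steps above could be chained in a small shell sketch where the report rank is a single variable. This is only an illustration: `RANK`, `DRY_RUN`, the `run` helper, and the `kaiju_pipeline` function are hypothetical names, and the sketch assumes the report is meant to read the kaiju output written in the first step (`BMI_reads.out`). By default it only prints the commands, so it can be checked without kaiju installed.

```shell
#!/bin/sh
# Sketch only: chains the three kaiju steps with a configurable report rank.
# RANK and DRY_RUN are hypothetical variables, not part of the dahak workflow.
RANK="${RANK:-species}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = print and execute
DB=/data/kaijudb
READS=BMI_bmi_reads.fasta
OUT=BMI_reads.out         # assumed to be the input for the later steps

# Print the command, and execute it unless DRY_RUN is 1.
run() {
    echo "$*"
    if [ "$DRY_RUN" != "1" ]; then
        "$@"
    fi
}

kaiju_pipeline() {
    # Classify reads. The classification itself does not depend on RANK.
    run kaiju -v -x -t "$DB/nodes.dmp" -f "$DB/kaiju_db.fmi" -i "$READS" -o "$OUT"
    # Append taxon names to each classified read.
    run addTaxonNames -t "$DB/nodes.dmp" -n names.dmp -u -p -i "$OUT" -o BMI_reads.names.out
    # Summarize the classifications at the requested rank.
    run kaijuReport -t "$DB/nodes.dmp" -n names.dmp -p -i "$OUT" -r "$RANK" -o "$OUT.summary"
}

kaiju_pipeline
```

Running it with `RANK=genus` changes only the last command, which matches the point that the rank affects the report and not the classification.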

brooksph commented 6 years ago

Hi @kternus! Thanks for pointing this out. In the example workflow, the "Generate summary kaiju report" step specifies genus with -r genus. This can be set to species, or maybe even strain, and it only affects the report, not the actual classification. I will point that out in the tutorial. Also, we will make sure that the CLI allows the user to specify the rank for the final report. I created an issue in the dahak-taco repo, where the Snakefile currently lives: https://github.com/dahak-metagenomics/dahak-taco/issues/1

kternus commented 6 years ago

Great, thanks @brooksph! You might also want to consider what information you'll need kaiju to output in the report for future automated scoring/benchmarking. I don't think NCBI taxonomy IDs will be reported without the addTaxonNames command, but the report will include the organism name when run with the -r species option.

charlesreid1 commented 6 years ago

I've updated the -r genus flag to be -r {taxonomic_rank} so the user can specify their own parameter for that, as per dahak-metagenomics/dahak-taco#1.
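Since the rank is now user-supplied, it may be worth checking it before it reaches kaijuReport. A minimal sketch follows; the `VALID_RANKS` list, `is_valid_rank`, and `rank_option` are hypothetical names, and the list of accepted ranks is an assumption that should be checked against the installed kaiju version.

```shell
#!/bin/sh
# Sketch only: validate a user-supplied rank before passing it to kaijuReport -r.
# The list of ranks below is an assumption, not taken from the dahak workflow.
VALID_RANKS="phylum class order family genus species"

# Succeeds (exit status 0) if $1 is one of VALID_RANKS, fails otherwise.
is_valid_rank() {
    for r in $VALID_RANKS; do
        if [ "$r" = "$1" ]; then
            return 0
        fi
    done
    return 1
}

# Print the -r option for a valid rank, or an error message on stderr.
rank_option() {
    if is_valid_rank "$1"; then
        echo "-r $1"
    else
        echo "unsupported rank: $1" >&2
        return 1
    fi
}
```

For example, `rank_option species` prints `-r species`, while a value outside the list returns a non-zero status so the workflow can stop early.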

We don't currently have a rule that calls addTaxonNames, so we can work that into the taxonomic_classification workflow rules currently in dahak-taco.

kternus commented 6 years ago

I think we can close this issue if @charlesreid1 is done updating -r {taxonomic_rank} and adding addTaxonNames?

Thanks!

charlesreid1 commented 6 years ago

This almost slipped through. The taxonomic classification Snakefile, particularly the kaiju section, has been updated to include an add_taxon_names rule with commit 4ff52bdb03e9d7013583976732a3b20ad219a150.