Ivarz / Conifer

Calculate confidence scores from Kraken2 output
BSD 2-Clause "Simplified" License

Report the taxid and/or name #2

Closed (Midnighter closed 4 years ago)

Midnighter commented 4 years ago

Hello again,

I've been using Conifer for a bit now and I find it very useful. Thank you for that. At the moment, Conifer in its simplest form reports:

kraken output    read1 confidence    read2 confidence    average

Since Conifer can obviously do this, as seen in the summary report, I would love to get the output as:

taxid    name (optional)    read1 confidence    read2 confidence    average

and simply have additional rows for the same taxid. Does this make sense? Would you consider adding this output option? Or maybe there is a different simple way to map the kraken output to the taxid that I am missing right now.

Ivarz commented 4 years ago

Hello, the Kraken2 output already contains taxids, so in bash you can do something like:

./conifer -i example.out.txt -d taxo.k2d \
    | cut -d$'\t' -f3,6,7,8 \
    | sort -k1,1n > output.txt

or, with some sed magic, if you have names in the Kraken output:

conifer -i example.out.txt -d taxo.k2d \
    | cut -d$'\t' -f3,6,7,8 \
    | sed -E 's/(^.*)\s\(taxid ([0-9]*)\)/\2\t\1/' \
    | sort -k1,1n > output.txt
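
For anyone who wants the name kept as its own column, here is a rough awk sketch of the same idea. It assumes column 3 looks like `Name (taxid N)`, as in the sed example, and that columns 6 to 8 hold the read1, read2 and average confidences. The function name `taxid_name_scores` is made up for illustration and is not part of Conifer. It avoids the `\s` and `\t` escapes in sed, which are GNU extensions.

```shell
# Hypothetical helper (not part of Conifer): reads tab-separated Conifer
# output on stdin and prints taxid, name, read1, read2 and average
# confidence, sorted numerically by taxid.
# Assumes column 3 has the form "Name (taxid N)".
taxid_name_scores() {
    awk -F'\t' -v OFS='\t' '{
        name = $3
        taxid = $3
        # Drop the trailing " (taxid N)" to keep only the name
        sub(/ \(taxid [0-9]+\)$/, "", name)
        # Drop everything up to "(taxid " and the closing parenthesis
        sub(/^.*\(taxid /, "", taxid)
        sub(/\)$/, "", taxid)
        print taxid, name, $6, $7, $8
    }' | sort -t "$(printf '\t')" -k1,1n
}
```

It could be used in place of the cut/sed/sort chain, for example `conifer -i example.out.txt -d taxo.k2d | taxid_name_scores > output.txt`.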

I hope this helps.

Midnighter commented 4 years ago

Okay, thanks for getting back to me. Yes, this information is contained in the output. I was feeling a bit lazy, and I thought that since your summary mode already cleans up the name/ID, you must already have helper functions for this and it would be a straightforward addition.

Ivarz commented 4 years ago

Yeah, it does. However, there are multiple fields that one might want to include or exclude. In my opinion, the best way would be to implement something like blastn's outfmt option, where the user can specify a list of fields to be included in the output. I might add it in the future, but for now the fastest way to get the output you need is to pipe through bash utilities.
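
Until something like that exists, the field-list idea can be approximated in the shell. The sketch below is only an illustration under assumptions: the function `select_fields` and the field names (`status`, `readid`, `taxid`, `length`, `lca`, `conf1`, `conf2`, `avg`) are invented here, and the column positions assume paired-end output where columns 6 to 8 are the read1, read2 and average confidences, as in the cut commands above.

```shell
# Hypothetical wrapper (not a Conifer feature): select columns by name.
# Column positions are an assumption: 1-5 are the Kraken2 columns,
# 6-8 the confidence columns for paired-end reads.
select_fields() {
    # $1 is a comma-separated list of field names, e.g. "taxid,avg"
    awk -F'\t' -v OFS='\t' -v want="$1" '
        BEGIN {
            col["status"] = 1; col["readid"] = 2; col["taxid"] = 3
            col["length"] = 4; col["lca"] = 5
            col["conf1"] = 6; col["conf2"] = 7; col["avg"] = 8
            n = split(want, names, ",")
            # Reject names that are not in the table above
            for (i = 1; i <= n; i++) {
                if (!(names[i] in col)) {
                    print "unknown field: " names[i] > "/dev/stderr"
                    exit 1
                }
            }
        }
        {
            # Emit the requested columns in the requested order
            line = $(col[names[1]])
            for (i = 2; i <= n; i++) line = line OFS $(col[names[i]])
            print line
        }'
}
```

For example, `conifer -i example.out.txt -d taxo.k2d | select_fields taxid,conf1,conf2,avg` would give the same columns as `cut -f3,6,7,8`, but the list can be reordered or shortened without remembering column numbers.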