KrakenUniq reports into phyloseq

fbreitwieser / krakenuniq

🐙 KrakenUniq: Metagenomics classifier with unique k-mer counting for more specific results

GNU General Public License v3.0


KrakenUniq reports into phyloseq #101

Open DanaiP opened 2 years ago

DanaiP commented 2 years ago

Hello, I am new to KrakenUniq and I’d like to ask a question. I have obtained the KrakenUniq reports and want to import them into phyloseq. Do the authors suggest a best practice? Working with Kraken2 is easier, since one can create a .biom file with kraken-biom (https://github.com/smdabdoub/kraken-biom) that can subsequently be opened with phyloseq. I am aware of Pavian, but its exported TSV tables still need to be parsed before they can be imported into phyloseq. Thank you in advance. Danai

salzberg commented 2 years ago

The KrakenUniq format is identical to Kraken1, except that we add one extra column to include the number of unique k-mers found (in the input reads) for each genome, species, genus, etc. We didn't develop those file-conversion tools, but it should be very easy to adapt them to KrakenUniq, simply by handling the extra column. You mentioned Kraken2, and it has an option to include something very similar to the 'unique k-mers' of KrakenUniq, but it uses 'unique minimizers' instead (not quite the same thing). I'm not sure but I think those appear in the same column as the unique k-mers from the KrakenUniq report.
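For anyone adapting a converter, here is a minimal Python sketch of reading a KrakenUniq report, extra columns included. It assumes the report is tab-separated, starts with `#` comment lines, and has a header row naming the columns `%`, `reads`, `taxReads`, `kmers`, `dup`, `cov`, `taxID`, `rank`, `taxName`. Please check that against your own files, since the layout may differ between versions. The function name `read_krakenuniq_report` and the toy report are made up for illustration.

```python
import csv
import io

# Toy report with made-up numbers; the column names are an assumption
# about the KrakenUniq header row, so verify them against a real report.
TOY_REPORT = "\n".join([
    "# toy report, not real data",
    "\t".join(["%", "reads", "taxReads", "kmers", "dup", "cov", "taxID", "rank", "taxName"]),
    "\t".join(["100", "100", "0", "900", "1.1", "0.1", "1", "no rank", "root"]),
    "\t".join(["100", "100", "5", "900", "1.1", "0.1", "2", "superkingdom", "  Bacteria"]),
    "\t".join(["95", "95", "95", "800", "1.2", "0.2", "562", "species", "    Escherichia coli"]),
]) + "\n"


def read_krakenuniq_report(handle):
    """Yield one dict per taxon row, keyed by the report's own header names."""
    # Drop '#' comment lines and blank lines; the first line left is the header.
    lines = (ln for ln in handle if ln.strip() and not ln.startswith("#"))
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        name = row["taxName"]
        # Leading spaces in taxName encode the depth in the taxonomy tree.
        row["depth"] = len(name) - len(name.lstrip(" "))
        row["taxName"] = name.strip()
        for key in ("reads", "taxReads", "kmers"):
            row[key] = int(row[key])
        yield row


rows = list(read_krakenuniq_report(io.StringIO(TOY_REPORT)))
```

Because the rows are keyed by header name instead of column position, a downstream converter can simply ignore `kmers`, `dup`, and `cov`, or use `kmers` as a filter before exporting.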

DanaiP commented 2 years ago

Thank you very much for the answer. Danai

emilyvansyoc commented 1 year ago

Hello - I'm running into a similar problem. Unfortunately, removing the extra column from the KrakenUniq report is not sufficient to make kraken-biom work on KrakenUniq reports, and my guess is that this is due to the different way taxonomic ranks are encoded. In Kraken2, the taxonomic ranks are given as "R", "P", "C", etc., which results in a BIOM table with taxonomy similar to Greengenes. The KrakenUniq report gives ranks as full lowercase names ("clade", "kingdom", etc.) and includes many more taxonomic levels than P, O, C, F, G, S.

Do you have any ideas to either convert the KrakenUniq output to a BIOM table, or create a text table collapsed at the lowest classified taxonomic level? Either of these would increase the flexibility to do downstream microbiome analyses with R and other tools beyond Pavian and Krona.

Thanks!
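One possible way to get a text table collapsed at the lowest classified level is sketched below in Python. It is not an official KrakenUniq tool, just an illustration under assumptions: the report is tab-separated with a header row containing `taxReads`, `taxID`, `rank`, and `taxName`; `taxReads` holds the reads assigned directly to that taxon; and the indentation of `taxName` reflects the tree depth. The `RANK_CODES` mapping, the function names, and the toy report are hypothetical, and the one-letter codes only loosely mirror the Kraken2-style codes mentioned above.

```python
import csv
import io

# Hypothetical mapping from full rank names to one-letter codes;
# ranks not listed here (e.g. "clade", "no rank") are left out of the lineage.
RANK_CODES = {"superkingdom": "D", "kingdom": "K", "phylum": "P", "class": "C",
              "order": "O", "family": "F", "genus": "G", "species": "S"}

# Toy report with made-up numbers, for illustration only.
TOY_REPORT = "\n".join([
    "# toy report, not real data",
    "\t".join(["%", "reads", "taxReads", "kmers", "dup", "cov", "taxID", "rank", "taxName"]),
    "\t".join(["100", "100", "0", "900", "1.1", "0.1", "1", "no rank", "root"]),
    "\t".join(["100", "100", "5", "900", "1.1", "0.1", "2", "superkingdom", "  Bacteria"]),
    "\t".join(["95", "95", "10", "850", "1.1", "0.1", "561", "genus", "    Escherichia"]),
    "\t".join(["85", "85", "85", "800", "1.2", "0.2", "562", "species", "      Escherichia coli"]),
]) + "\n"


def collapse_report(handle):
    """Return {taxID: (lineage, direct_reads)} for taxa with directly assigned reads."""
    lines = (ln for ln in handle if ln.strip() and not ln.startswith("#"))
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    stack = []   # (depth, rank, name) for the current path from the root
    table = {}
    for row in reader:
        raw = row["taxName"]
        depth = len(raw) - len(raw.lstrip(" "))
        # Leave any branch that is at the same depth or deeper than this row.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        stack.append((depth, row["rank"], raw.strip()))
        direct = int(row["taxReads"])
        if direct > 0:
            lineage = "; ".join(f"{RANK_CODES[r]}__{n}"
                                for _, r, n in stack if r in RANK_CODES)
            table[row["taxID"]] = (lineage, direct)
    return table


def merge_samples(per_sample):
    """Combine per-sample tables into a header plus rows of taxon-by-sample counts."""
    taxa = {}
    for sample, table in per_sample.items():
        for taxid, (lineage, count) in table.items():
            entry = taxa.setdefault(taxid, {"lineage": lineage, "counts": {}})
            entry["counts"][sample] = count
    samples = list(per_sample)
    header = ["taxID", "lineage"] + samples
    rows = [[taxid, info["lineage"]] + [info["counts"].get(s, 0) for s in samples]
            for taxid, info in sorted(taxa.items())]
    return header, rows


header, rows = merge_samples({"sampleA": collapse_report(io.StringIO(TOY_REPORT))})
```

Writing `header` and `rows` out as a TSV should give a taxon-by-sample count matrix plus a lineage column. In R, the count columns could then be passed to phyloseq's `otu_table()` and the split lineage to `tax_table()`, although how the lineage string is best split into rank columns depends on your data.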