bxlab / metaWRAP

MetaWRAP - a flexible pipeline for genome-resolved metagenomic data analysis
MIT License

krona/kraken summaries? #120

Open raw937 opened 5 years ago

raw937 commented 5 years ago

Hello,

Can someone explain this format, or does anyone have a tool to summarize the outputs? I'm looking for something that can take the file and collapse it by rank, for example:

121001 root Bacteria
100010 root Proteobacteria

I assume the number at the start is a count? Here is the output of more UF_megahit_5k.krona:

1	root	cellular organisms	Bacteria	Terrabacteria group	Actinobacteria	Actinobacteria	Corynebacteriales	Corynebacteriaceae	Corynebacterium	Corynebacterium frankenforstense	Corynebacterium frankenforstense DSM 45800
4	root	cellular organisms	Bacteria	Proteobacteria	Alphaproteobacteria	Sphingomonadales	Sphingomonadaceae	Sphingomonas	Sphingomonas sp. LK11
4	root	cellular organisms	Bacteria	Proteobacteria	Gammaproteobacteria	Pseudomonadales	Pseudomonadaceae	Pseudomonas	Pseudomonas aeruginosa group	Pseudomonas oleovorans/pseudoalcaligenes group	Pseudomonas furukawaii

ursky commented 5 years ago

If you ran the module on reads, then the number at the start represents the total number of reads with that taxonomy. I am not sure what your application is, so I cannot advise on what to do with it. This format is designed for KronaTools (I personally use it a lot), but it can be converted into whatever format you need.
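The leading number can be rolled up to a coarser level with a short script. The sketch below is not part of metaWRAP. It assumes each line is tab-separated (the number first, then the lineage from root downward, as in the excerpt above), and the function names summarize_krona and format_totals are hypothetical. It groups by lineage depth, not by true taxonomic rank.

```python
from collections import Counter


def summarize_krona(lines, depth):
    """Collapse Krona-style text lines to their first `depth` lineage fields.

    Assumes each line is tab-separated: a leading number (read count or
    contig weight) followed by the lineage from root downward.
    """
    totals = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        value = float(fields[0])
        lineage = [f.strip() for f in fields[1:] if f.strip()]  # drop empty columns
        # Slicing keeps the whole lineage when it is shallower than `depth`.
        totals[tuple(lineage[:depth])] += value
    return totals


def format_totals(totals):
    """Render totals as 'number<TAB>lineage' lines, largest first."""
    out = []
    for lineage, value in totals.most_common():
        number = int(value) if value.is_integer() else value
        out.append("\t".join([str(number), *lineage]))
    return out


if __name__ == "__main__":
    # Shortened versions of the lines shown in the question.
    sample = [
        "1\troot\tcellular organisms\tBacteria\tTerrabacteria group",
        "4\troot\tcellular organisms\tBacteria\tProteobacteria\tAlphaproteobacteria",
        "4\troot\tcellular organisms\tBacteria\tProteobacteria\tGammaproteobacteria",
    ]
    for row in format_totals(summarize_krona(sample, depth=4)):
        print(row)
```

To run it on the real file, pass an open file handle, e.g. summarize_krona(open("UF_megahit_5k.krona"), depth=4). Summarizing by true rank (phylum, genus, ...) would need rank information from the NCBI taxonomy, which these lines do not carry: at depth 4 the sample already mixes the unranked "Terrabacteria group" with Proteobacteria.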

raw937 commented 5 years ago

It's contigs, not reads. I took a look at KronaTools. Do you have any thoughts on what I am asking?

ursky commented 5 years ago

For assemblies, the number becomes the sum of the weights of the contigs that have that taxonomy. Weight in this case is the contig's length multiplied by its read coverage.
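Put as arithmetic, the number on each line of a contig-based file is the sum of length × coverage over all contigs assigned that lineage. The sketch below illustrates only that arithmetic and is not metaWRAP's code; the function name weighted_taxonomy_totals, the (lineage, length, coverage) input layout, the shortened lineages and the contig values are all hypothetical.

```python
from collections import defaultdict


def weighted_taxonomy_totals(contigs):
    """Sum contig weights per lineage, with weight = length * read coverage.

    `contigs` is an iterable of (lineage, length_bp, mean_coverage) tuples,
    where lineage is a tuple of taxon names from root downward.
    """
    totals = defaultdict(float)
    for lineage, length_bp, coverage in contigs:
        totals[lineage] += length_bp * coverage  # this contig's weight
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical, shortened lineages; lengths and coverages are made up.
    pseudo = ("root", "Bacteria", "Proteobacteria", "Pseudomonas furukawaii")
    sphingo = ("root", "Bacteria", "Proteobacteria", "Sphingomonas sp. LK11")
    contigs = [
        (pseudo, 5000, 10.0),
        (pseudo, 8000, 2.5),
        (sphingo, 6000, 3.0),
    ]
    for lineage, weight in weighted_taxonomy_totals(contigs).items():
        print(weight, *lineage, sep="\t")
```

Here the two Pseudomonas contigs contribute 5000 × 10 = 50,000 and 8000 × 2.5 = 20,000, so that lineage totals 70,000. This is why the numbers in a contig-based .krona file can be far larger than the number of contigs.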

And I am not sure what you are asking.