borenstein-lab / burrito

A visualization tool for exploratory data analysis of metagenomic data
https://elbo-spice.gs.washington.edu/shiny/burrito/
GNU General Public License v3.0

OTU not found on PICRUSt normalization table #5

Closed najouamghazli closed 4 years ago

najouamghazli commented 4 years ago

Hi,

I am trying to use BURRITO on my PICRUSt2 output. I used SILVA for the affiliation step and FROGS for the whole metabarcoding analysis. When I tried to upload the files to BURRITO I got the following error message:

[image: screenshot of the error message]

Here are the files that I fed to BURRITO for the analysis:

Tax_OTUs.txt:

OTU 35 36 37 38
Cluster_1 2662 1758 29988 39457
Cluster_2 54163 59711 32443 41438
Cluster_3 22086 16124 23355 20319

Tax_Hierarchy.txt (because I used SILVA for the affiliation):

OTU Kingdom Phylum Class Order Family Genus Species
Cluster_1 Bacteria Gemmatimonadetes Longimicrobia Longimicrobiales Longimicrobiaceae unknown genus unknown species
Cluster_2 Bacteria Proteobacteria Gammaproteobacteria Betaproteobacteriales Burkholderiaceae Massilia Multi-affiliation
Cluster_3 Bacteria Proteobacteria Alphaproteobacteria Sphingomonadales Sphingomonadaceae Multi-affiliation Multi-affiliation

metagenome_KO.tsv (this file was generated by PICRUSt2):

K00 35 36 37 38
K00001 239437.5 240616.79999999993 223849.77000000002 197839.91
K00003 459212.4200000001 435440.65 421968.51999999996 390150.91999999987
K00004 15820.439999999999 14787.97 17458.36 15289.239999999998

I'll appreciate your help.

Best,

Najoua

engal commented 4 years ago

Hi Najoua,

Glad you're interested in using BURRITO!

BURRITO's automatic attribution calculation is based on the original PICRUSt, which only included a genomic content table for Greengenes OTU IDs. Since you are using SILVA, we do not have pre-computed genomic content tables for the taxonomic IDs in your tables.
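As a quick pre-check before uploading, a sketch like the following could flag OTU IDs that cannot match the Greengenes-based table. It assumes that Greengenes OTU IDs are purely numeric strings; the function name is hypothetical and not part of BURRITO, and the numeric ID in the example is an arbitrary placeholder.

```python
def non_numeric_otu_ids(otu_ids):
    """Return the IDs that are not purely numeric (so not Greengenes-style)."""
    return [otu_id for otu_id in otu_ids if not otu_id.isdigit()]


# IDs from the OTU table in this thread, plus one arbitrary numeric
# placeholder for contrast.
ids = ["Cluster_1", "Cluster_2", "Cluster_3", "4479944"]
print(non_numeric_otu_ids(ids))
```

With the IDs from this thread, all three "Cluster_" entries are reported, which is consistent with the "OTU not found" error.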

However, since you already seem to be running PICRUSt2, you could instead use the "--strat_out" flag to PICRUSt2's "metagenome_pipeline.py" script, which should produce a table you can use for BURRITO's pre-calculated function attribution table upload option (third option in the "Taxonomy-function linking method" section of the upload page).

Quick note: Since pre-calculated function attribution tables can be very large, I would recommend trying to run BURRITO with a subset of the table (say ~1,000-10,000 rows of the original) just to make sure that there are no potential formatting issues with your input data before trying to upload the full function attribution table.
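One way to produce such a subset is sketched below. It assumes the function attribution table is an uncompressed text file with a single header line; the function name and the file names in the usage comment are hypothetical.

```python
import itertools


def write_table_subset(src_path, dst_path, n_rows=1000):
    """Copy the header line plus the first n_rows data lines to dst_path.

    Returns the number of data rows actually written.
    """
    written = 0
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        # n_rows + 1 accounts for the header line.
        for line in itertools.islice(src, n_rows + 1):
            dst.write(line)
            written += 1
    return max(written - 1, 0)


# Example call (hypothetical file names):
# write_table_subset("contributions.tsv", "contributions_subset.tsv", n_rows=1000)
```

Reading line by line keeps memory use low even when the full table is very large.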

Hope that helps!

najouamghazli commented 4 years ago

Hi,

Thank you for your reply.

I succeeded in running BURRITO on my data (just the first 1,000 rows, as suggested; I'm now trying to run it on the whole table).

PS: When I ran the "picrust2_pipeline.py" command, I used the "--stratified" flag.

Now I would like to know whether there is any way to remove pathways that we're not interested in through the BURRITO interface. For example, I want to remove all of the "Human Diseases" pathways.

Thanks a lot for your help.

Regards,

Najoua

engal commented 4 years ago

Hi Najoua,

Glad to hear you were able to run BURRITO with your data!

Regarding your question, BURRITO does not currently allow you to remove functions from the visualization because this would cause the function abundance bar plot to show relative abundances for a subset of the functional profile. Such a visualization could be misleading without proper indication of how much of the functional profile is being omitted.
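To illustrate the concern with made-up numbers (the function names and counts below are purely hypothetical), dropping one function and renormalizing inflates the relative abundance of everything that remains:

```python
def relative_abundances(abundances):
    """Scale a {name: abundance} mapping so the values sum to 1."""
    total = sum(abundances.values())
    return {name: value / total for name, value in abundances.items()}


# Hypothetical functional profile for a single sample.
profile = {"Function A": 50, "Function B": 30, "Function C": 20}

full = relative_abundances(profile)
subset = relative_abundances(
    {name: value for name, value in profile.items() if name != "Function C"}
)

print(full["Function A"], subset["Function A"])  # 0.5 0.625
```

Here "Function A" appears to rise from 50% to 62.5% of the profile even though its abundance is unchanged, and nothing in the subset plot shows that 20% of the profile was omitted.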

However, if you don't want to use the default hierarchy from KEGG, which does link certain KOs to non-prokaryotic pathways, you could try using a custom function hierarchy table that omits unwanted pathways. Note that this will cause KO abundances to be fractionally assigned to the remaining pathways they belong to. This means that the function abundance bar plot will correctly reflect all of the KO abundances in your initial data.
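A minimal sketch of building such a table is shown below. The column names, pathway labels, and KO-to-pathway links are made up for illustration and are not BURRITO's required format, and the even split across remaining pathways is an assumption about what "fractionally assigned" means here.

```python
import csv
import io
from collections import defaultdict

# Hypothetical hierarchy table; column names, pathway labels, and links are
# placeholders, not BURRITO's required format or real KEGG annotations.
HIERARCHY_TSV = (
    "KO\tCategory\tPathway\n"
    "K00001\tMetabolism\tPathway A\n"
    "K00001\tHuman Diseases\tPathway B\n"
    "K00003\tMetabolism\tPathway A\n"
    "K00003\tMetabolism\tPathway C\n"
)


def drop_category(rows, unwanted, column="Category"):
    """Keep only the hierarchy rows whose `column` value is not `unwanted`."""
    return [row for row in rows if row[column] != unwanted]


def even_weights(rows, ko_column="KO", pathway_column="Pathway"):
    """Split each KO evenly across the pathways it is still linked to."""
    pathways = defaultdict(list)
    for row in rows:
        pathways[row[ko_column]].append(row[pathway_column])
    return {
        ko: {name: 1 / len(names) for name in names}
        for ko, names in pathways.items()
    }


rows = list(csv.DictReader(io.StringIO(HIERARCHY_TSV), delimiter="\t"))
kept = drop_category(rows, "Human Diseases")
print(even_weights(kept))
```

In this toy table, K00001 ends up assigned entirely to "Pathway A" once the "Human Diseases" row is removed, while K00003 is still split in half between its two pathways. A KO linked only to removed pathways would no longer appear in the filtered table at all, so it is worth checking for that case before uploading.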

Based on your question, I'm also going to look into creating another available function hierarchy option that would use the BRITE function hierarchy with non-prokaryotic pathways removed. However, I can't guarantee when such an option might be ready for you to use.

Hope that helps!

najouamghazli commented 4 years ago

Thank you so much for your valuable help.

BURRITO has been running on my "whole" data since last Friday. I think it'll take a little more time, since it's still on the 2nd step.

I will try using a custom function hierarchy table as suggested while waiting for the results. Thank you.

Best,

Najoua

engal commented 4 years ago

I'll close this issue and answer your question regarding running time in the other thread.