borenstein-lab / burrito

A visualization tool for exploratory data analysis of metagenomic data
https://elbo-spice.gs.washington.edu/shiny/burrito/
GNU General Public License v3.0

Burrito and piCRUST2: unable to upload data #7

Open MelanieF1 opened 4 years ago

MelanieF1 commented 4 years ago

Hello,

Maybe I missed something while using Burrito. I discovered it today and it seems promising: I used PICRUSt2 on my 16S data and tried to upload the results to BURRITO.

I obtained EC numbers and have their correspondence with functions, but at step 3 (validating data), BURRITO tells me that "the following EC numbers are in the contribution table but are not present in the functional hierarchy and will be automatically removed". All of my EC numbers seem to be listed after that message, and then the connection with the server is lost.

More details: in the functional data section, I upload this kind of file for "Metagenome-based function abundances": Debug_Metagenome-basedfunctionabundances.txt

I then upload a "custom genetic content table" for the taxonomy-function linking method, which I built in R from the PICRUSt2 output (the raw output is not in the right format): Debug_File2.txt

Finally, I upload a "custom function hierarchy", also obtained from PICRUSt2: Debug_File3.txt

In each of the three files, the EC numbers look identical (of course, I didn't provide the complete list, which is quite long).

How can I fix this problem?

Thank you very much in advance

Melanie

engal commented 4 years ago

Hi Melanie,

Glad to hear you're interested in BURRITO! Hopefully I can help you get everything working.

Though you didn't mention one in your original message, I tried generating a file of random OTU abundances for the OTUs listed in the snippet of your custom genomic content table (Debug_File2.txt) and was able to run BURRITO successfully using this test OTU abundance file and the debug files you sent. This means that I'm not entirely sure what is causing the issue that you are encountering, but hopefully we can narrow it down.

First, can you retry uploading your data, but using the "Custom function IDs with no hierarchy" option instead of the "Custom function hierarchy" option and without uploading a file for the "Metagenome-based function abundances" option?

Second, can you do that again, but this time using the "Custom function hierarchy" option with your custom hierarchy (again without uploading a file for the "Metagenome-based function abundances" option)?

Please try both of those and let me know what happens in each case.
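A quick local check may also help rule out an ID mismatch before uploading. The following is a minimal sketch, not BURRITO's own validation code: it assumes tab-delimited files with a header row, and the column positions, the helper names (`ids_in_column`, `missing_from_hierarchy`), and the sample data are all made up for illustration. It lists function IDs that appear in a table but not in the hierarchy, which is the situation the validation message describes.

```python
import csv
import io


def ids_in_column(text, column=0, delimiter="\t", has_header=True):
    """Return the set of non-empty values found in one column of delimited text."""
    rows = csv.reader(io.StringIO(text), delimiter=delimiter)
    if has_header:
        next(rows, None)  # skip the header row
    return {
        row[column].strip()
        for row in rows
        if len(row) > column and row[column].strip()
    }


def missing_from_hierarchy(table_ids, hierarchy_ids):
    """IDs used in a table that the hierarchy does not define, sorted for display."""
    return sorted(table_ids - hierarchy_ids)


# Made-up example data (NOT taken from the attached debug files).
# Assumed layout: function ID in column 1 of the content table,
# and in column 0 of the hierarchy.
content_table = "OTU\tFunction\tCopies\notu1\tEC:1.1.1.1\t2\notu2\tEC:2.7.7.7\t1\n"
hierarchy = "Function\tCategory\nEC:1.1.1.1\tOxidoreductases\n"

table_ids = ids_in_column(content_table, column=1)
hierarchy_ids = ids_in_column(hierarchy, column=0)
print(missing_from_hierarchy(table_ids, hierarchy_ids))  # ['EC:2.7.7.7']
```

Because the comparison is an exact string match after trimming whitespace, a difference such as an `EC:` prefix present in one file but not in another would show up as a missing ID. To run it on real files, read each file's text and adjust `column` to wherever the function ID actually sits.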

restivve commented 4 years ago

Hi!

I am also having a problem uploading my PICRUSt2 pathways.

I uploaded this file for taxa abundances: taxa_KKfish_silva132.txt

I uploaded this file for metagenome-based function abundances: ec-feature-table.biom.txt

And I get this error:

The following OTU_IDs are in the OTU table but are not present in the PICRUSt 16S normalization table: CCTGTTCGCTCCCCACGCTTTCGAGCCTCAGCGTCAGTTACAGTCCAGAGAGTCGCCTTCGCCACTGGTGTTCTTCCTAATCTCTACGCATTTCACCGCTACACTAGGAATTCCACTCTCCTCTCCTGCACTCTAGATATCCAGTTTGGAATGCAGCACATGAGTTGAGCTCATGTATTTCACATCCCACTTAAATATCCGCCTACGCTCCCTTTACGCCCAATAATTCCGGATAACGCTTGCCACCTACGTATTACCGCGGCTGCTGGCACGTAGTTAGCCGTGGCTTCCTCCTTAGGTACTGTCATTATCATCCCTAAAGACAGAGCTTTACGATCCGAAGACCTTCATCGCTCACGCGGCGTTGCTGCATCAGGGTTTCCCCCATTGTGCAATATTCCCCA CCTGTTTGCTACCCACGCTTTCGCATCTCAGCGTCAATCTCTGTCCAGCAAGCTGCCTTCGCCATTGGTGTTCCTCCATATATCTACGCATTCCACCGCTACACATGGAATTCCACTTGCCCCTCCAGTATTCTAGTTTATCAGTTTTCAATGCAATTTAGTGGTTGAGCCACTAGCTTTCACACCAAACTTAATAAACCGCCTACATGCTCTTTACGCCCAATAATTCCGGATAACGCTCGGGACCTACGTATTACCGCGGCTGCTGGCACGTAGTTAGCCGTCCCTTTCTGGTAGGATAGCGTCAGCTTGAAGCCATTTCCTACTCCAAGTGTTCTTCCCCTACAACAGAGCTTTACGATCCGAAAACCTTCATCACTCACGCGGCATTGCTCCGTCAGACTTTCGTCCATTGCGAAAAATTCCCTA CCTGTTTGCTCCCCACGCTTTCGCACCTCAGCGTCAGTATCGAGCCAGTGAGCCGCCTTCGCCACTGGTGTTCCTCCGAATATCTACGAATTTCACCTCTACACTCGGAATTCCACTCACCTCTCTCGACCTCAAGACCAGGAGTTTCAAAGGCAGTTCCAAGGTTGAGCCCTGGGATTTCACCTCTGACTTTCCGGTCCGCCTACGTGCGCTTTACGCCCAGTAATTCCGAACAACGCTAGCCCCCTCCGTATTACCGCGGCTGCTGGCACGGAGTTAGCCGGGGCTTCTTCTTGGGGTACAGTCATTATCTTCCCCCACGAAAGAGCTTTACAACCCTAAGGCCTTCATCGCTCACGCGGCATGGCTAGATCAGGGTTTCCCCCATTGTCTAAGATTCCCCA CCTATTTGCTACCCACGCTTTCGTGCCTCAGCGTCAGTTACAGACCAGAGAGTCGCCTTCGCCACTGGTGTTCCTCCATATATCTACGCATTTCACCGCTACACATGGAATTCCACTCTCCTCTTCTGCACTCTAGTTATACAGTTACCATGGCATTATGGGGTTGAGCCCCATTCTTTAACCACAATCTTTTATAACCGCCTGCGCACCCTTTACGCCCAATAATTCCGGATAACGCTCGCCACCTATGTATTACCGCGGCTGCTGGCACATAGTTAGCCGTGGCTTTCTGATAAGGTATTGTCAAATACCAAGCATTTCCTCTTGATACCTTTCCTCCCTTATAACAGAGATTTACAACCCGAAGGCCTTCTTCTCTCACGCGGCATTGCTCCATCAGGGTTGCCCCCATTGTGGAAAATTCCCTA CCCATTTGCTACCCTAGCTTTCGTCTCTGAGTGTTAGTAATAGCCCAGTAAAGTGCCTTCGCCATCGGTGTTCTTTCCAATATCTACGCATTTCACCGCTCCACTGGAAATTCCCTTTACCCCTACTATACTCTAGTCTGATAGTTTCGACTGCTGATTTGAAGTTGAGCCTCAAGATTTAACAGTTGACTTAACAAACCACCTACAGACGCTTTACGCCCAGTGATT

engal commented 4 years ago

Hi,

I think you may have sent the wrong file for your taxonomic abundances. The file you sent appears to be a taxonomic hierarchy (i.e. defining taxonomic classifications for your sequences).

However, given the taxonomic hierarchy you sent, it appears that you are not using Greengenes IDs. Currently, BURRITO only supports the approach from the first version of PICRUSt to do automatic function attribution calculation. If you are using non-Greengenes IDs, then you will also need to provide either a genomic content table for your taxa, or a pre-calculated contribution table, which PICRUSt2 can generate.

Hope that helps!

Best, Alex

restivve commented 4 years ago

Hi,

Sorry, I might have sent the wrong file. I used Greengenes with PICRUSt, but I have also run the taxonomic hierarchy in SILVA, which I've used previously in phyloseq.

For the purpose of analyzing the metagenome predictions, the abundance data was run through PICRUSt2 and used Greengenes.

Does that help? Victoria

engal commented 4 years ago

Hi Victoria,

The error you received suggests that the taxonomic IDs you are using are DNA sequences, whereas Greengenes taxonomic IDs are numbers (e.g. 367523 is the Greengenes ID for k__Bacteria|p__Bacteroidetes|c__Flavobacteriia|o__Flavobacteriales|f__Flavobacteriaceae|g__Flavobacterium). Your taxonomic abundance table will need to use the numeric form of the Greengenes ID.

I also just noticed that in the metagenome-based function abundances you sent earlier, you're using EC numbers for function IDs. When BURRITO uses the approach from the first version of PICRUSt for automatic attribution calculation, it will generate abundances for KOs, not EC numbers. If you still plan on using the automatic attribution calculation, you'll need your metagenome-based function abundance file to also use KO IDs. If you want to view abundances for EC numbers, you'll need to take a few additional steps.
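To spot both problems before uploading, the shape of the IDs can be checked locally. This is a rough sketch with hypothetical helper names (`classify_taxon_id`, `classify_function_id`), not part of BURRITO or PICRUSt. It only tests what an ID looks like: a purely numeric taxon ID is not guaranteed to be a valid Greengenes ID, and the EC pattern assumes four dot-separated fields with an optional `EC:` prefix.

```python
import re

# Patterns describing the *shape* of each ID type (assumptions, see above).
DNA_SEQUENCE = re.compile(r"[ACGTN]+", re.IGNORECASE)
NUMERIC_ID = re.compile(r"\d+")
KO_ID = re.compile(r"K\d{5}")                       # e.g. K00001
EC_NUMBER = re.compile(r"(EC:)?\d+(\.(\d+|-)){3}")  # e.g. EC:1.1.1.1 or 2.7.7.-


def classify_taxon_id(taxon_id):
    """Label a taxon ID as 'numeric', 'sequence', or 'other' by its shape."""
    value = taxon_id.strip()
    if NUMERIC_ID.fullmatch(value):
        return "numeric"
    if DNA_SEQUENCE.fullmatch(value):
        return "sequence"
    return "other"


def classify_function_id(function_id):
    """Label a function ID as 'KO', 'EC', or 'other' by its shape."""
    value = function_id.strip()
    if KO_ID.fullmatch(value):
        return "KO"
    if EC_NUMBER.fullmatch(value):
        return "EC"
    return "other"


print(classify_taxon_id("367523"))              # numeric
print(classify_taxon_id("CCTGTTCGCTCCCCACGC"))  # sequence
print(classify_function_id("K00001"))           # KO
print(classify_function_id("EC:1.1.1.1"))       # EC
```

If the first column of the taxonomic abundance table classifies as `sequence`, or the function abundance file classifies as `EC` while the automatic attribution calculation is in use, the files would need to be converted as described above.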

Best, Alex

MelanieF1 commented 4 years ago

Hello,

Thank you very much for your answer and I apologize for the delayed response!

In fact, doing what you suggested didn't work either, for a simple reason: one of the files was really large, and I didn't wait long enough for it to upload to your app.

As soon as I waited long enough (namely, until the name of the file disappeared due to the graphic bug), it was OK and worked fine! It would be helpful if something warned us that we have to wait for our files to finish loading, and/or prevented the burrito assembly from starting when they have not.

Thank you again for your help

Regards,

Mélanie

engal commented 4 years ago

Hi Melanie,

Thanks for the update. That behavior is still a bit odd. BURRITO should be waiting to start processing data until all files are successfully uploaded (which I think is what you meant by preventing BURRITO from running if a file has not finished uploading), but since that doesn't seem to be working, I'll look into it. The one thing we are missing is a notification that a file is still in the process of being uploaded to the server, so I'll also look into adding that.

Best, Alex