borenstein-lab / burrito

A visualization tool for exploratory data analysis of metagenomic data
https://elbo-spice.gs.washington.edu/shiny/burrito/
GNU General Public License v3.0

Constantly disconnecting from BURRITO server #18

Open safiraloebis opened 3 years ago

safiraloebis commented 3 years ago

Hi, I've been able to prepare the correct inputs based on picrust2 outputs. However, when I upload the data, the server stays at step 2 for some time, then abruptly jumps to step 4 and is already disconnected. I have to re-upload the data and try again, but it always fails.

I've run this on my Linux machine (4GB) and on an online server (32GB). My internet connection is stable throughout the processing. Does anyone have an idea of what could be up? Is there a limit to the amount of data BURRITO can process? Thanks a bunch.

engal commented 3 years ago

This may be due to an input formatting issue that we're currently not handling properly. Would it be possible for you to share the first 5 or so lines of each input file you are using?

safiraloebis commented 3 years ago

Thanks for the reply. Here are .txt files of the first 6 rows, and parameters of the data I input. It seems that they match with the example data shown in the tool.

  1. Taxonomic abundances: created by QIIME2 (TSV) otu_5samples.txt

  2. Taxonomy: created by the SILVA classifier plugin of QIIME2 (TSV) taxonomy_5samples.txt

  3. Minimum taxonomic resolution: Family

  4. Metagenome-based function abundances: through Picrust2 full pipeline (TSV) ec_abundance_unstrat_5samples.txt

  5. Taxonomy-function linking method (Function Attribution Table): through Picrust2 full pipeline (TSV) ec_abundance_contrib_5samples.txt

  6. Function hierarchy: DEFAULT (no data input)

  7. Minimum functional resolution: Superpathway

  8. Sample Grouping: grouped by 'Yield'. This is using my manifest file and therefore contains file location. Have had no problem using this with other tools. metadata.txt

engal commented 3 years ago

I think the issue is due to having a "-" in your column names for the metadata file (e.g. in "sample-id" and "absolute-filepath"). Could you remove that character, try uploading your data again, and then let me know what happens?
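For anyone who wants to script this fix, here is a minimal sketch of stripping "-" from the header row of a tab-separated metadata file. It is not part of BURRITO; the function name `strip_hyphens_from_header` and the `replacement` parameter are hypothetical, and only the first line is touched so that sample IDs and file paths in the data rows are left alone.

```python
def strip_hyphens_from_header(tsv_text: str, replacement: str = "") -> str:
    """Return tsv_text with '-' replaced in the header (first) line only.

    Hypothetical helper for illustration; by default the character is
    simply removed, e.g. "sample-id" becomes "sampleid".
    """
    lines = tsv_text.splitlines(keepends=True)
    if not lines:
        return tsv_text
    # Only the header row holds column names; data rows stay unchanged.
    lines[0] = lines[0].replace("-", replacement)
    return "".join(lines)


# Usage sketch (file names are placeholders):
# with open("metadata.txt") as f:
#     cleaned = strip_hyphens_from_header(f.read())
# with open("metadata_clean.txt", "w") as f:
#     f.write(cleaned)
```

Note that if the sample ID column is renamed here, the other input files do not need to change, since only the metadata column names are affected.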

safiraloebis commented 3 years ago

I've just tried it out and the same thing still happens. I've also tried removing the absolute-filepath column too.

engal commented 3 years ago

Ok, I've identified another issue, though you should still make sure to remove any "-" characters in your metadata table column names. It looks like you're using EC numbers for your function IDs, but BURRITO is only designed to handle KO IDs under default settings. If you want to use EC numbers, you'll need to either provide your own custom hierarchy of function IDs and categories or use the "no hierarchy" option for function IDs. Depending on how many unique functions are present, you may find the latter option to not be very helpful, as it will display all functions at once and may make the visualization uninterpretable.
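A quick way to check which kind of function IDs a table contains is to pattern-match the ID column before uploading. The sketch below is only an illustration: the function name `classify_function_ids` is hypothetical, and the patterns assume KO IDs look like `K00001` (a "K" plus five digits) and EC numbers look like `1.1.1.1` or `EC:1.1.1.1`, possibly with "-" in trailing positions. Other ID styles would land in the "other" bucket.

```python
import re

# Assumed ID shapes (see the note above); adjust if your files differ.
KO_PATTERN = re.compile(r"^K\d{5}$")
EC_PATTERN = re.compile(r"^(EC:)?\d+(\.(\d+|-)){3}$")


def classify_function_ids(ids):
    """Count how many IDs look like KO IDs, EC numbers, or neither."""
    counts = {"KO": 0, "EC": 0, "other": 0}
    for fid in ids:
        fid = fid.strip()
        if KO_PATTERN.match(fid):
            counts["KO"] += 1
        elif EC_PATTERN.match(fid):
            counts["EC"] += 1
        else:
            counts["other"] += 1
    return counts


# Usage sketch: pass the values from the function ID column of your table.
# print(classify_function_ids(["K00001", "EC:1.1.1.1"]))
```

If most IDs come back as "EC", the default function hierarchy will not match them, which is the situation described above.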

safiraloebis commented 3 years ago

Thanks, I see. I'll give it a try. I used EC as input because I was unable to find any KO-based taxonomy-function linking file generated by the picrust2 pipeline. Am I mistaken? Also, am I right that EC hierarchies are only available online, so I would have to create my own file manually? Hope it's not a hassle to ask, I'm quite new to this.

Update: I have succeeded in using the tool with the "no hierarchy" option as a test. The tool now takes its time in Step 1, instead of jumping from Step 2 to Step 4 and disconnecting as it did before. It would be great if I could get the KO data working.

engal commented 3 years ago

You should be able to generate KO profiles if you use the "-i KO" option during PICRUSt2's hidden-state prediction step (use "-i KO" when running hsp.py).

I'm not aware of any pre-existing hierarchies for EC numbers, so you would have to create your own. Given this, I would strongly recommend using KO data.

If you want to speed up the data upload process, I would recommend compressing your stratified function contribution table using gzip before uploading.

Hope that helps!
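For reference, the gzip suggestion above can be done with the `gzip` command-line tool or with a few lines of Python. The sketch below uses only the standard library; the helper name `gzip_file` and the example file name are placeholders, not anything BURRITO requires.

```python
import gzip
import shutil


def gzip_file(src_path, dest_path=None):
    """Write a gzip-compressed copy of src_path and return the new path.

    Hypothetical helper for illustration; the original file is kept.
    """
    if dest_path is None:
        dest_path = src_path + ".gz"
    # Stream the copy so large contribution tables are not loaded into memory.
    with open(src_path, "rb") as src, gzip.open(dest_path, "wb") as dest:
        shutil.copyfileobj(src, dest)
    return dest_path


# Usage sketch (placeholder file name):
# gzip_file("ec_abundance_contrib_5samples.txt")
```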