PitchInteractiveInc / Phinch

Phinch is an open-source framework for visualizing biological data, funded by a grant from the Alfred P. Sloan foundation. This project represents an interdisciplinary collaboration between Pitch Interactive, a data visualization studio in Oakland, CA, and biological researchers at UC Riverside.
http://phinch.org/
BSD 2-Clause "Simplified" License

Using biom formatted file for COG/KEGG data (from picrust) #40

Open passdan opened 9 years ago

passdan commented 9 years ago

When I try loading the biom file, it passes the import and filter stage, showing all samples and metadata, and moves on to the visualisation stage, which shows the chart options.

If I click any of them, it loads indefinitely (1 hour+) without any error message. Has anyone successfully loaded files of this type?

hollybik commented 9 years ago

Hi Daniel, we hadn't specifically tested out COG/KEGG data. If it passes the parser window (i.e. all your samples and metadata are loading), then you're on the right track with the file. Can you tell me how your taxonomy (ontology?) strings are parsed? I suspect that Phinch is expecting the ontology/taxonomy information to be in the QIIME format, where each level is prefixed with k__, p__, etc. (the rank letter for kingdom, phylum, etc., followed by two underscores). Can you try reformatting your taxonomy/ontology strings in that format and see if it works? The BIOM strings should look like this (example from 16S data):

{"id": "OTU_128216", "metadata": {"taxonomy": ["k__Bacteria", "p__Proteobacteria", "c__Betaproteobacteria", "o__Burkholderiales", "f__Comamonadaceae"]}}
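To answer the question about how the strings are formatted, a quick check like the one below may help. It is only a sketch: the function name `describe_row_metadata` and the regular expression are illustrative assumptions, not part of Phinch, and the rows are assumed to be dictionaries taken from the `rows` list of a JSON (BIOM 1.0) file.

```python
import re

# QIIME-style rank prefix: one rank letter (kingdom, phylum, class, order,
# family, genus, species) followed by two underscores, e.g. "k__Bacteria".
QIIME_PREFIX = re.compile(r"^[kpcofgs]__")


def describe_row_metadata(row):
    """Return (sorted metadata keys, whether the row has QIIME-style taxonomy).

    The second value is True only if the row has a non-empty "taxonomy" list
    and every entry starts with a rank prefix.
    """
    metadata = row.get("metadata") or {}
    taxonomy = metadata.get("taxonomy")
    looks_qiime = (
        isinstance(taxonomy, list)
        and len(taxonomy) > 0
        and all(QIIME_PREFIX.match(str(level)) for level in taxonomy)
    )
    return sorted(metadata), looks_qiime


# A 16S row in the format shown above.
otu_row = {
    "id": "OTU_128216",
    "metadata": {"taxonomy": ["k__Bacteria", "p__Proteobacteria"]},
}
# A row without a "taxonomy" key (the key name here is taken from PICRUSt output).
kegg_row = {
    "id": "ABC transporters",
    "metadata": {"KEGG_Pathways": ["Environmental Information Processing",
                                   "Membrane Transport", "ABC transporters"]},
}

print(describe_row_metadata(otu_row))
print(describe_row_metadata(kegg_row))
```

Running this over the first few rows of a file shows which metadata key holds the hierarchy and whether the levels carry rank prefixes.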

akknight commented 9 years ago

Hi!

I'm having this issue too. The output is the KEGG pathway biom table from PICRUSt (using OTU tables from QIIME), so it can't be formatted like the example above; instead, the metadata is in KEGG pathways like:

[{"id": "1,1,1-Trichloro-2,2-bis(4-chlorophenyl)ethane (DDT) degradation", "metadata": {"KEGG_Pathways": ["Metabolism", "Xenobiotics Biodegradation and Metabolism", "1,1,1-Trichloro-2,2-bis(4-chlorophenyl)ethane (DDT) degradation"]}},{"id": "ABC transporters", "metadata": {"KEGG_Pathways": ["Environmental Information Processing", "Membrane Transport", "ABC transporters"]}}

hollybik commented 9 years ago

Phinch is still a prototype framework, which means we had to constrain the ontology formatting in the initial version of the software. I expect that the files are hanging because of the KEGG formatting - you could try to reformat by changing "KEGG_Pathways" to "taxonomy" (and you might need to add the p__, c__, etc. prefix before each term), to see if that corrects the problem. I'm not sure whether the ID formatting would also cause a problem - Phinch might be anticipating an OTU_number format.
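The suggested reformatting could be sketched roughly as follows. This is an untested workaround, not a confirmed fix: the helper name `kegg_row_to_taxonomy` is hypothetical, the mapping of the three KEGG levels onto `k__`, `p__`, `c__` is an assumption, and each `KEGG_Pathways` value is assumed to be a flat list of strings as in the excerpt above. Row IDs are left unchanged.

```python
import copy

# Rank prefixes in QIIME order. Mapping KEGG hierarchy levels onto these
# ranks is an assumption made only to match the expected string format.
RANK_PREFIXES = ["k__", "p__", "c__", "o__", "f__", "g__", "s__"]


def kegg_row_to_taxonomy(row, source_key="KEGG_Pathways"):
    """Return a copy of a BIOM row with source_key renamed to "taxonomy"
    and a rank prefix added to each level. Rows lacking source_key are
    returned as unchanged copies."""
    new_row = copy.deepcopy(row)
    metadata = new_row.get("metadata") or {}
    levels = metadata.pop(source_key, None)
    if levels is None:
        return new_row
    if len(levels) > len(RANK_PREFIXES):
        raise ValueError("more hierarchy levels than available rank prefixes")
    metadata["taxonomy"] = [
        prefix + level for prefix, level in zip(RANK_PREFIXES, levels)
    ]
    new_row["metadata"] = metadata
    return new_row


row = {
    "id": "ABC transporters",
    "metadata": {"KEGG_Pathways": ["Environmental Information Processing",
                                   "Membrane Transport", "ABC transporters"]},
}
converted = kegg_row_to_taxonomy(row)
print(converted["metadata"]["taxonomy"])
```

For a whole JSON-format file, the same function could be applied to every entry in the table's `rows` list before writing the file back out with `json.dump`.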

I'll update this thread when we start specifically updating the code to handle KEGG formatting.