galaxyproject / galaxy

Data intensive science for everyone.
https://galaxyproject.org

Composite data type for ISA-Tab plus data in the context of metabolomics (mass spec and NMR) #3134

Open: pcm32 opened this issue 7 years ago

pcm32 commented 7 years ago

At the PhenoMeNal project we are working towards making large-scale computation for metabolomics possible on cloud providers, and we are working to use Galaxy as the user interface layer on top of our stack. In this context, we need a proper way of accessing ISA-Tab metadata files and their accompanying data from Galaxy. We are hoping to contribute an ISA-Tab composite data type for metabolomics (so the composite would manage files like mzML, nmrML, etc.). Our current approach is simply to hide all of these files from Galaxy by putting them inside a tar/zip archive, which is of course far from being the proper solution.

ISA lends itself very well to composite data types, as there is a single initial file (the i_investigation file) from which the whole directory structure (study s_, assay a_, MAF and data files) can be resolved, including the assignment of defined experimental factor values to data files.
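To illustrate why the investigation file works as a single entry point, here is a minimal sketch that starts from the one `i_*.txt` file and resolves the study, assay and data files it leads to. It assumes the ISA-Tab convention that study and assay file names appear in rows labelled `Study File Name` and `Study Assay File Name`, with tab-separated, optionally quoted values. The function name `resolve_isa_files`, and the rule that every assay column whose header ends in `File` names a data file, are illustrative assumptions, not part of Galaxy or the ISA tools.

```python
import csv
import os
from typing import Dict, List


def _read_rows(path: str) -> List[List[str]]:
    # ISA-Tab files are tab-separated; csv strips surrounding double quotes.
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle, delimiter="\t"))


def resolve_isa_files(isa_dir: str) -> Dict[str, List[str]]:
    """Walk an ISA-Tab directory starting from its single i_*.txt file."""
    investigations = [f for f in os.listdir(isa_dir)
                      if f.startswith("i_") and f.endswith(".txt")]
    if len(investigations) != 1:
        raise ValueError(f"expected exactly one i_*.txt file, found {len(investigations)}")

    studies: List[str] = []
    assays: List[str] = []
    for row in _read_rows(os.path.join(isa_dir, investigations[0])):
        if not row:
            continue  # blank line between sections
        values = [v for v in row[1:] if v]
        if row[0] == "Study File Name":
            studies.extend(values)
        elif row[0] == "Study Assay File Name":
            assays.extend(values)  # one study may list several assays

    data_files: List[str] = []
    for assay in assays:
        rows = _read_rows(os.path.join(isa_dir, assay))
        header, body = rows[0], rows[1:]
        # Hypothetical rule: any column whose header ends in "File" names a
        # data file (e.g. raw spectra or a metabolite assignment file).
        file_cols = [i for i, name in enumerate(header) if name.endswith("File")]
        for record in body:
            for i in file_cols:
                if i < len(record) and record[i] and record[i] not in data_files:
                    data_files.append(record[i])  # keep first-seen order, no duplicates

    return {"investigation": investigations, "studies": studies,
            "assays": assays, "data_files": data_files}
```

A composite datatype could use this kind of walk to decide which files belong to the dataset, instead of treating the directory as an opaque tar/zip.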

Our initial concerns for this use case are as follows:

Formats that we should support initially:

So far we have found some interesting examples, such as this one in proteomics, besides the commonly cited one for genetics. Are there any other good examples we should be aware of?

At this point we are after feedback from the Galaxy community and core developers on how to build this in the best possible way (the one most compliant with what Galaxy expects). I'm tagging people as requested: @dannon. More collaborators will add comments as I share this link (I don't seem to be able to tag them here; I guess you can only tag project members).

pkrog commented 6 years ago

Hi @bgruening, I don't know if you have had time to look at my hack. Anyway, I have another concern, about the uploader. When the uploaded file is not a valid ISA archive, I raise an exception. However, my exception message is not displayed. I have also tried writing messages to stdout and stderr, and to the logger on the different channels, but nothing is shown. I've looked at other composite datatypes, but I found none that fails on a wrong file submission: all the resulting datasets are in the green state. Is this the default behaviour to implement? It seems strange to me, but having the dataset in a red state with no clear message to the user is strange too. What would you suggest?
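One way to keep the error text under your own control is to run the validation as a separate step that returns human-readable messages, so the same text can be reported however the upload path ends up surfacing it. The sketch below is a hypothetical helper, not Galaxy's uploader API: it assumes the ISA archive is a zip file and treats it as valid only if it contains exactly one `i_*.txt` investigation file and every file named in its `Study File Name` / `Study Assay File Name` rows is present. How Galaxy turns such messages into the dataset's state and info text is not shown here.

```python
import csv
import io
import posixpath
import zipfile
from typing import List


def check_isa_zip(path: str) -> List[str]:
    """Return a list of problems found in a zipped ISA-Tab archive (empty if none)."""
    if not zipfile.is_zipfile(path):
        return [f"{path} is not a zip archive"]
    with zipfile.ZipFile(path) as archive:
        names = set(archive.namelist())  # zip member names always use "/"
        investigations = [n for n in names
                          if posixpath.basename(n).startswith("i_") and n.endswith(".txt")]
        if len(investigations) != 1:
            return [f"expected exactly one i_*.txt file, found {len(investigations)}"]
        inv = investigations[0]
        base = posixpath.dirname(inv)  # referenced files sit next to i_*.txt
        text = archive.read(inv).decode("utf-8")

    problems: List[str] = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if row and row[0] in ("Study File Name", "Study Assay File Name"):
            for ref in filter(None, row[1:]):
                if posixpath.join(base, ref) not in names:
                    problems.append(f"{ref} is referenced by {inv} but missing from the archive")
    return problems
```

Returning a list rather than raising on the first problem lets the user see every missing file in one message.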