galaxyproject / galaxy

Data intensive science for everyone.
https://galaxyproject.org

Reproducibility and Large Dataset Collections BoF #4265

Open jmchilton opened 7 years ago

jmchilton commented 7 years ago

Several of us met at GCC2017 to discuss this topic, with a focus on two different representations of collections of data - the Hyper-browser representation and the stock representation.

I made the following list of topics that I sort of wanted to track and potentially follow up on. It wasn't really meeting notes, so I apologize if I'm missing particular contributions to the discussion. Feel free to jump in and fill in details and new discussion.

I'll keep this issue open until the conversation dies and then maybe link out to concrete action issues.

pvanheus commented 7 years ago

Could you please expand on the references to URIs in this?

sandve commented 7 years ago

I will be happy to follow up on these ideas when I am back from vacation in early August! I also believe several other people from the Oslo group will be happy to join in. As mentioned at the BoF, we have quite a lot of experience with how such information and representation is useful in various analytical settings. We know less about the experience gained with the current Dataset List solution in Galaxy, and about the Galaxy team's main plans in this direction. As said at the conference, we would be very happy to try to contribute towards the best possible solution!

Regarding the question about URIs, the idea is to have a standard way of representing a multiplicity of datasets, where each dataset would be represented by a URI that could, for example, be the URL of a BED file.
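
To make the URI idea a bit more concrete, here is a rough sketch in Python. It is not Galaxy or HyperBrowser code: `DatasetRef`, `build_collection`, and the example.org URLs are names made up for illustration. It only assumes that a collection can be treated as an ordered list of URIs, with labels derived from each URI's path.

```python
from dataclasses import dataclass
from urllib.parse import urlparse
import posixpath


@dataclass(frozen=True)
class DatasetRef:
    """One dataset in a collection, identified only by its URI (hypothetical type)."""
    uri: str

    @property
    def name(self) -> str:
        # Use the last path segment of the URI as a human-readable label.
        return posixpath.basename(urlparse(self.uri).path)

    @property
    def extension(self) -> str:
        # File extension without the dot, e.g. "bed"; empty string if there is none.
        return posixpath.splitext(self.name)[1].lstrip(".")


def build_collection(uris):
    """Turn an iterable of URI strings into an ordered list of DatasetRef.

    Blank entries are skipped; entries without a scheme are rejected,
    since they cannot be resolved as URIs.
    """
    refs = []
    for raw in uris:
        uri = raw.strip()
        if not uri:
            continue
        if not urlparse(uri).scheme:
            raise ValueError(f"not a URI (no scheme): {uri!r}")
        refs.append(DatasetRef(uri))
    return refs


# Hypothetical example URLs, for illustration only.
collection = build_collection([
    "https://example.org/tracks/sample1.bed",
    "https://example.org/tracks/sample2.bed",
])
```

With something like this, a tool would receive only the list of references and decide for itself how to fetch each dataset, which is the part the actual representation would need to standardize.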