galaxyproject / galaxy

Data intensive science for everyone.
https://galaxyproject.org

Simplified User-Facing Dataset Collection Model #1810

Open jmchilton opened 8 years ago

jmchilton commented 8 years ago

There is a bunch of complexity related to dataset collection HDAs - for instance they may or may not appear elsewhere in the history (not just datasets with different HDAs - but the same HDA). This results in terrifying consequences such as a user being able to delete an HDA and affecting a collection that is seemingly unrelated to them. I implemented this but I was just doing what the Trello card spec'ed out - the design was someone else's and I largely don't regret that the models are flexible enough to allow these combinations - but under normal UI-driven use things almost certainly should be simpler.
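
To make the shared-HDA problem concrete, here is a minimal Python sketch. `HDA`, `Collection`, and `delete_from_history` are hypothetical stand-ins invented for illustration, not Galaxy's actual models or API; the only assumption is that the history panel and the collection can reference the very same HDA object.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HDA:
    """Hypothetical stand-in for a history dataset association."""
    name: str
    deleted: bool = False


@dataclass
class Collection:
    """Hypothetical stand-in for a dataset collection holding HDA references."""
    name: str
    elements: List[HDA] = field(default_factory=list)


def delete_from_history(history: List[HDA], index: int) -> None:
    """Mark the HDA at `index` in the history panel as deleted."""
    history[index].deleted = True


# The same HDA object is visible in the history and is also a collection element.
shared = HDA("reads_1.fastq")
history_panel = [shared]
collection = Collection("paired reads", [shared])

# The user deletes what looks like a standalone dataset in the history...
delete_from_history(history_panel, 0)

# ...and the collection is affected too, because it holds the same object.
collection_is_affected = collection.elements[0].deleted
```

If collection elements were never surfaced as ordinary history items, this kind of action at a distance could not be triggered from the UI.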

I am going to lay out here how I think it "should" work - i.e. how UI-driven interactions should operate. I am hoping to get some consensus on this.

jmchilton commented 8 years ago

Ping @carlfeberhard - I know you have expressed frustration that collections don't operate this way currently. Does this list capture how you would like collections to operate? Before I start hacking on the backend I'd like to reach some consensus on how it should work.

carlfeberhard commented 8 years ago

I believe so (so far at least). Some questions though:

jmchilton commented 8 years ago

Where will HDAs that are part of HDCAs 'be contained'? Will they still have a history_id?

They won't have an hid but they will have a history_id I guess, if we can make that work. Child datasets maybe used to work this way?

Will existing HDCAs be converted to the new structure/relationship or will we have to support two structures?

I don't think we can realistically change existing histories. My preferred answer to this question would be something like... we generally assume and build features just around the new assumption (I mean histories may already be set up this way - so it is something we currently support, right?) but we don't explode on the legacy histories? Like if there are counterintuitive things for existing histories or confusing GUI behavior - we just suggest people copy the non-hidden stuff to a new, clean history rather than handling the old entities.

Admittedly I don't understand in what ways the differences affect the GUI though - I guess I was hoping this would be a dialog about that.

What do you mean by 'semi-immutable'? Do you mean that no HDAs can be removed or added, but the metadata and attributes of the HDAs are still editable (name, etc.)?

Essentially yes - I think. Though I'm now wondering if collection.populated is a per-level entity - it is possible that subcollections have their own populated state. Maybe a weaker statement like - the number of entities in a populated collection won't change, the element order and identifiers won't change once set, and the IDs of the referenced HDAs won't be modified (there is no API for any of that - I don't think it should be added). I have been pretty consistent that I think collections should be used as homogeneous entities (by users for instance) but that the backend doesn't enforce this. If you/we want to go further with the GUI, for instance by not exposing dataset renaming or extension modification at the HDA level in collections and instead providing collection-wide options for doing this, that would be absolutely fantastic IMO.
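
As a rough illustration of that weaker statement, here is a minimal Python sketch. `Element`, `SemiImmutableCollection`, `add_element`, and `mark_populated` are hypothetical names made up for this example and are not Galaxy's model classes or API; the sketch only assumes a single `populated` flag per collection and ignores the open question about subcollections having their own populated state.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Element:
    """Hypothetical element: frozen so the identifier and HDA ID cannot be reassigned."""
    identifier: str
    hda_id: int


class SemiImmutableCollection:
    """Hypothetical collection whose structure is frozen once it is populated."""

    def __init__(self) -> None:
        self._elements: List[Element] = []
        self.populated = False

    def add_element(self, identifier: str, hda_id: int) -> None:
        # Elements may only be appended while the collection is being built.
        if self.populated:
            raise RuntimeError("collection is populated; elements are frozen")
        self._elements.append(Element(identifier, hda_id))

    def mark_populated(self) -> None:
        # After this call, count, order, identifiers, and HDA IDs stay fixed.
        self.populated = True

    @property
    def elements(self) -> Tuple[Element, ...]:
        # Return a tuple so callers cannot append or reorder in place.
        return tuple(self._elements)
```

Only the structure is frozen in this sketch. Attributes that live on the HDAs themselves (name, extension, and so on) are outside the collection and would remain editable unless the GUI chooses to hide those options.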

nsoranzo commented 8 years ago

:+1: I have a tool that can create a list collection of thousands of datasets. The fact that they are also hidden HDAs makes it impossible to load the history in Firefox and very slow in Chrome.

jmchilton commented 7 years ago

I thought we had at least made copying a collection also copy its HDAs, but it sounds like this may not be the case according to @mvdbeek. I'll try to write up some test cases for this.