Simplified User-Facing Dataset Collection Model

jmchilton commented 8 years ago

There is a bunch of complexity related to dataset collection HDAs - for instance they may or may not appear elsewhere in the history (not just datasets with different HDAs - but the same HDA). This results in terrifying consequences such as a user being able to delete an HDA and affecting a collection that is seemingly unrelated to them. I implemented this but I was just doing what the Trello card spec'ed out - the design was someone else's and I largely don't regret that the models are flexible enough to allow these combinations - but under normal UI-driven use things almost certainly should be simpler.

I am going to lay out here how I think it "should" work - i.e. how UI-driven interactions should operate. I am hoping to get some consensus on this.

An HDA should not ever appear both in a history at the top-level and inside an HDCA.
- Copying collections between histories or inside a history should copy each HDA.
- The collection creators should result in duplicated HDAs being added to the new HDCA - there should be an option to hide or delete the original HDAs.
- Mapping over a collection should result in one collection appearing for each output in the history - not top-level HDAs that are hidden later as it currently works.
- When hid is used in tool actions - something else should be used - probably <hdca_id>:(<element_id>:*):<element_id> -> e.g. 1:sample_x:forward. Element IDs are preserved like tags instead of growing like names - so this works more like HIDs.
- Deleting a collection via the UI should delete all the HDAs in the collection. Purging a collection should work the same way.
- Deleting a collection should result in all related jobs being cancelled.
- [X] HDCA structure should be write-once (semi-immutable). Once hdca.collection.populated is True, those are the HDAs that belong to the collection forever - those HDAs may change but the contents of the HDCA will not.
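
To make the proposed reference format `<hdca_id>:(<element_id>:*):<element_id>` concrete, here is a rough sketch of how such a reference could be built and parsed. Nothing here is existing Galaxy code: `format_element_ref` and `parse_element_ref` are hypothetical names, and the sketch assumes the leading component is an integer (as in `1:sample_x:forward`) and that element identifiers never contain `:`.

```python
from typing import List, Tuple


def format_element_ref(hdca_ref: int, element_path: List[str]) -> str:
    """Join a collection reference and a path of element identifiers.

    Hypothetical helper: element_path holds one identifier per nesting level.
    """
    if not element_path:
        raise ValueError("at least one element identifier is required")
    for identifier in element_path:
        # Assumption: ':' is reserved as the separator.
        if not identifier or ":" in identifier:
            raise ValueError(f"invalid element identifier: {identifier!r}")
    return ":".join([str(hdca_ref), *element_path])


def parse_element_ref(ref: str) -> Tuple[int, List[str]]:
    """Split a reference back into the collection reference and element path."""
    head, *path = ref.split(":")
    if not path or any(not part for part in path):
        raise ValueError(f"malformed element reference: {ref!r}")
    return int(head), path
```

For a paired list, `format_element_ref(1, ["sample_x", "forward"])` gives `1:sample_x:forward`, and parsing that string returns the same two pieces.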

jmchilton commented 8 years ago

Ping @carlfeberhard - I know you have expressed frustration that collections don't operate this way currently. Does this list capture how you would like collections to operate? Before I start hacking on the backend I'd like to reach some consensus on how it should work.

carlfeberhard commented 8 years ago

I believe so (so far at least). Some questions though:

Where will HDAs that are part of HDCAs 'be contained'? Will they still have a history_id?
Will existing HDCAs be converted to the new structure/relationship or will we have to support two structures?
What do you mean by 'semi-immutable'? Do you mean that no HDAs can be removed or added, but the metadata and attributes of the HDAs are still editable (name, etc.)?

jmchilton commented 8 years ago

> Where will HDAs that are part of HDCAs 'be contained'? Will they still have a history_id?

They won't have an hid but they will have a history_id I guess, if we can make that work. Maybe child datasets used to work this way?

> Will existing HDCAs be converted to the new structure/relationship or will we have to support two structures?

I don't think we can realistically change existing histories. My preferred answer to this question would be something like... we generally assume and build features just around the new assumption (I mean histories already may be set up this way - so it is something we currently support right?) but we don't explode on the legacy histories? Like if there are counter-intuitive things for existing histories or confusing GUI behavior - we just suggest people copy the non-hidden stuff to a new, clean history rather than handling the old entities.

Admittedly I don't understand in what ways the differences affect the GUI though - I guess I was hoping this would be a dialog about that.

> What do you mean by 'semi-immutable'? Do you mean that no HDAs can be removed or added, but the metadata and attributes of the HDAs are still editable (name, etc.)?

Essentially yes - I think. Though I'm now wondering if collection.populated is a per-level entity - it is possible that subcollections have their own populated state. Maybe a weaker statement like - the number of entities in a populated collection won't change, the element order and identifiers won't change once set, the IDs of the referenced HDAs won't be modified (there is no API for any of that - I don't think it should be added). I have been pretty consistent that I think collections should be used as homogeneous entities (by users for instance) but that the backend doesn't enforce this. If you/we want to go further with the GUI for instance and not expose dataset renaming or extension modification at the HDA level in collections and instead provide collection-wide options for doing this that would be absolutely fantastic IMO.
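
As a rough illustration of the weaker "write-once" statement above, here is a toy sketch. `WriteOnceCollection` is a made-up stand-in, not the Galaxy model class; only the `populated` flag is taken from this thread. Elements can be added until the collection is marked populated, after which the count, order, identifiers and referenced HDA IDs are frozen. Attributes of the HDAs themselves are outside this object, so they stay editable.

```python
from typing import List, Tuple


class WriteOnceCollection:
    """Toy stand-in for a collection whose structure freezes once populated."""

    def __init__(self) -> None:
        # Each entry is (element identifier, HDA id); list order is element order.
        self._elements: List[Tuple[str, int]] = []
        self.populated = False

    def add_element(self, identifier: str, hda_id: int) -> None:
        if self.populated:
            raise RuntimeError("collection is populated; its structure is frozen")
        if any(existing == identifier for existing, _ in self._elements):
            raise ValueError(f"duplicate element identifier: {identifier!r}")
        self._elements.append((identifier, hda_id))

    def mark_populated(self) -> None:
        self.populated = True

    @property
    def elements(self) -> Tuple[Tuple[str, int], ...]:
        # Return an immutable snapshot so callers cannot reorder or swap entries.
        return tuple(self._elements)
```

If subcollections do turn out to have their own populated state, each nesting level would need its own flag; this sketch only covers a single flat level.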

nsoranzo commented 8 years ago

:+1: I have a tool that can create a list collection of thousands of datasets; the fact that they are also hidden HDAs makes it impossible to load the history in Firefox and very slow in Chrome.

jmchilton commented 7 years ago

I thought we had at least solved copying a collection also copying the HDAs but it sounds like this may not be the case according to @mvdbeek. I'll try to write up some test cases for this.
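
Roughly the property such a test case would need to check, sketched against toy in-memory classes rather than the real models: `ToyHDA`, `ToyCollection` and `copy` are invented for illustration and do not correspond to Galaxy's API. The point is only that a copied collection must reference new HDAs, so deleting an HDA in the copy cannot affect the original.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict

_next_id = count(1)  # toy ID generator standing in for database primary keys


@dataclass
class ToyHDA:
    name: str
    id: int = field(default_factory=lambda: next(_next_id))
    deleted: bool = False


@dataclass
class ToyCollection:
    elements: Dict[str, ToyHDA]  # element identifier -> HDA

    def copy(self) -> "ToyCollection":
        # Copy each HDA instead of sharing references with the original.
        return ToyCollection(
            {ident: ToyHDA(hda.name) for ident, hda in self.elements.items()}
        )


def test_copy_creates_new_hdas() -> None:
    original = ToyCollection({"forward": ToyHDA("f.fq"), "reverse": ToyHDA("r.fq")})
    copied = original.copy()

    # Identifiers are preserved, but no HDA is shared between the two collections.
    assert list(copied.elements) == list(original.elements)
    original_ids = {hda.id for hda in original.elements.values()}
    copied_ids = {hda.id for hda in copied.elements.values()}
    assert original_ids.isdisjoint(copied_ids)

    # Deleting an HDA in the copy must leave the original untouched.
    copied.elements["forward"].deleted = True
    assert original.elements["forward"].deleted is False
```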

galaxyproject / galaxy

Simplified User-Facing Dataset Collection Model #1810