galaxyproject / galaxy

Data intensive science for everyone.
https://galaxyproject.org

Getting rid of the HID? #6781

Open mvdbeek opened 6 years ago

mvdbeek commented 6 years ago

I have a radical suggestion: maybe we could try to get rid of static HIDs. I think they were mostly useful for identifying input datasets through the ${on_string} pattern, but in practice "concatenate datasets 456, 876 and 11284" isn't super helpful, and the same information can be deduced from the job's input HDAs. Dropping static HIDs from names would let us inject more or less information depending on the context (a quick summary in the history panel, more details when you hover over the dataset), and it removes one reason why job caching has to follow some fairly strict rules to take effect.

For instance, the ${on_string} parameter could remain a placeholder in the name attribute of an HDA, or the unexpanded label could be stored in an additional field, such as name_template, and the name would then be filled in when the HDA is displayed.
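A minimal sketch of the name_template idea, assuming a simplified dataset record rather than Galaxy's actual HDA model: the unexpanded label is stored as-is and ${on_string} is filled in from the job's input datasets only when the name is displayed, with a short form for the history panel and a detailed form for hover text. The `Dataset` class, its fields other than `name_template`, and the `on_string`/`display_name` helpers are hypothetical, and the "data 1 and data 2" wording is illustrative. Python's `string.Template` is used only because it understands the same `${...}` placeholder syntax; it is not what Galaxy uses internally.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from string import Template
from typing import List, Optional


@dataclass
class Dataset:
    """Hypothetical stand-in for an HDA, with only the fields this sketch needs."""
    hid: int
    name: str
    name_template: Optional[str] = None  # unexpanded label, e.g. "Concatenate on ${on_string}"
    job_inputs: List[Dataset] = field(default_factory=list)


def on_string(inputs: List[Dataset], detailed: bool = False) -> str:
    """Build the ${on_string} text from the job's input datasets at display time."""
    if detailed:
        labels = [f"data {d.hid} ({d.name})" for d in inputs]
    else:
        labels = [f"data {d.hid}" for d in inputs]
    if len(labels) <= 1:
        return "".join(labels)
    return ", ".join(labels[:-1]) + " and " + labels[-1]


def display_name(ds: Dataset, detailed: bool = False) -> str:
    """Expand the stored template; datasets without one keep their literal name."""
    if ds.name_template is None:
        return ds.name
    return Template(ds.name_template).safe_substitute(
        on_string=on_string(ds.job_inputs, detailed)
    )
```

With two inputs, `display_name(ds)` yields something like "Concatenate on data 1 and data 2", and `display_name(ds, detailed=True)` appends each input's name, so the same stored template serves both the compact and the hover view.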

eschen42 commented 6 years ago

I'm not sure what you mean here, but FYI: many of the tools I use (such as Query Tabular) name their resulting datasets things like "query on datasets 376 and 127", so I can immediately see which datasets my data were pulled from. I don't want to have to click the "circle i" icon and navigate elsewhere to find out what I can currently read straight from the dataset name.

mvdbeek commented 6 years ago

Like I said, we don't need to hardcode this into the name: the same text can be generated dynamically, and it would still make sense after parts of a history are moved to another history.

I wouldn't remove this without anything at least equally good in place.

jmchilton commented 5 years ago

I had missed this, but a huge 👍 from me for making this at least an option. We need to do this for workflows evaluated outside the context of Galaxy: if the workflow execution context doesn't have access to the Galaxy database, we can't realistically generate HIDs. Such workflow executions are on the roadmap and a team priority.

nsoranzo commented 3 years ago

I guess this could also help when copying an output dataset between histories: at present, the HIDs in the ${on_string} of the copied dataset still refer to datasets in the original history.
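To illustrate why display-time rendering would help with copies, here is a minimal sketch assuming HIDs are assigned per history and the name is resolved against whichever history the dataset is being shown in. Everything here (`Dataset`, `History`, `input_ids`, `display_name`, and the fallback to the input's name when that input was not copied) is a hypothetical model for illustration, not Galaxy's data model or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from string import Template
from typing import Dict, List, Optional


@dataclass
class Dataset:
    """Hypothetical dataset record that can appear in several histories."""
    id: int
    name: str
    name_template: Optional[str] = None
    input_ids: List[int] = field(default_factory=list)  # ids of the job's input datasets


@dataclass
class History:
    """Hypothetical history: HIDs are local to it, assigned in order of addition."""
    hids: Dict[int, int] = field(default_factory=dict)  # dataset id -> HID in this history

    def add(self, ds: Dataset) -> int:
        hid = len(self.hids) + 1
        self.hids[ds.id] = hid
        return hid


def display_name(ds: Dataset, history: History, datasets: Dict[int, Dataset]) -> str:
    """Render the name using the HIDs that are valid in `history`."""
    if ds.name_template is None:
        return ds.name
    labels = []
    for input_id in ds.input_ids:
        hid = history.hids.get(input_id)
        # Use the input's HID in this history; fall back to its name if it is absent here.
        labels.append(f"data {hid}" if hid is not None else datasets[input_id].name)
    if len(labels) <= 1:
        text = "".join(labels)
    else:
        text = ", ".join(labels[:-1]) + " and " + labels[-1]
    return Template(ds.name_template).safe_substitute(on_string=text)
```

If only the second input and the output are copied into a new history, the output renders as "Concatenate on reads_a and data 1" there instead of pointing at HIDs from the original history. The same fallback also covers the case raised above of workflows evaluated without a Galaxy database: with no HIDs at all, names are built from input names alone.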