bxlab / galaxy-hackathon

Data intensive science for everyone.
https://galaxyproject.org/

Optional dataset naming based on original fastq files #13

Open bwlang opened 8 years ago

bwlang commented 8 years ago

I'd like to implement an optional default naming convention that includes

*tool name* on *original input dataset*
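A minimal sketch of the proposed convention, assuming the original input is identified by an uploaded file name or URL and that common FASTQ suffixes should be dropped to get a readable sample label. The suffix list and both function names are hypothetical and are not part of Galaxy.

```python
import os

# Hypothetical suffixes to strip from original FASTQ names (not from Galaxy).
FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz", ".fastq", ".fq")


def sample_label(original_name: str) -> str:
    """Reduce an uploaded file name or URL to a short sample label."""
    base = os.path.basename(original_name.rstrip("/"))
    for suffix in FASTQ_SUFFIXES:  # longest compound suffixes are checked first
        if base.endswith(suffix):
            return base[: -len(suffix)]
    return base  # unknown extension: keep the name unchanged


def default_output_name(tool_name: str, original_input: str) -> str:
    """Build '<tool name> on <original input dataset>'."""
    return f"{tool_name} on {sample_label(original_input)}"


print(default_output_name("BWA-MEM", "https://example.org/reads/sample1.fastq.gz"))
# BWA-MEM on sample1
```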

@mvdbeek thinks this could be implemented using the element identifier field being discussed in https://github.com/galaxyproject/galaxy/issues/2006 and https://github.com/galaxyproject/galaxy/issues/2140.

Perhaps, as a next phase, we could find a way to designate an input dataset or collection as a "naming source".

I'd like to find a way to make this work in the context of a workflow, but also in the context of ad-hoc analysis.

NickSto commented 8 years ago

So, the quick-and-dirty way to do this would be to simply add an attribute called sample to HistoryDatasetAssociation. On creation, a dataset would take the sample of its parent or, if the parent has none, the parent's name.
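A minimal sketch of this eager approach, assuming each output has a single parent. `HistoryItem` is a hypothetical stand-in for HistoryDatasetAssociation with the proposed `sample` attribute, and `create_output` is an illustrative helper, not Galaxy's actual model or dataset-creation code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HistoryItem:
    """Hypothetical stand-in for a history dataset with the proposed `sample`
    attribute; this is not Galaxy's actual model class."""
    name: str
    parent: Optional["HistoryItem"] = None
    sample: Optional[str] = None


def create_output(name: str, parent: HistoryItem) -> HistoryItem:
    # Fill in the sample eagerly at creation time: inherit the parent's
    # sample, or fall back to the parent's name when it has none.
    sample = parent.sample or parent.name
    return HistoryItem(name=name, parent=parent, sample=sample)


reads = HistoryItem("sample1.fastq")
bam = create_output("BWA-MEM on data 1", reads)
sorted_bam = create_output("Sort on data 2", bam)
print(bam.sample, sorted_bam.sample)  # sample1.fastq sample1.fastq
```

Because the value is copied once, a later rename of the upstream dataset would not propagate to outputs that already exist.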

Do we want to go the quick-and-dirty route for now and solve the problem more properly later? James mentioned at the hackathon that future UI features should group history items in ways that make the sample association obvious, but that sounds like it's a bit down the road.

Note: Dataset names are created in lib/galaxy/tools/actions/__init__.py.

NickSto commented 8 years ago

On second thought: instead of filling in the sample attribute automatically, we could get the parents' samples by walking up the family tree only when needed, such as when creating the history item's name. The sample would stay empty unless the user actually fills it in.
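A minimal sketch of the lazy variant, assuming an item may have several parents and that `sample` is set only by the user. The class and `resolve_samples` are hypothetical illustrations, not Galaxy code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HistoryItem:
    """Hypothetical history item; `sample` stays None unless the user sets it."""
    name: str
    parents: List["HistoryItem"] = field(default_factory=list)
    sample: Optional[str] = None


def resolve_samples(item: HistoryItem) -> List[str]:
    """Look up samples lazily by walking up the family tree.

    Returns the item's own sample if set; otherwise the nearest samples found
    on each parent branch, deduplicated in first-seen order.
    """
    if item.sample:
        return [item.sample]
    found: List[str] = []
    for parent in item.parents:
        for sample in resolve_samples(parent):
            if sample not in found:
                found.append(sample)
    return found


r1 = HistoryItem("r1.fastq", sample="S1")
r2 = HistoryItem("r2.fastq", sample="S2")
merged = HistoryItem("Merge on data 1 and data 2", parents=[r1, r2])
trimmed = HistoryItem("Trim on data 3", parents=[merged])
print(resolve_samples(trimmed))  # ['S1', 'S2']
```

Nothing is stored on the outputs, so editing a sample upstream is reflected the next time a name is computed.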

NickSto commented 8 years ago

Okay, just one more idea: instead of having a sample attribute at all, we could simply walk the tree to find the earliest ancestors of the current history item, i.e. those with no parent, and use their names as the sample names for the current item. The names of history items with no parent are usually filenames or URLs from which we can get a reasonable sample name.

We could also add a flag like name_from_user indicating whether the name has been edited by the user. If it has, we can assume it's a good sample name. So, we could walk upward until we hit the first edited name and stop there.

Of course, all of this is only a fallback for when there is no collection element_identifier we can use; when there is one, it should take precedence.

This system would be even simpler; however, it is also getting even quicker and dirtier.
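The tree walk from this last comment can be sketched as follows, assuming the current item and its ancestors may carry a collection element_identifier that wins over everything else. The `Item` class, its `name_from_user` and `element_identifier` fields, and `naming_sources` are hypothetical illustrations of the proposal, not Galaxy's models or API; names are returned as-is, without extension stripping.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Item:
    """Hypothetical history item; field names are illustrative, not Galaxy's."""
    name: str
    parents: List["Item"] = field(default_factory=list)
    name_from_user: bool = False              # proposed flag: name edited by the user
    element_identifier: Optional[str] = None  # set when the item is a collection element


def naming_sources(item: Item) -> List[str]:
    """Collect sample names by walking up from `item` toward its ancestors.

    At each node a collection element_identifier wins; otherwise the walk stops
    at an ancestor whose name the user edited, or at a root (no parents).
    """
    found: List[str] = []

    def visit(node: Item) -> None:
        if node.element_identifier:
            label = node.element_identifier
        elif not node.parents or (node is not item and node.name_from_user):
            label = node.name
        else:
            for parent in node.parents:  # keep walking upward
                visit(parent)
            return
        if label not in found:
            found.append(label)

    visit(item)
    return found


r1 = Item("sample1.fastq")
r2 = Item("sample2.fastq")
renamed = Item("tumor", parents=[r2], name_from_user=True)
merged = Item("Merge on data 1 and data 3", parents=[r1, renamed])
print(naming_sources(merged))  # ['sample1.fastq', 'tumor']
```

The walk stops at the user-edited "tumor" and never reaches "sample2.fastq", which is the behavior the name_from_user flag is meant to provide.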