galaxy-iuc / standards

Documentation for standards and best practices from the Galaxy IUC
http://galaxy-iuc-standards.readthedocs.io/en/latest/

Change "Tool on data N" output file naming to something without spaces? #53

Open mblue9 opened 6 years ago

mblue9 commented 6 years ago

I've just run into a few errors caused by the "Tool on data N" naming (e.g. https://github.com/galaxyproject/tools-iuc/pull/1842), which turned out to be very helpful for identifying that I needed to handle spaces better in the wrapper. But having spaces in filenames is not something to be encouraged imho (and I'm well aware that I'm one of the people currently propagating this!). I know spaces in input filenames could be replaced in the tool wrappers, but I was wondering whether the recommended default naming could be something without spaces. And could it maybe include the element_identifier instead of (or in addition to) on_string?

For example, in the mageck case could I change:

${tool.name} on ${on_string}: sgRNA Counts

to something like the following (separated by an underscore, a dot, or some other non-space character):

${tool.name}_${element_identifier}_sgRNACounts

Although if tool.name contains spaces, those would also need replacing. Or could tool.id be used instead?
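To make the proposal concrete, here is a minimal Python sketch of the substitution the suggested label implies, assuming every run of whitespace is collapsed to a single underscore. The helper names `no_spaces` and `make_dataset_name` are hypothetical and not part of Galaxy; in an actual wrapper the label is a Cheetah expression, and the inputs below are only illustrative.

```python
import re


def no_spaces(text: str) -> str:
    """Collapse every run of whitespace into a single underscore."""
    return re.sub(r"\s+", "_", text.strip())


def make_dataset_name(tool_name: str, element_identifier: str, suffix: str) -> str:
    """Hypothetical helper mirroring ${tool.name}_${element_identifier}_sgRNACounts."""
    return "_".join(no_spaces(part) for part in (tool_name, element_identifier, suffix))


print(make_dataset_name("MAGeCK count", "sample 1", "sgRNACounts"))
```

Using tool.id instead of tool.name would avoid the first substitution, since tool ids conventionally contain no spaces, but element_identifier can still carry spaces from user uploads and would need the same treatment.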

nsoranzo commented 6 years ago

These are not file names but dataset names. The actual files Galaxy creates for outputs are always named in the form dataset_N.dat. In my opinion, the correct solution is the usual rule of always surrounding user-controlled inputs with single quotes; see https://github.com/galaxyproject/tools-iuc/pull/1842/files#r183253619.
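Why quoting fixes this can be seen with Python's standard `shlex` module, which splits a command line the way a POSIX shell does. This is a sketch only: `mycmd` and its `--label` option are hypothetical, and this is not how Galaxy itself renders tool commands.

```python
import shlex

# An illustrative dataset name of the "Tool on data N" kind.
name = "MAGeCK count on data 1: sgRNA Counts"

# Unquoted: the shell splits the name into several separate arguments.
unquoted = shlex.split(f"mycmd --label {name}")

# Single-quoted, as recommended: the name stays a single argument.
quoted = shlex.split(f"mycmd --label '{name}'")

# shlex.quote also copes with a value that itself contains a single quote,
# which plain '...' wrapping would not.
tricky = shlex.split("mycmd --label " + shlex.quote("Bob's data 1"))

print(unquoted)
print(quoted)
print(tricky)
```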

blankenberg commented 6 years ago

Just a word of warning about using 'fancy' output dataset names. They can often be helpful, but consider what happens if you chain together multiple tools that all use this feature and, instead of the standard ${on_string}, build names from input.name or element_identifier: you can end up with some rather long and unwieldy dataset names.
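A rough Python sketch of how such names snowball: each step either embeds its input's full name, or refers to the input only by history number as the "Tool on data N" convention does. The tool names, starting dataset, and the assumption that each output lands at the next history position are all illustrative, not taken from Galaxy.

```python
def chain_full_names(tools, first_name):
    """Each step's label embeds the previous dataset's full name."""
    name = first_name
    names = []
    for tool in tools:
        name = f"{tool} on {name}"
        names.append(name)
    return names


def chain_on_string(tools, first_hid):
    """Each step refers only to its input's history number, like 'Tool on data N'."""
    names = []
    hid = first_hid
    for tool in tools:
        names.append(f"{tool} on data {hid}")
        hid += 1  # assumption: each output becomes the next history item
    return names


tools = ["Trim", "Map", "Count", "Compare"]
for full, short in zip(chain_full_names(tools, "sample_1.fastq"), chain_on_string(tools, 1)):
    print(f"{len(full):3d}  {full}")
    print(f"{len(short):3d}  {short}")
```

The full-name labels grow with every step, while the ${on_string}-style labels stay roughly constant in length.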