galaxyproject / galaxy

Data intensive science for everyone.
https://galaxyproject.org

Concatenate tool is not collection aware #3172

Open peterjc opened 7 years ago

peterjc commented 7 years ago

First, create a simple dataset collection:

  1. Log in to usegalaxy.org
  2. Create a new history
  3. Click "Upload data"
  4. Click "Paste/Fetch data", change the type to "txt", enter the text "Alpha", click "Start" and wait
  5. Repeat with text "Beta"
  6. Repeat with text "Gamma"
  7. Click "Close" on the upload popup to return to main Galaxy window
  8. Wait until the history has finished updating and shows three green "Pasted Entry" datasets
  9. Click the "tick icon" (Operations on multiple datasets)
  10. Tick all three "Pasted Entry" datasets
  11. Click "For all selected ...", "Build Dataset List"
  12. The popup "Create a collection from a list of datasets" appears
  13. Give it the name "Three text snippets"
  14. Click "Create list"; nothing happens
  15. Click the three entries so they turn black
  16. Click "Create list"; a new entry appears in the history
  17. Click the "tick icon" again to return to normal mode

OK, we now have a simple dataset collection containing three text files.

  1. On the left hand pane, click on "Text Manipulation", "Concatenate datasets (tail-to-head)"
  2. Under the "Concatenate Dataset" prompt, click the "folder (Datasets collection)" icon
  3. Notice our "Three text snippets" collection is selected (good).
  4. Notice the message under this, "This is a batch mode input field. Separate job will be triggered for each dataset selection" (bad).
  5. Click "Execute"

Actual result:

A new collection containing the output of three separate concatenations, each on one file only, i.e. a copy of the input collection.

Desired result:

A single new text file (not a collection) containing the concatenation of the "Alpha", "Beta" and "Gamma" snippets (in that order).
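A minimal Python model of the actual and desired results, assuming a list collection can be treated as an ordered list of (identifier, content) pairs and that each pasted snippet ends with a newline. The helper names are hypothetical and are not Galaxy internals; the point is only that batch mode maps one job over each element, while the desired behavior reduces all elements in a single job.

```python
from typing import List, Tuple

# Hypothetical model: a list collection is an ordered list of
# (element identifier, text content) pairs.
Collection = List[Tuple[str, str]]


def concatenate(contents: List[str]) -> str:
    """Join text contents tail-to-head, in the order given."""
    return "".join(contents)


def map_over(collection: Collection) -> Collection:
    """Batch mode: one job per element, so each "concatenation" sees a
    single file and the result is a new collection of copies."""
    return [(ident, concatenate([content])) for ident, content in collection]


def reduce_concat(collection: Collection) -> str:
    """Desired behavior: one job consumes every element, in collection
    order, and produces a single dataset."""
    return concatenate([content for _, content in collection])


# Assumes each pasted snippet ends with a newline (hypothetical identifiers).
snippets: Collection = [
    ("Alpha", "Alpha\n"),
    ("Beta", "Beta\n"),
    ("Gamma", "Gamma\n"),
]

if __name__ == "__main__":
    print(map_over(snippets) == snippets)  # True: just a copy of the input
    print(reduce_concat(snippets), end="")
```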

peterjc commented 7 years ago

As of https://github.com/galaxyproject/galaxy/blob/71cea6604d43c5fe6215f5656462ba6c1af69bb6/tools/filters/catWrapper.xml this uses:

    <command interpreter="python">
        catWrapper.py
        $out_file1
        $input1
        #for $q in $queries
            ${q.input2}
        #end for
    </command>
    <inputs>
        <param name="input1" type="data" label="Concatenate Dataset"/>
        <repeat name="queries" title="Dataset">
            <param name="input2" type="data" label="Select" />
        </repeat>
    </inputs>
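The contents of catWrapper.py are not quoted here. Judging only from the command line above (output path first, then $input1, then each ${q.input2}), a hypothetical stand-in might look like this; the real script may differ, for example in its error handling.

```python
import shutil
import sys
from typing import Sequence


def cat_files(out_path: str, in_paths: Sequence[str]) -> None:
    """Write the input files to out_path one after another (tail-to-head).

    Hypothetical stand-in for catWrapper.py; the real script may differ.
    """
    if not in_paths:
        raise ValueError("at least one input dataset is required")
    with open(out_path, "wb") as out:
        for path in in_paths:
            with open(path, "rb") as src:
                # Copy bytes unchanged and in chunks, so large datasets
                # are never read fully into memory.
                shutil.copyfileobj(src, out)


def main(argv: Sequence[str]) -> int:
    """argv[0] is the output ($out_file1); the rest are the inputs
    ($input1 followed by each ${q.input2}) in command-line order."""
    if len(argv) < 2:
        print("usage: catWrapper.py OUTPUT INPUT [INPUT ...]", file=sys.stderr)
        return 2
    cat_files(argv[0], argv[1:])
    return 0
```

In a real wrapper script, main(sys.argv[1:]) would be called from the script's entry point and its return value passed to sys.exit.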

Does this mean it is related to #697, and should it be rewritten as:

    <command interpreter="python">
        catWrapper.py
        $out_file1
        #for $q in $queries
            ${q.input2}
        #end for
    </command>
    <inputs>
        <repeat name="queries" min="1" title="Dataset(s)">
            <param name="input2" type="data" label="Select" />
        </repeat>
    </inputs>
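A hypothetical Python helper that mimics how this repeat-only template expands into a command line (it is not Galaxy's Cheetah renderer). Note that each input2 is still an ordinary single-dataset parameter, so this layout removes the awkward first/rest split but would presumably still switch to batch mode if a collection were picked for one of the repeat entries, just as input1 does in the report above.

```python
from typing import Dict, List


def render_repeat_command(out_file1: str, queries: List[Dict[str, str]]) -> List[str]:
    """Mimic the repeat-only template: the output path, then one argument
    per repeat block, in the order the blocks were added.

    Hypothetical helper, not Galaxy's Cheetah renderer.
    """
    # min="1" means the form should always supply at least one block.
    if len(queries) < 1:
        raise ValueError('repeat "queries" needs at least one entry (min="1")')
    return ["catWrapper.py", out_file1] + [q["input2"] for q in queries]


if __name__ == "__main__":
    print(render_repeat_command("out.txt", [{"input2": "alpha.txt"}, {"input2": "beta.txt"}]))
    # ['catWrapper.py', 'out.txt', 'alpha.txt', 'beta.txt']
```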

Or, better, use a multiple="true" data parameter?

    <command interpreter="python">
        catWrapper.py
        $out_file1
        #for $f in $input1
            '$f'
        #end for
    </command>
    <inputs>
        <param name="input1" type="data" multiple="true" label="Concatenate Dataset(s)"/>
    </inputs>
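As far as I understand Galaxy's collection semantics, a multiple="true" data parameter hands all selected datasets, or all elements of a list collection, to a single job, so the #for loop sees every path and only one output is produced. A hypothetical Python helper mimicking the rendered command, with shlex.quote standing in for the template's '$f' quoting:

```python
import shlex
from typing import List


def render_multiple_command(out_file1: str, input1: List[str]) -> str:
    """Mimic the multiple="true" template as one shell command string.

    input1 holds every selected dataset path, or every element of a list
    collection in collection order, since a single job receives them all.
    Hypothetical helper; shlex.quote stands in for the template's '$f'.
    """
    if not input1:
        raise ValueError("input1 needs at least one dataset")
    args = ["catWrapper.py", out_file1] + list(input1)
    # Quote every argument so paths with spaces or shell metacharacters
    # reach the script as single, literal arguments.
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    print(render_multiple_command("out.txt", ["alpha.txt", "my beta.txt"]))
    # catWrapper.py out.txt alpha.txt 'my beta.txt'
```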

jmchilton commented 7 years ago

I'd recommend using @bgruening's text processing tools; see https://github.com/bgruening/galaxytools/blob/master/tools/text_processing/text_processing/cat.xml. People are unsure how to proceed with this, but the people I've talked to generally prefer these text processing tools and would rather encourage their use than improve the tools shipped with Galaxy.

peterjc commented 7 years ago

I'd agree with that if we deprecate and hide the bundled cat1 tool, since it is obsolete, and recommend @bgruening's tool instead? Otherwise it remains a default installed tool: highly visible but of limited use.

jennaj commented 7 years ago

The two cat tools behave differently even with single dataset inputs. Have we decided which to keep?

Many of these updated text manipulation tools, including these two, are not grouped as an older and a newer version of the same tool (a "Version" is reported, but no older versions are listed for either). This is probably due to tool provenance details. I'm not sure whether that is an issue, or whether one could be mapped to the other some other way for users who have the older tool in a workflow.

martenson commented 3 years ago

We could explicitly link to Bjoern's concat from the help section of the tool.