galaxyproject / galaxy

Data intensive science for everyone.
https://galaxyproject.org

Workflow invocation fails when collections input is selected at runtime #5632

Open jennaj opened 6 years ago

jennaj commented 6 years ago

Three related issues. None have been added to the Roadmap for triage yet.

BUG Most important item to fix.

USAGE ENHANCEMENTS found during testing of the bug

ping @guerler @jmchilton

Fixes

Testing overview and links to test workflows and result histories

Datasets used as run-time inputs:

Workflows and result histories:

Collection input not connecting:

Collection input connecting when "type" is reset ONE TIME:

Collection input NOT connecting when "type" is reset MORE THAN ONE TIME (bug):

Only individual datasets can be selected at runtime (no multiple, no collection):

Collection or individual datasets selected at runtime:

Graphic of History list for indiv vs coll select at runtime:

[Screenshot: screen shot 2018-03-01 at 12 11 13 pm]

bernt-matthias commented 6 years ago

I just got a report from a user that seems to match the first issue. It would be interesting to know if there is a workaround in the meantime.

jennaj commented 6 years ago

@bernt-matthias The solution is to use an Inputs step within the workflow and set it to the collection type the downstream tool expects. If needed, configure that downstream tool for the expected input type first. Then the noodles will connect and the workflow will run normally.

Selecting collections as input at runtime is the core issue we will be addressing. Making usage clearer is a secondary priority (an enhancement) and is more complex to address, but the help above should get your workflow going. In short, use Inputs, and configure them correctly, instead of selecting inputs at runtime. The user still chooses which actual input datasets to use; the Inputs step only defines which datasets are appropriate.

A single dataset will have one Inputs step (it can be a single dataset or a dataset collection).

Multiple datasets that are not in a collection will have multiple Inputs.

Any of the tools/steps can be annotated to guide the user in making proper choices. I often label these with the datatype and sometimes the target genome (if that is already set for some of the tools).

Example1: If a tool has a specific target genome already set, and the input is a BAM dataset, I'd label the input with "BAM for dbkey" where "dbkey" is the genome (hg38, mm10, some custom build, etc).

Example2: If there are forward and reverse reads, I would include that info in the Inputs annotation as well, so the correct file is selected.
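For anyone checking an exported workflow by script rather than in the editor, here is a minimal sketch of what the advice above could look like in data. It assumes the workflow has the general shape of a Galaxy `.ga` JSON export (a `steps` mapping whose entries carry `type`, `label`, `annotation`, `tool_state`, and `input_connections`). The field names are my assumption about that format, and the tool id, labels, and the `describe_inputs` helper are invented for illustration.

```python
import json

# Hypothetical, trimmed-down workflow in the assumed shape of a .ga export.
# Step 0 is an explicit collection input; step 1 is a tool connected to it.
workflow = {
    "name": "Example with an explicit collection input",
    "steps": {
        "0": {
            "id": 0,
            "type": "data_collection_input",
            "label": "BAM for hg38",
            "annotation": "List collection of BAM datasets aligned to hg38",
            # tool_state is stored as a JSON string in this sketch
            "tool_state": json.dumps({"collection_type": "list"}),
            "input_connections": {},
        },
        "1": {
            "id": 1,
            "type": "tool",
            "tool_id": "example_tool",  # invented tool id
            "label": None,
            "annotation": "",
            "input_connections": {"input": {"id": 0, "output_name": "output"}},
        },
    },
}

# Step types treated as explicit inputs (an assumption of this sketch).
INPUT_STEP_TYPES = {"data_input", "data_collection_input"}


def describe_inputs(wf):
    """Return (step id, step type, collection type, label) per input step."""
    rows = []
    for step in sorted(wf["steps"].values(), key=lambda s: s["id"]):
        if step["type"] not in INPUT_STEP_TYPES:
            continue
        state = json.loads(step.get("tool_state") or "{}")
        rows.append(
            (step["id"], step["type"], state.get("collection_type"), step.get("label"))
        )
    return rows


print(describe_inputs(workflow))
```

The label carries the datatype and genome hint described above, so whoever runs the workflow sees which collection belongs in that slot.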

jennaj commented 6 years ago

@jmchilton @guerler @dannon Did we decide that using an Inputs within the workflow for collections is now required? And it will be that way going forward?

If so, we can help users who encounter this issue as it comes up.

The other fixes could be considered enhancements. We could move those to a ticket specifically for "New" -- Should I do that and close this out?

jennaj commented 5 years ago

Related ticket: https://github.com/galaxyproject/galaxy/issues/7431

An enhancement for workflows should still probably be added. Users do not understand why there is no output from a workflow when no "input" is included at the start, in particular, a collection input. The workflow launches but doesn't produce any output. This is now reported several times a week.
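As a rough illustration of the kind of check such an enhancement might perform, the sketch below warns when a workflow has no explicit input step. It is not Galaxy's implementation. The `steps`/`type` layout is assumed to resemble an exported workflow, and the `missing_input_warning` helper and its message text are hypothetical.

```python
# Step types treated as explicit inputs (an assumption of this sketch).
INPUT_STEP_TYPES = {"data_input", "data_collection_input"}


def missing_input_warning(wf):
    """Return a warning string if wf has no explicit input step, else None."""
    step_types = {step.get("type") for step in wf.get("steps", {}).values()}
    if step_types & INPUT_STEP_TYPES:
        return None
    return (
        "This workflow has no input step. Add an Inputs step "
        "(dataset or collection) and connect it to the first tool."
    )


# Hypothetical workflows: one made only of tool steps, one with a collection input.
tools_only = {"steps": {"0": {"type": "tool"}, "1": {"type": "tool"}}}
with_input = {"steps": {"0": {"type": "data_collection_input"}, "1": {"type": "tool"}}}

print(missing_input_warning(tools_only))
print(missing_input_warning(with_input))
```

A message like this at launch or save time would tell the user why nothing is produced, instead of letting the workflow run silently with no output.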