jennaj closed this issue 3 years ago.
It's unclear to me what's broken here, but apparently the Perl error is not the problem.
One thing that stands out: why does the tool wrapper copy its inputs? Are symlinks not sufficient?
cp '/galaxy-repl/main/files/038/407/dataset_38407948.dat' 'Prokka on data 5: gff.gff' && cp '/galaxy-repl/main/files/038/411/dataset_38411630.dat' 'Prokka on data 11: gff.gff' && roary -f out -p ${GALAXY_SLOTS:-1} -e -n -i '95' -cd '99.0' -g '50000' -t '11' -iv '1.5' 'Prokka on data 5: gff.gff' 'Prokka on data 11: gff.gff'
@takadonet any thoughts? Looks like you originally implemented the input handling.
The reason is that Roary follows the symlink and uses the target's file name instead of the symlink's name, so all names would end up as dataset_###.
Ahhh gotcha. Blech, ok, thanks.
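A small sketch of the symlink behaviour described above. It assumes only that the tool resolves links to their targets, as stated in this thread for Roary; the dataset and sample file names are hypothetical stand-ins, not real Galaxy paths.

```shell
#!/usr/bin/env bash
# Sketch: why a symlinked input can lose its readable name.
# Assumption: the consuming tool resolves links to their targets
# (as described for Roary in this thread).
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for a Galaxy dataset file (hypothetical name and content).
printf '##gff-version 3\n' > dataset_38407948.dat

# Option 1: symlink with a readable name.
ln -s dataset_38407948.dat sample_A.gff
# Option 2: copy with a readable name (what the wrapper does).
cp dataset_38407948.dat sample_B.gff

# Resolving the symlink yields the underlying dataset name...
resolved_link=$(basename "$(readlink -f sample_A.gff)")
# ...while the copy is a real file, so its own name is kept.
resolved_copy=$(basename "$(readlink -f sample_B.gff)")

echo "symlink resolves to: $resolved_link"   # dataset_38407948.dat
echo "copy resolves to:    $resolved_copy"   # sample_B.gff
```

`readlink -f` is the GNU coreutils form; the point is only that the copy keeps the label, which is why the wrapper pays for a `cp` per input.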
Ah, it's not handling spaces in the input filenames:
Error: Cant access file /galaxy-repl/main/jobdir/027/303/27303938/working/Prokka
Error: Cant access file /galaxy-repl/main/jobdir/027/303/27303938/working/Prokka
Probably. That was my mistake: I assumed the file names would be command-line friendly.
They're quoted, though, so maybe Roary is not reading those parameters correctly?
Roary cannot handle them.
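The quoting in the generated command does keep each name as a single argument; the names only break if they are expanded unquoted or re-parsed by a shell later on. The sketch below shows that difference. Where exactly this happens inside Roary is an assumption; the thread only shows the resulting "Cant access file .../Prokka" errors, which match the first word of the split name.

```shell
#!/usr/bin/env bash
# Sketch: a quoted name survives intact, but splits apart when expanded
# unquoted (or re-parsed by a sub-shell).
# Assumption: something inside Roary re-parses the name this way.
set -euo pipefail

name='Prokka on data 5: gff.gff'

# Quoted: the program receives exactly one argument.
set -- "$name"
quoted_count=$#

# Unquoted: the shell splits on whitespace into separate words.
set -- $name
unquoted_count=$#
first_word=$1

echo "quoted:   $quoted_count argument(s)"     # 1
echo "unquoted: $unquoted_count argument(s)"   # 5
echo "first word seen: $first_word"            # Prokka
```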
@bgruening did you fix this manually somehow on usegalaxy.eu?
I don't think so.
(venv) galaxy@sn04:~/shed_tools/toolshed.g2.bx.psu.edu/repos/iuc/roary/e02e9af2743f/roary$ hg diff
(venv) galaxy@sn04:~/shed_tools/toolshed.g2.bx.psu.edu/repos/iuc/roary/e02e9af2743f/roary$
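One possible wrapper-side fix, sketched here as a hypothetical and not what the IUC wrapper actually does, is to keep copying the inputs (so Roary sees readable names rather than dataset_###) but to replace spaces and other shell-unfriendly characters in the copy names first. The `safe_name` helper and the dataset paths are invented for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper-side fix: stage each input under a readable but
# space-free name before handing it to Roary.
set -euo pipefail

# safe_name (hypothetical): map any character outside [[:alnum:]._-] to "_".
safe_name() {
    local raw=$1
    printf '%s\n' "${raw//[^[:alnum:]._-]/_}"
}

workdir=$(mktemp -d)
# Stand-ins for Galaxy dataset paths and their display names (hypothetical).
printf '##gff-version 3\n' > "$workdir/dataset_1.dat"
printf '##gff-version 3\n' > "$workdir/dataset_2.dat"
inputs=("$workdir/dataset_1.dat" "$workdir/dataset_2.dat")
labels=('Prokka on data 5: gff.gff' 'Prokka on data 11: gff.gff')

staged=()
for i in "${!inputs[@]}"; do
    target="$workdir/$(safe_name "${labels[$i]}")"
    cp "${inputs[$i]}" "$target"
    staged+=("$target")
done

# A wrapper would pass "${staged[@]}" to roary; printed here instead.
printf '%s\n' "${staged[@]##*/}"
```

Design note: two labels that differ only in punctuation would map to the same file name, so a real fix would also need to detect or disambiguate collisions.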
Summary:
Workarounds for end users working with individual datasets:
- If executing tools from the History: click on the pencil icon for an input gff dataset to reach the Edit Attributes form. On the first tab, modify the file name, removing any spaces, then save. Do this for all gff inputs to avoid the naming problem, then rerun Roary using those renamed inputs.
- If executing tools from a Workflow: the output gff dataset generated by an upstream tool (likely Prokka) can be renamed to remove spaces as a "post job action" within the Workflow itself. This will pass the renamed gff inputs to Roary and avoid the naming problem.

Notes:
- The upstream tool commonly used (Prokka), when executed in Galaxy on individual datasets, will always insert spaces into the result dataset names.
- When Prokka is executed with a collection input, spaces in dataset names are avoided from the start. Collections and workflows are worth learning about. If interested, please see:
Update:
If an intermediate parsing job fails, the tool outputs empty "green" results. This is confusing for users. It seems more likely to happen with a large number of gff inputs, but that isn't confirmed.
Can sub-job tasks that fail be trapped better in the wrapper? Could failed sub-jobs be ignored or rerun? If just a few sub-jobs fail, maybe allow the user to choose to ignore them and have what was skipped written to a job log shown in the history? At a minimum, if all outputs will be empty, red error dataset results would be better.
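The "at a minimum" option could look roughly like the check below, run after roary in the wrapper's command. This is a hypothetical sketch: it assumes the Galaxy job is marked failed on a non-zero exit code, and the output file names are assumptions about what this Roary version writes under `-f out`; the `check_outputs` helper is invented for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical post-run check: fail when expected outputs are missing or
# empty, so the user gets a red error instead of green empty datasets.
# Assumption: a non-zero exit code marks the Galaxy job as failed.
set -euo pipefail

# check_outputs DIR FILE...: return 1 if any FILE is missing or empty.
check_outputs() {
    local outdir=$1
    shift
    local missing=0 f
    for f in "$@"; do
        if [ ! -s "$outdir/$f" ]; then
            echo "Roary output missing or empty: $outdir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Demo with a simulated output directory (hypothetical contents).
demo=$(mktemp -d)
printf '"Gene","No. isolates"\n' > "$demo/gene_presence_absence.csv"
: > "$demo/core_gene_alignment.aln"   # simulate an empty alignment

if check_outputs "$demo" gene_presence_absence.csv core_gene_alignment.aln; then
    status=ok
else
    status=failed   # a real wrapper would `exit 1` here
fi
echo "$status"
```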
Example discussion: https://help.galaxyproject.org/t/roary-core-genome-alignment-file-is-empty/4054/13
The tool version is now 3.13.0+galaxy2.
Hello everyone,
I ran Roary for 219 genomes, but in the presence/absence matrix I only have 197 genomes.
Does anyone know the reason?
Tool:
Roary the pangenome pipeline - Quickly generate a core gene alignment from gff3 files (Galaxy Version 3.13.0)
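A first diagnostic step for a count mismatch like this is to list which inputs never made it into the matrix. The sketch below is hypothetical: it assumes the first 14 header columns of `gene_presence_absence.csv` are Roary's per-gene metadata and that each remaining column is named after an input GFF file minus ".gff". It only identifies the missing genomes; it does not explain why they were dropped.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic: which input samples are absent from the
# gene_presence_absence.csv header?
# Assumptions: 14 metadata columns, then one column per isolate.
set -euo pipefail

# header_samples CSV: print one isolate name per line from the header.
header_samples() {
    head -n 1 "$1" | tr -d '\r' | sed 's/^"//; s/"$//' \
        | awk -F'","' '{ for (i = 15; i <= NF; i++) print $i }'
}

# missing_samples CSV LIST: names in LIST (one per line) absent from CSV.
missing_samples() {
    local csv=$1 inputs=$2
    # -v invert, -x whole line, -F fixed strings, -f patterns from header.
    grep -vxFf <(header_samples "$csv") "$inputs" || true
}

# Demo: toy header with 14 metadata columns plus two isolates.
demo=$(mktemp -d)
printf '"Gene","Non-unique Gene name","Annotation","No. isolates","No. sequences","Avg sequences per isolate","Genome Fragment","Order within Fragment","Accessory Fragment","Accessory Order with Fragment","QC","Min group size nuc","Max group size nuc","Avg group size nuc","sampleA","sampleB"\n' \
    > "$demo/gene_presence_absence.csv"
printf '%s\n' sampleA sampleB sampleC > "$demo/inputs.txt"

missing_samples "$demo/gene_presence_absence.csv" "$demo/inputs.txt"   # prints sampleC
```

With real data, the input list could be built from the GFF file names (for example with `basename "$f" .gff` in a loop), assuming those names match the matrix columns.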
Workaround for end users: until the tool is corrected at usegalaxy.org and this ticket is closed, it can be used at usegalaxy.eu instead.
Troubleshooting: there seem to be three problems.
Test histories: use some of the tutorial data from here: https://training.galaxyproject.org/training-material/topics/assembly/
The error from the usegalaxy.org test is the same as the one reported at Galaxy Help here: https://help.galaxyproject.org/t/roary-fatal-error-exit-code-2/3164
ping @davebx @mvdbeek @natefoo