Closed robinengler closed 3 years ago
Hi @robinengler, thanks for reporting this. Unfortunately, pipes are currently not supported in the renku CLI: https://renku-python.readthedocs.io/en/latest/cli.html#detecting-standard-streams
Perhaps @jirikuncar can elaborate/comment on the problems with brackets?
@robinengler : as a workaround, please tell us if any of the following works:
echo 'date|cat' | renku run bash
The point is that the exec of the process must fully capture the execution from start to end.
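To see why a bare pipe gets in the way of capturing the whole execution, here is a minimal sketch. `fake_run` is a hypothetical stand-in for a wrapper such as `renku run`; it is not part of renku. The shell splits the pipeline before the wrapper starts, so the wrapper only ever receives the words to the left of `|`, whereas a quoted pipeline reaches it in one piece. Whether renku records such a quoted invocation usefully is not confirmed in this thread.

```shell
# fake_run is a hypothetical stand-in for a wrapper such as `renku run`:
# it records the arguments it was given and then executes them.
fake_run() {
    printf '%s\n' "$*" > "$log"
    "$@"
}

log=$(mktemp)

# Unquoted pipe: the shell starts `tr` as a separate process, so the
# wrapper only receives `echo hello`.
piped=$(fake_run echo hello | tr a-z A-Z)
seen_unquoted=$(cat "$log")

# Quoted pipeline: the whole command line reaches the wrapper as one argument.
quoted=$(fake_run sh -c 'echo hello | tr a-z A-Z')
seen_quoted=$(cat "$log")

printf 'unquoted: wrapper saw [%s]\n' "$seen_unquoted"
printf 'quoted:   wrapper saw [%s]\n' "$seen_quoted"
```

Both variants produce the same output on the terminal; only the second one lets the wrapper see the full pipeline.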
Thanks @rokroskar and @fgeorgatos for the reply and the suggested workarounds. I have tried the second option, where I echo the command and then pipe the whole thing into renku run. Here is an example of what I tried:
echo "sort -k2 -n data/processed/joined_batch1.txt | grep -v FAIL" | renku run bash > data/processed/sorted_batch1.txt
The command now works, but I think that this still breaks renku's tracking of the workflow, because if I now look at the .cwl file produced for this step, it's kind of empty:
cat .renku/workflow/9e4848d238dd4433a5ea60be6915a7a2_bash.cwl
arguments: []
baseCommand:
- bash
class: CommandLineTool
cwlVersion: v1.0
hints: []
inputs: {}
outputs:
  output_stdout:
    streamable: false
    type: stdout
permanentFailCodes: []
requirements: []
stdout: data/processed/sorted_batch1.txt
successCodes: []
temporaryFailCodes: []
So it looks like the actual command to process the file was not recorded properly. This seems to be confirmed by the fact that, when I then try to regenerate the output file with:
renku rerun data/processed/sorted_batch1.txt
The output file is now empty (so it knows it should create this output file, but doesn't know how to create it).
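One plausible reading of the empty file, sketched here under that assumption: the recorded step is just `bash` with no arguments and no inputs, so on rerun the shell has no commands to read. The sketch mimics this by starting `bash` with an empty standard input. It does not call renku, and how `renku rerun` actually wires up stdin is an assumption.

```shell
out=$(mktemp)

# Original run: the pipeline text arrives on bash's stdin,
# so it never shows up in the recorded command line.
printf '%s\n' "echo hello | tr a-z A-Z" | bash > "$out"
first_size=$(wc -c < "$out")

# Replaying only what was recorded: `bash`, no arguments, nothing on stdin.
bash < /dev/null > "$out"
rerun_size=$(wc -c < "$out")

echo "first run: $first_size bytes, replay: $rerun_size bytes"
```

The replay still creates the output file, but with nothing to execute it stays empty, which matches what the rerun produced.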
From this I think that if one wants to use pipes in renku, then embedding the bash commands in a shell script is the way to go :-)
I now also tried the other option, where the command (that contains a pipe) is embedded into a shell script. It seems best if the shell script is written so that both the input and output files are passed to the script (rather than say auto-detected by the script), so that renku can properly recognize the input and output files.
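The script itself is not shown in this thread. A minimal sketch of what such a `sortFile.sh` could look like, assuming it simply wraps the `sort | grep` pipeline from the earlier attempt and takes the input path as `$1` and the output path as `$2` (the function name `sort_file` is made up for illustration):

```shell
#!/usr/bin/env bash
# Hypothetical sortFile.sh: sort a file numerically on its second column
# and drop every line containing FAIL.
# Usage: sortFile.sh INPUT_FILE OUTPUT_FILE

sort_file() {
    input=$1
    output=$2
    # The pipe lives inside the script, so the caller only sees
    # one command with two file arguments.
    sort -k2 -n "$input" | grep -v FAIL > "$output"
}

# Run only when both paths are given on the command line.
if [ "$#" -eq 2 ]; then
    sort_file "$1" "$2"
fi
```

Because both paths appear as plain arguments, a wrapper that inspects the command line has a chance to recognise them as input and output files.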
It is then possible to run the script with renku:
renku run notebooks/sortFile.sh data/processed/joined_batch1.txt data/processed/sorted_batch1.txt
The command works well, and looking at the .cwl file, the input and output files are properly identified. There is still a problem, though, when I try to do a "renku rerun". For some reason, renku does not find the script anymore:
renku rerun data/processed/sorted_batch1.txt
/home/jovyan/.local/pipx/venvs/renku/lib/python3.6/site-packages/renku/models/provenance/activities.py:597: YAMLLoadWarning:
*** Calling yaml.load() without Loader=... is deprecated.
*** The default Loader is unsafe.
*** Please read https://msg.pyyaml.org/load for full details.
process = CWLClass.from_cwl(yaml.load(data))
/home/jovyan/.local/pipx/venvs/renku/lib/python3.6/site-packages/renku/models/provenance/activities.py:597: YAMLLoadWarning:
*** Calling yaml.load() without Loader=... is deprecated.
*** The default Loader is unsafe.
*** Please read https://msg.pyyaml.org/load for full details.
process = CWLClass.from_cwl(yaml.load(data))
Resolved '.renku/workflow/b09b7815a5694042b5b9464913a6ed2c.cwl' to 'file:///home/jovyan/testproject2/.renku/workflow/b09b7815a5694042b5b9464913a6ed2c.cwl'
[workflow ] start
[workflow ] starting step step_1
[step step_1] start
[job step_1] /tmp/tmpl6mx8lzt$ join \
--check-order \
--header \
'-t ' \
-1 \
1 \
-2 \
1 \
/tmp/tmpdgol84z4/stgfe8f6c0e-4777-4adb-af18-a33ce08ad594/testData_batch1_a.txt \
/tmp/tmpdgol84z4/stg7c22c132-c9a5-4a0e-b0e0-8daaee717837/testData_batch1_b.txt > /tmp/tmpl6mx8lzt/data/processed/joined_batch1.txt
[job step_1] completed success
[step step_1] completed success
[workflow ] starting step step_2
[step step_2] start
[job step_2] /tmp/tmpccu9tqsg$ notebooks/sortFile.sh \
/tmp/tmpiwiys_qz/stg3ab66fd2-e02f-40e0-8bc0-b925d28c7e1d/joined_batch1.txt \
data/processed/sorted_batch1.txt
'notebooks/sortFile.sh' not found
[job step_2] completed permanentFail
[step step_2] Output is missing expected field file:///home/jovyan/testproject2/.renku/workflow/b09b7815a5694042b5b9464913a6ed2c.cwl#step_2/output_0
[step step_2] completed permanentFail
[workflow ] completed permanentFail
Ahhhhhhhh! You have found a bug. 🐞
@robinengler apologies for letting this issue sit idle for so long - the problem you describe (where the executable is also a dependency) has recently been resolved (see https://github.com/SwissDataScienceCenter/renku-python/issues/495) - could you try updating your renku version and rerun the command?
If you are running this inside a Jupyter notebook on renkulab, you can update it by either building the image from renku/singleuser:latest or upgrading it inside the running notebook server (see https://renku.readthedocs.io/en/latest/user/cli-installation.html#upgrading). If you are using it on your machine, the upgrade process will depend on how you installed it. Let me know if you need help.
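For readers wondering what "the executable is also a dependency" means in practice, here is a small illustration of the failure mode seen in the log above, where step_2 runs in a temporary directory and `notebooks/sortFile.sh` cannot be found. The directory names and the one-line script are made up for the demonstration, and this is not how renku or cwltool stage files internally.

```shell
project=$(mktemp -d)   # stands in for the repository
jobdir=$(mktemp -d)    # stands in for the temporary job directory

mkdir -p "$project/notebooks"
printf '#!/bin/sh\necho sorted\n' > "$project/notebooks/sortFile.sh"
chmod +x "$project/notebooks/sortFile.sh"

# Relative path resolved from the job directory: the script is not there.
if (cd "$jobdir" && notebooks/sortFile.sh) 2>/dev/null; then
    before=found
else
    before=missing
fi

# Once the script is copied in like any other input file, the same call works.
mkdir -p "$jobdir/notebooks"
cp "$project/notebooks/sortFile.sh" "$jobdir/notebooks/"
after=$(cd "$jobdir" && notebooks/sortFile.sh)

echo "before staging: $before, after staging: $after"
```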
Closed due to inactivity, and because we don't plan on supporting pipes (there is no clear direction on how we could support them).
Hi renku team, I tried a couple of very simple bash commands with renku to test it. But it seems that each time a command has a bash pipe in it, renku run fails and complains about the directory being "dirty" when that is clearly not the case.
Here is an example:
If I now try an almost identical command, but without a pipe in it, it works as expected.
I tried a couple of other commands, and each time there is a pipe in it, it fails with the same message saying "Error: The repository is dirty." when in fact it's clean.
Here is a very simple reproducible example showing the problem:
Am I doing something wrong in my commands, or is this behaviour expected? Thanks, Robin
P.S. From my limited testing, brackets in commands also have deleterious effects on renku run: the two commands below fail (with different errors).