LSSTDESC / desc-gen3-prod

Desc-prod wrapper for pipeline production using gen3_workflow.
BSD 3-Clause "New" or "Revised" License

g3wf-strace not found in tasks for job 1191 #6

Closed: dladams closed this issue 8 months ago

dladams commented 10 months ago

Job 1191 reported success but the task logs look like this:

 login18>cat submit/u/dladams/isr/20231103T162527Z/logging/e7ff3af6-4a7b-4063-95d0-58ab0484e3e1_isr_250_176.stderr
--> executable follows <--
g3wf-strace --freq=1 --args="-t -T -C -f -e open,openat,close,rename,renameat,renameat2,link,linkat,unlink,unlinkat" perf stat -d ${CTRL_MPEXEC_DIR}/bin/pipetask --long-log --log-level=VERBOSE run-qbb /pscratch/sd/d/dladams/repo-1085 /pscratch/sd/d/dladams/descprod-out/jobs/job001191/submit/u/dladams/isr/20231103T162527Z/u_dladams_isr_20231103T162527Z.qgraph --qgraph-node-id e7ff3af6-4a7b-4063-95d0-58ab0484e3e1 && >&2 echo success || (>&2 echo failure; false)
--> end executable <--
/bin/bash: g3wf-strace: command not found
failure
--> executable follows <--
g3wf-strace --freq=1 --args="-t -T -C -f -e open,openat,close,rename,renameat,renameat2,link,linkat,unlink,unlinkat" perf stat -d ${CTRL_MPEXEC_DIR}/bin/pipetask --long-log --log-level=VERBOSE run-qbb /pscratch/sd/d/dladams/repo-1085 /pscratch/sd/d/dladams/descprod-out/jobs/job001191/submit/u/dladams/isr/20231103T162527Z/u_dladams_isr_20231103T162527Z.qgraph --qgraph-node-id e7ff3af6-4a7b-4063-95d0-58ab0484e3e1 && >&2 echo success || (>&2 echo failure; false)
--> end executable <--
/bin/bash: g3wf-strace: command not found
failure
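
The logged executable wraps each task as `cmd && >&2 echo success || (>&2 echo failure; false)`. Below is a minimal sketch of how that wrapper behaves when bash cannot resolve the first word of the command, as happened with g3wf-strace. It also includes a PATH preflight check that could catch a missing helper before submission. The helper names (`wrap`, `run_wrapped`, `missing_executables`) and the tool name `hypothetical-missing-tool` are hypothetical illustrations, not part of desc-gen3-prod, gen3_workflow, or Parsl.

```python
import shutil
import subprocess


def wrap(cmd: str) -> str:
    """Wrap cmd the same way the logged task executable is wrapped."""
    return f"{cmd} && >&2 echo success || (>&2 echo failure; false)"


def run_wrapped(cmd: str) -> subprocess.CompletedProcess:
    """Run the wrapped command with bash, capturing stdout and stderr as text."""
    return subprocess.run(["bash", "-c", wrap(cmd)], capture_output=True, text=True)


def missing_executables(names):
    """Return the names that shutil.which cannot resolve on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # "hypothetical-missing-tool" stands in for g3wf-strace on a node without it.
    result = run_wrapped("hypothetical-missing-tool --freq=1")
    print("exit status:", result.returncode)
    print("last stderr line:", result.stderr.strip().splitlines()[-1])
    print("missing:", missing_executables(["hypothetical-missing-tool"]))
```

When bash is available, the missing command exits with 127. That skips the `&&` branch and runs the subshell, which prints `failure` and returns 1 via `false`. The wrapper therefore does hand a nonzero status back to whatever launched it. On its own, this does not explain why the workflow reported 94/94 tasks complete.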

There are a few issues here:

  1. The error message appears in the log twice.
  2. The job appeared to succeed: the status message is "Done. Workflow complete: 94/94 tasks." But if I run a status job on the directory, I get "Done. Finished 0 of 94 tasks. (94 failed.)"
  3. Parsl did not find g3wf-strace even though desc-gen3-prod built successfully.
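
Items 1 and 2 can be checked directly from the per-task `.stderr` files under the `logging` directory. Below is a minimal sketch that assumes the layout shown in the log above: each attempt starts with `--> executable follows <--` and ends with the wrapper's bare `success` or `failure` line. The function names are hypothetical and are not part of desc-gen3-prod or gen3_workflow. The I/O is kept separate from the parsing so the classification can be exercised on in-memory text.

```python
from collections import Counter
from pathlib import Path

EXEC_MARKER = "--> executable follows <--"


def task_outcome(text: str) -> str:
    """Return the last 'success'/'failure' line the wrapper wrote, else 'unknown'."""
    for line in reversed(text.splitlines()):
        if line.strip() in ("success", "failure"):
            return line.strip()
    return "unknown"


def summarize_texts(named_texts):
    """Count outcomes over (name, text) pairs and list logs with a repeated executable block."""
    outcomes = Counter()
    repeated = []
    for name, text in named_texts:
        outcomes[task_outcome(text)] += 1
        if text.count(EXEC_MARKER) > 1:
            repeated.append(name)
    return outcomes, repeated


def summarize_logs(logging_dir):
    """Apply summarize_texts to every *.stderr file in logging_dir, in sorted order."""
    paths = sorted(Path(logging_dir).glob("*.stderr"))
    return summarize_texts((p.name, p.read_text(errors="replace")) for p in paths)


if __name__ == "__main__":
    import sys

    counts, repeated = summarize_logs(sys.argv[1])
    print("outcomes:", dict(counts))
    print("logs with repeated executable block:", len(repeated))
```

Comparing the `failure` count from such a scan with the "94/94 tasks" completion message would expose the kind of mismatch described in item 2. A nonempty `repeated` list would flag logs like the one above, where the executable block appears twice.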

dladams commented 8 months ago

The script g3wf-strace has now been replaced with another that should not be used, so this problem is no longer relevant.