Bug Description
Run this job:
bacalhau docker run --concurrency=4 ubuntu:latest eho hello world
(Note that echo is deliberately misspelled as eho, so the container has no valid binary to execute.)
Observe this result:
Job successfully submitted. Job ID: 69d52def-06b1-4e81-a70d-043df8d292f6
Checking job status... (Enter Ctrl+C to exit at any time, your job will continue running):
Communicating with the network ................ done ✅ 0.1s
Creating job for submission ................ done ✅ 0.0s
Job in progress ................ err ❌ 0.6s
Error submitting job: not enough nodes to run job. requested: 1, available: 4, suitable: 0.
• 4 of 4 nodes: job already executed on this node more than once
Job Results By Node:
• Node QmVHCeiL, Qma5yQAk, Qma5yQAk, QmafZ9oC, QmRr9qPT: execution error: failed to start container: executable file not found: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "eho": executable file not found in $PATH: unknown. execution failed
• Node QmVHCeiL, QmRr9qPT, QmafZ9oC:
Accepted job. not enough nodes to run job. requested: 1, available: 4, suitable: 0.
• 4 of 4 nodes: job already executed on this node more than once
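The counts in the output above are consistent with my retry suspicion. Below is a toy Go model of that hypothesis. It is not Bacalhau's scheduler code, and it rests on assumptions I have not confirmed: every execution fails (the binary does not exist), each retry asks for exactly one replacement node (which would explain "requested: 1"), and a node becomes unsuitable once it has run the job more than once. The `simulate` function is hypothetical.

```go
package main

import "fmt"

// simulate is a hypothetical model of the suspected retry behavior; it is NOT
// Bacalhau's scheduler. Assumptions: every execution fails, each retry round
// requests one replacement node, and a node is unsuitable once it has run the
// job more than once.
func simulate(nodes []string, concurrency int) (runs map[string]int, requested, available, suitable int) {
	runs = make(map[string]int, len(nodes))
	if concurrency > len(nodes) {
		concurrency = len(nodes)
	}
	// Initial placement: one (failing) execution on each of `concurrency` nodes.
	for _, n := range nodes[:concurrency] {
		runs[n]++
	}
	// Retry loop: each round requests a single replacement node.
	for {
		var candidates []string
		for _, n := range nodes {
			if runs[n] <= 1 { // assumed "already executed more than once" filter
				candidates = append(candidates, n)
			}
		}
		if len(candidates) == 0 {
			// Every node is filtered out: 1 requested, all available, 0 suitable.
			return runs, 1, len(nodes), 0
		}
		runs[candidates[0]]++ // the replacement execution fails again
	}
}

func main() {
	nodes := []string{"QmVHCeiL", "Qma5yQAk", "QmafZ9oC", "QmRr9qPT"}
	runs, requested, available, suitable := simulate(nodes, 4)
	fmt.Printf("requested: %d, available: %d, suitable: %d\n", requested, available, suitable)
	fmt.Println("executions per node:", runs)
}
```

Running it prints "requested: 1, available: 4, suitable: 0" with two executions recorded per node, which matches the pattern in the output above if the assumptions hold.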
Expected Behavior
I would expect:
- The error to state that the job requested 4 nodes, not 1.
- Not to see "4 of 4 nodes: job already executed on this node more than once", since this is the first time I ran the job. I suspect a retry strategy is kicking in and trying the job twice on every node.
- A clearer error message stating that it tried to run the job on 4 nodes and failed on all of them due to the execution error (executable file not found).
Bacalhau Versions
Agent Version: v1.3.1
CLI Client Version: v1.3.1
Host Environment
Operating System: Ubuntu
CPU Architecture: x86
Any other relevant environment details: this was run in the staging cluster.
Job Specification
Not applicable; the job was submitted directly with the CLI command shown above.
Logs
Node Logs: https://gist.github.com/frrist/69a9c85891890e114f235aece40ed888