bacalhau-project / bacalhau

Compute over Data framework for public, transparent, and optionally verifiable computation
https://docs.bacalhau.org
Apache License 2.0

Feedback from user on ambiguous errors #4171

Open wdbaruni opened 2 days ago

wdbaruni commented 2 days ago

some excellent product feedback from a user:

`bacalhau docker run --id-only /tmp/lilypad/data/bacalhau-job-specs/QmTYMpWKFakH7vTdsHEVwgbDyrxQrEHY29SnDX43k8bWLj/job.json` gives me the following output:
30070bcd-94a7-4fe0-9916-a85aba7ad046

Error submitting job: not enough nodes to run job. requested: 1, available: 3, suitable: 0.
• 3 of 3 nodes: job already executed on this node more than once
Job Results By Node:
• Node QmPLPUUj, QmbxGSsM, QmSD38wH, QmSD38wH, QmPLPUUj, QmbxGSsM: Could not inspect image "/tmp/lilypad/data/bacalhau-job-specs/QmTYMpWKFakH7vTdsHEVwgbDyrxQrEHY29SnDX43k8bWLj/job.json" - could be due to repo/image not existing, or registry needing authorization: Error response from daemon: invalid reference format: repository name (tmp/lilypad/data/bacalhau-job-specs/QmTYMpWKFakH7vTdsHEVwgbDyrxQrEHY29SnDX43k8bWLj/job.json) must be lowercase. execution failed
$ cat /tmp/lilypad/data/bacalhau-job-specs/QmTYMpWKFakH7vTdsHEVwgbDyrxQrEHY29SnDX43k8bWLj/job.json 

{"APIVersion":"V1beta1","Metadata":{"CreatedAt":"0001-01-01T00:00:00Z","Requester":{}},"Spec":{"Engine":"Docker","EngineSpec":{"Type":"","Params":null},"PublisherSpec":{"Type":"Ipfs"},"Docker":{"Image":"richbrem/cowsay","Entrypoint":["/usr/local/bin/cowsay","Anythung"]},"Wasm":{"EntryModule":{}},"Resources":{"GPU":""},"Network":{"Type":"None"},"Timeout":1800,"Deal":{"Concurrency":1}}}

And a follow-up question: `Error submitting job: not enough nodes to run job. requested: 1, available: 3, suitable: 0.
• 3 of 3 nodes: job already executed on this node more than once`

How is this happening? I'm running `bacalhau` privately and I only have one node running, so who are these other 3 nodes? These are the commands I used (both with the same output): `bacalhau serve --node-type compute,requester --peer none --private-internal-ipfs=false --job-selection-accept-networked` and `bacalhau serve --node-type compute,requester --peer none`

Errors that we can help with:

  1. The word “repository” is ambiguous: is it our repo, a Docker repo, or something else?
  2. Does the image exist? Can we be more specific here?
  3. The job schema is wrong. Can we give more detailed feedback?
  4. They’re handing a file to `docker run`. Can we warn that this is not supported?
  5. Why do they have “three nodes”?
  6. Can we catch or convert the v1beta1 spec to the v1beta2 spec, and output a warning to the user?
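
To make items 4 and 6 concrete, here is a rough sketch of the kind of client-side pre-check that could run before a job is submitted. It is written in Python purely for illustration and is not Bacalhau code: the function names, the path heuristic, and the `SUPPORTED_API_VERSION` constant are all hypothetical, and the version comparison assumes the spec carries a top-level `APIVersion` field as in the user's `job.json` above.

```python
import json
import re
from typing import List

# Assumption: the version the client expects, taken from item 6 above.
SUPPORTED_API_VERSION = "V1beta2"


def looks_like_file_path(image_arg: str) -> bool:
    """Heuristic guess: is the `docker run` image argument really a local file?"""
    # Image references never start with a path prefix like these.
    if image_arg.startswith(("/", "./", "../", "~")):
        return True
    # A spec-like file extension is another strong hint.
    return bool(re.search(r"\.(json|ya?ml)$", image_arg, re.IGNORECASE))


def check_image_arg(image_arg: str) -> List[str]:
    """Return human-readable warnings for a suspicious image argument."""
    warnings = []
    if looks_like_file_path(image_arg):
        warnings.append(
            f'"{image_arg}" looks like a file path, not a Docker image '
            "reference; `docker run` expects an image name, not a job spec file."
        )
    return warnings


def check_spec_version(spec_text: str) -> List[str]:
    """Return warnings if a JSON job spec is unparseable or uses another APIVersion."""
    try:
        spec = json.loads(spec_text)
    except json.JSONDecodeError as err:
        return [f"job spec is not valid JSON: {err.msg}"]
    if not isinstance(spec, dict):
        return ["job spec must be a JSON object"]
    version = str(spec.get("APIVersion", ""))
    # Compare case-insensitively: the issue writes "v1beta1", the file "V1beta1".
    if version.lower() != SUPPORTED_API_VERSION.lower():
        return [
            f"job spec uses APIVersion {version!r}; "
            f"this client expects {SUPPORTED_API_VERSION!r}"
        ]
    return []
```

With checks like these, the user's command could fail fast with one message that names the actual problem (a file was passed where an image was expected, and the file is a v1beta1 spec) instead of surfacing the Docker daemon's "repository name must be lowercase" error from every node.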