bacalhau-project / bacalhau

Compute over Data framework for public, transparent, and optionally verifiable computation
https://docs.bacalhau.org
Apache License 2.0

Running a compute node on the same machine as a serving bacalhau node gives faulty error #4458

Open aronchick opened 1 day ago

aronchick commented 1 day ago
❯ bacalhau-1.5.0 serve --compute --api-port 2233 \
    -c "compute.labels=name=compute1"
11:51:40.122 | WRN cmd/cli/serve/serve.go:146 > --name flag with value  ignored. Name n-346f5e00-0d32-4ec7-a410-f82fcca05fd4 already exists
11:51:40.142 | INF cmd/cli/serve/serve.go:218 > starting bacalhau...
11:51:40.145 | INF pkg/publisher/local/server.go:52 > Running local publishing server on 0.0.0.0:6001 [NodeID:n-346f5e00]
11:51:40.145 | INF pkg/lib/watcher/watcher.go:59 > No checkpoint found, starting from latest [NodeID:n-346f5e00] [watcher_id:compute-logger]
11:51:40.145 | INF pkg/lib/watcher/watcher.go:68 > starting watcher [NodeID:n-346f5e00] [starting_at:latest] [watcher_id:compute-logger]
11:51:40.145 | ERR pkg/publisher/local/server.go:59 > error running local publishing server error="listen tcp 0.0.0.0:6001: bind: address already in use" [NodeID:n-346f5e00]
11:51:42.444 | INF cmd/cli/serve/serve.go:271 > bacalhau node running [address:0.0.0.0:2233] [capacity:"{CPU: 8.399999999999999, Memory: 48 GB, Disk: 50 GB, GPU: 0}"] [compute_enabled:true] [engines:["docker","wasm"]] [name:n-346f5e00-0d32-4ec7-a410-f82fcca05fd4] [orchestrator_enabled:false] [orchestrators:["nats://127.0.0.1:4222"]] [publishers:["noop","local"]] [storages:["urldownload","inline"]] [webui_enabled:false]

Note the "bind: address already in use" error for port 6001.

frrist commented 19 hours ago

This will happen when two compute nodes are run on the same host. By default, each compute node runs a local publisher that binds to 0.0.0.0:6001. I assume you started two compute nodes on this machine and the error occurs when the second one starts? If so, this can be mitigated the same way API port conflicts are: by configuring the default publisher to use a different port, e.g.

$ bacalhau serve --node-type=compute --api-port=2233 --repo=~/.compute --config=publishers.types.local.port=6002
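
A minimal sketch of one way to pick that port automatically before starting the second node, assuming bash (the /dev/tcp redirection is a bash feature) and that the first node's publisher is on the default 6001. The helper names port_in_use and next_free_port are hypothetical. A successful connect to 127.0.0.1 is used only as a heuristic for "something is already listening"; it is not an exact check, and another process could still take the port between the check and node startup. The script only prints the command from the comment above with the chosen port substituted; it does not start bacalhau.

```shell
#!/usr/bin/env bash
# Sketch: find a free TCP port for a second compute node's local publisher.
# Assumes bash (/dev/tcp is a bash pseudo-device) and that a listener bound to
# 0.0.0.0:<port> also accepts connections on 127.0.0.1:<port>.

# port_in_use PORT: exit status 0 if a connection to 127.0.0.1:PORT succeeds
port_in_use() {
  ( : > "/dev/tcp/127.0.0.1/$1" ) 2>/dev/null
}

# next_free_port START: print the first port >= START that refuses connections
next_free_port() {
  local port=$1
  while [ "$port" -le 65535 ] && port_in_use "$port"; do
    port=$((port + 1))
  done
  if [ "$port" -gt 65535 ]; then
    echo "no free port found" >&2
    return 1
  fi
  echo "$port"
}

PUBLISHER_PORT=$(next_free_port 6001) || exit 1

# Print (rather than run) the suggested command with the chosen port filled in.
echo "bacalhau serve --node-type=compute --api-port=2233 --repo=~/.compute --config=publishers.types.local.port=${PUBLISHER_PORT}"
```

The API port and repo path still have to differ from the first node's, as in the command above; this only removes the guesswork for the publisher port.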

aronchick commented 10 hours ago

That's fair, but we had these instructions in the game day docs; I wonder if there's a better experience here. If this happens, is the compute node still functional?