bacalhau-project / bacalhau

Compute over Data framework for public, transparent, and optionally verifiable computation
https://docs.bacalhau.org
Apache License 2.0

failed to send update info to requester node error="failed to get nodestate during node registration: nodeInfo not found for nodeID" #4694

Closed chungyan5 closed 2 weeks ago

chungyan5 commented 2 weeks ago

Bug Description

I am new to Bacalhau. I tried to set up one requester node and one compute node by following the Create Network guide, but the compute node reports the error message above.

Expected Behavior

Requester(s) and Compute Node(s) can communicate.

Steps to Reproduce

  1. I have two Ubuntu Linux servers on the same network.
  2. Install Bacalhau (1.5.1) on both.
  3. Set up the token.
  4. Run the requester and the compute node.
  5. The requester runs smoothly; the compute node reports the error message above.
  6. For additional information: when I set up a single node as both requester and compute on the same computer, it works and I can submit a job to run.
  7. So it may be a communication issue between the two nodes.
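One way to test the communication hypothesis in step 7 is to check, from the compute node, whether the orchestrator port accepts TCP connections at all. The following is a minimal sketch, not part of Bacalhau: `can_connect` is a hypothetical helper, the address 192.168.1.58 is the one passed to `Compute.Orchestrators` in the compute node command, and port 4222 is assumed from the `orchestrator_address:0.0.0.0:4222` field in the requester log.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumed values: orchestrator IP from the compute node command,
    # orchestrator port from the requester log.
    host, port = "192.168.1.58", 4222
    print(f"{host}:{port} reachable: {can_connect(host, port)}")
```

A `False` result would point to a firewall, routing, or binding problem between the two machines. A `True` result only shows that the port is reachable; it does not prove that node registration itself succeeds.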

Bacalhau Versions

Host Environment

Provide details about the environment where the bug occurred:

Job Specification

(If applicable, provide the job spec used when the issue occurred.)

Logs

Requester Logs:

bacalhau serve --orchestrator
00:45:53.786 | INF cmd/cli/serve/serve.go:103 > Config loaded from: [/home/easystore/.bacalhau/config.yaml], and with data-dir /home/easystore/.bacalhau
00:45:53.787 | INF cmd/cli/serve/serve.go:181 > Starting bacalhau...
00:45:54.835 | INF cmd/cli/serve/serve.go:256 > bacalhau node running [address:0.0.0.0:1234] [compute_enabled:false] [name:n-adc8ea97-fcbc-4efb-9ccc-f23040349a7d] [orchestrator_address:0.0.0.0:4222] [orchestrator_enabled:true] [webui_enabled:false]

To connect to this node from the local client, run the following commands in your shell:
export BACALHAU_API_HOST=127.0.0.1
export BACALHAU_API_PORT=1234

A copy of these variables have been written to: /home/easystore/.bacalhau/bacalhau.run
00:54:02.637 | WRN pkg/orchestrator/planner/logging_planner.go:48 > Job failed [Details:{"IsError":"true","NodesAvailable":"0","NodesRequested":"1","NodesSuitable":"0"}] [EvalID:ba18a0d8-cc76-4d71-8ebe-c885652934b5] [Event:"not enough nodes to run job. requested: 1, available: 0, suitable: 0."] [JobID:j-fdaebcd3-2a71-40e1-b9e9-52b601e56ae0] [NodeID:n-adc8ea97]

Compute Node Logs:

bacalhau serve --compute --config Compute.Orchestrators=192.168.1.58
11:03:55.117 | INF cmd/cli/serve/serve.go:103 > Config loaded from: [/home/easystore/.bacalhau/config.yaml], and with data-dir /home/easystore/.bacalhau
11:03:55.117 | INF cmd/cli/serve/serve.go:181 > Starting bacalhau...
11:03:56.157 | INF cmd/cli/serve/serve.go:256 > bacalhau node running [address:0.0.0.0:1234] [capacity:"{CPU: 16.80, Memory: 94 GB, Disk: 671 GB, GPU: 0}"] [compute_enabled:true] [engines:["docker","wasm"]] [name:n-389eb261-e61b-47cf-91f1-a621e198cd25] [orchestrator_enabled:false] [orchestrators:["192.168.1.58"]] [publishers:["local","noop"]] [storages:["urldownload","inline"]] [webui_enabled:false]

To connect to this node from the local client, run the following commands in your shell:
export BACALHAU_API_HOST=127.0.0.1
export BACALHAU_API_PORT=1234

A copy of these variables have been written to: /home/easystore/.bacalhau/bacalhau.run
11:04:56.146 | ERR pkg/compute/management_client.go:117 > failed to send update info to requester node error="failed to get nodestate during node registration: nodeInfo not found for nodeID: n-389eb261-e61b-47cf-91f1-a621e198cd25" [NodeID:n-389eb261]

linear[bot] commented 2 weeks ago

ENG-309 failed to send update info to requester node error="failed to get nodestate during node registration: nodeInfo not found for nodeID"

chungyan5 commented 2 weeks ago

Hi all,

I tried a different network with different computers, and it works. I will investigate the issue with the first network and the first set of computers myself. Thanks.