filecoin-project / lotus

Reference implementation of the Filecoin protocol, written in Go
https://lotus.filecoin.io/

Several CLI commands failed to connect to lotus-miner #7072

Closed William8Work closed 3 years ago

William8Work commented 3 years ago

Lotus component

lotus miner - mining and block production

Lotus Version

Daemon:  1.11.1-rc2+mainnet+git.40449f1cc+api1.2.0
Local: lotus-miner version 1.11.1-rc2+mainnet+git.40449f1cc

Describe the Bug

After upgrading to v1.11.1-rc2, I tried to run CLI commands on worker nodes:

lotus-miner info
lotus-miner storage-deals list
lotus-miner sealing jobs
lotus-miner sealing workers
lotus-miner sectors list --fast

These commands run successfully on the miner node. However, on the worker nodes (separate machines) I encountered the following.

These commands fail:

lotus-miner info
lotus-miner storage-deals list

But these commands work:

lotus-miner sealing jobs
lotus-miner sealing workers
lotus-miner sectors list --fast

The worker machine has MINER_API_INFO set correctly, which is why lotus-miner sealing jobs and the other commands succeed. However, lotus-miner info and lotus-miner storage-deals list fail.

Logging Information

$ lotus-miner info
ERROR: could not get API info: repo directory does not exist. Make sure your configuration is correct

Repro Steps

  1. Run the lotus-miner info command on the lotus miner machine as well as on a worker machine.
  2. The command succeeds on the miner machine but fails on the worker machine.

Angelo-gh3990 commented 3 years ago

I can confirm the same issue on:

Daemon: 1.11.1-rc3+mainnet+git.56c35ff1e+api1.2.0
Local: lotus-miner version 1.11.1-rc3+mainnet+git.56c35ff1e

Running the lotus daemon on a separate machine.

The commands run fine on rc1.

6enno commented 3 years ago

I can confirm a similar issue on m1.3.5:

$ lm info
ERROR: malformed HTTP response "\x13/multistream/1.0.0"
$ lm sealing workers
Worker 7b055dae-02c8-40e2-83ef-6cee421802d2, host hectorb
    CPU:  [                                                                ] 0/16 core(s) in use
    RAM:  [                                                                ] 1% 5.533 GiB/377.6 GiB
    VMEM: [                                                                ] 0% 5.533 GiB/632.6 GiB
    GPU: GeForce RTX 3090, not used
Worker 84960bea-960d-48c8-b799-1bbee101f4a3, host HectorA
    CPU:  [                                                                ] 0/16 core(s) in use
    RAM:  [||||||                                                          ] 10% 12.73 GiB/125.8 GiB
    VMEM: [||                                                              ] 3% 12.73 GiB/381.8 GiB
    GPU: GeForce RTX 2080 Ti, not used
$ lm version
Daemon:  1.11.1-m1.3.5+mainnet+git.3ff8e256b+api1.2.0
Local: lotus-miner version 1.11.1-m1.3.5+mainnet+git.3ff8e256b

dayou5168 commented 3 years ago

Guys, I replied in your Slack thread. Maybe you can try the solution I suggested.

jennijuju commented 3 years ago

@William8Work could you please run lotus-miner -vv info and send us any error or warning logs that appear?

jennijuju commented 3 years ago

> I can confirm the same issue on:
>
> Daemon: 1.11.1-rc3+mainnet+git.56c35ff1e+api1.2.0
> Local: lotus-miner version 1.11.1-rc3+mainnet+git.56c35ff1e
>
> Running the lotus daemon on a separate machine.
>
> The commands run fine on rc1.

Is it not working on the miner node or worker node?

jennijuju commented 3 years ago

Getting the error below on the main miner node on v1.11.1-rc3:

$ lotus-miner info
ERROR: could not get API info: could not get api endpoint: API not running (no endpoint)

jennijuju commented 3 years ago

From @William8Work

Confirmed @jennijuju - PLFD. It works on the miner node but fails on the worker nodes. Also, while lotus-miner info and lotus-miner storage-deals list fail, lotus-miner sealing workers and lotus-miner sectors list succeed on the worker nodes.

William8Work commented 3 years ago

Running this on worker nodes:

$ lotus-miner -vv info

using raw API v0 endpoint: ws://10.1.18.180:2345/rpc/v0
using miner API v0 endpoint: ws://10.1.18.180:2345/rpc/v0
ERROR: could not get API info: repo directory does not exist. Make sure your configuration is correct

Running the same command on the miner node:

$ lotus-miner -vv info

using raw API v0 endpoint: ws://10.1.18.180:2345/rpc/v0
using miner API v0 endpoint: ws://10.1.18.180:2345/rpc/v0
using raw API v0 endpoint: ws://10.1.18.180:2345/rpc/v0
using markets API v0 endpoint: ws://10.1.18.180:2345/rpc/v0
using raw API v0 endpoint: ws://10.1.18.166:1234/rpc/v0
using full node API v0 endpoint: ws://10.1.18.166:1234/rpc/v0
Enabled subsystems (from miner API): [Mining Sealing SectorStorage Markets]
Enabled subsystems (from markets API): [Mining Sealing SectorStorage Markets]
Chain: [sync ok] [basefee 136.919 pFIL]
using raw API v0 endpoint: ws://10.1.18.180:2345/rpc/v0
using miner API v0 endpoint: ws://10.1.18.180:2345/rpc/v0
Miner: f08399 (32 GiB sectors)
Power: 270 Ti / 9.14 Ei (0.0028%)
        Raw: 252.7 TiB / 9.134 EiB (0.0026%)
        Committed: 253 TiB
        Proving: 252.7 TiB
Projected average block win rate: 2.84/week (every 59h9m5s)
Projected block win with 99.9% probability every 408h34m31s
(projections DO NOT account for future network and miner growth)

Miner Balance:    3132.572 FIL
      PreCommit:  214.716 mFIL
      Pledge:     2333.248 FIL
      Vesting:    732.939 FIL
      Available:  66.171 FIL
Market Balance:   5.77 FIL
       Locked:    2.617 FIL
       Available: 3.153 FIL
Worker Balance:   322.543 FIL
       Control:   276.033 FIL
Total Spendable:  667.9 FIL

Sectors:
        Total: 8376
        Proving: 8110
        WaitSeed: 1
        Committing: 8
        Removed: 257

Storage Deals: 352, 6.624 TiB
      Active:  351  6.593 TiB (Verified: 130 2.251 TiB)
      Sealing: 1    32 GiB    (Verified: 1   32 GiB)

Retrieval Deals (complete): 9, 170 GiB
$ lotus-miner version
Daemon:  1.11.1-rc2+mainnet+git.40449f1cc+api1.2.0
Local: lotus-miner version 1.11.1-rc2+mainnet+git.40449f1cc

Angelo-gh3990 commented 3 years ago

On my miner node:

miner:~# lotus-miner -vv info
using raw API v0 endpoint: ws://10.10.10.140:2345/rpc/v0
using miner API v0 endpoint: ws://10.10.10.140:2345/rpc/v0
ERROR: could not get API info: could not get api endpoint: API not running (no endpoint)

Angelo-gh3990 commented 3 years ago

netstat -an shows that a process is listening on that port:

tcp 0 0 0.0.0.0:2345 0.0.0.0:* LISTEN

Angelo-gh3990 commented 3 years ago

Another command:

miner:~# lotus-miner -vv sectors list --fast
using raw API v0 endpoint: ws://10.10.10.140:2345/rpc/v0
using miner API v0 endpoint: ws://10.10.10.140:2345/rpc/v0
using raw API v0 endpoint: ws://10.10.10.101:42002/rpc/v0
using full node API v0 endpoint: ws://10.10.10.101:42002/rpc/v0
ID  State    OnChain  Active  Deals
0   Proving  YES      YES     CC

It seems to connect just fine on that port.

Angelo-gh3990 commented 3 years ago

I had LOTUS_MARKETS_PATH set, left over from a test a while back. After removing it (unset LOTUS_MARKETS_PATH), it works on my miner.
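
A leftover repo-path variable like this is easy to miss. The following is a minimal sketch, not part of Lotus, that reports LOTUS_MINER_PATH or LOTUS_MARKETS_PATH when it is set but points at a directory that does not exist. The function name stale_repo_paths is hypothetical, and whether a stale path explains a given error is an assumption to verify on your own setup.

```shell
# Sketch: report repo-path variables that are set but point at a missing directory.
# stale_repo_paths is a hypothetical helper name, not a Lotus command.
stale_repo_paths() {
    for name in LOTUS_MINER_PATH LOTUS_MARKETS_PATH; do
        # Read the variable whose name is stored in $name (POSIX-compatible indirection).
        eval "value=\${$name:-}"
        if [ -n "$value" ] && [ ! -d "$value" ]; then
            echo "$name=$value (directory does not exist)"
        fi
    done
}

stale_repo_paths
```

If the function prints a variable you no longer need, unset it in the current shell and remove it from your shell profile.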

nonsense commented 3 years ago

@William8Work

~1. Could you explain what you mean with worker nodes? It seems like you are running lotus-miner commands, and not lotus-worker info for example.~

2.

> Running this on worker nodes:
>
> $ lotus-miner -vv info
>
> using raw API v0 endpoint: ws://10.1.18.180:2345/rpc/v0
> using miner API v0 endpoint: ws://10.1.18.180:2345/rpc/v0
> ERROR: could not get API info: repo directory does not exist. Make sure your configuration is correct

~In the error message we see repo directory does not exist, so I guess LOTUS_MINER_PATH or LOTUS_MARKETS_PATH is pointing at a location that does not exist, or they are not set.~

~Overall I am a bit confused as it is not clear what your setup is - are you running MRA (miner node + markets node) and then individual worker nodes for various sealing operations?~

Having read the Slack thread, I now understand that you are running lotus-miner CLI commands on your lotus-worker nodes, without running MRA in split mode (i.e. lotus-miner is handling all subsystems - mining, sealing, proving, markets).

@William8Work could you confirm that all 3 API_INFO env vars are set up correctly? For lotus-miner info to work, the command needs access to the markets subsystem, to the proving/storage subsystems, and to a full node, so you need 3 env vars, for example:

MARKETS_API_INFO=token:/ip4/127.0.0.1/tcp/2345/http
MINER_API_INFO=token:/ip4/127.0.0.1/tcp/8787/http
FULLNODE_API_INFO=token:/ip4/127.0.0.1/tcp/1234/http

Having debugged this, I think we should improve the error messages further, because "ERROR: could not get API info: repo directory does not exist. Make sure your configuration is correct" is rather confusing in this case.
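
Until the error message improves, a quick check of the three variables can save some guessing. This is a minimal sketch, not part of Lotus: missing_api_info is a hypothetical helper that prints the name of each API_INFO variable that is unset or empty in the current shell. It does not check that the token or address is valid.

```shell
# Sketch: list which of the three API_INFO variables are unset or empty.
# missing_api_info is a hypothetical helper, not a lotus-miner subcommand.
missing_api_info() {
    for name in MARKETS_API_INFO MINER_API_INFO FULLNODE_API_INFO; do
        # Read the variable whose name is stored in $name (POSIX-compatible indirection).
        eval "value=\${$name:-}"
        [ -n "$value" ] || echo "$name"
    done
}

missing=$(missing_api_info)
if [ -n "$missing" ]; then
    # Word splitting on $missing is intentional: it puts the names on one line.
    echo "Unset API_INFO variables:" $missing >&2
fi
```

Run it in the same shell where lotus-miner info fails. If it prints nothing, all three variables are set.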

William8Work commented 3 years ago

@nonsense OK, since my worker nodes already have MINER_API_INFO and FULLNODE_API_INFO, I added the MARKETS_API_INFO env var, and now the lotus-miner info and lotus-miner storage-deals commands work on the worker nodes!!

The only small change to your advice above is that I set the markets API to exactly the same value as the miner API (same IP address, same port):

MARKETS_API_INFO=token:/ip4/127.0.0.1/tcp/2345/http
MINER_API_INFO=token:/ip4/127.0.0.1/tcp/2345/http
FULLNODE_API_INFO=token:/ip4/127.0.0.1/tcp/1234/http

nonsense commented 3 years ago

Now that https://github.com/filecoin-project/filecoin-docs/pull/1012 and https://github.com/filecoin-project/lotus/pull/7088 are merged, I think we can close this.

For now, miners have to specify all three API_INFO environment variables in order to connect to a remote miner and for lotus-miner info and other CLI commands to work as expected.
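
For a setup like the one reported above, where lotus-miner handles all subsystems and the markets endpoint is the same as the miner endpoint, the third variable can be derived from the second. This is a minimal sketch under that assumption. The function name use_miner_for_markets is hypothetical, and a split markets node would need its own MARKETS_API_INFO value instead.

```shell
# Sketch: on a non-split setup, reuse the miner endpoint for the markets API.
# Assumes MINER_API_INFO is already set in this shell.
# use_miner_for_markets is a hypothetical helper name, not a Lotus command.
use_miner_for_markets() {
    # Keep an explicit MARKETS_API_INFO; otherwise copy MINER_API_INFO.
    if [ -z "${MARKETS_API_INFO:-}" ] && [ -n "${MINER_API_INFO:-}" ]; then
        export MARKETS_API_INFO="$MINER_API_INFO"
    fi
}

use_miner_for_markets
```

Placed in a shell profile after the MINER_API_INFO export, this leaves an explicitly configured MARKETS_API_INFO untouched.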


Lotus CLI configuration will get a revamp in the near future, when we plan to simplify it and unify it in one place.