filecoin-project / lotus

Reference implementation of the Filecoin protocol, written in Go
https://lotus.filecoin.io/
Other
2.82k stars 1.25k forks source link

Logs show: WARN frame executor queue is backlogged #11920

Open stuberman opened 2 months ago

stuberman commented 2 months ago

Checklist

Lotus component

Lotus Version

Daemon: 1.26.3+mainnet+git.560826d5c+api1.5.0
Local: lotus-miner version 1.26.3+mainnet+git.560826d5c

rustc 1.77.2 (25ef9e3d8 2024-04-09)
rustup 1.27.0 (bbb9276d2 2024-03-08)
go version go1.21.9 linux/amd64

free -h
              total        used        free      shared  buff/cache   available
Mem:          125Gi        43Gi       864Mi        10Mi        81Gi        81Gi
Swap:         199Gi        73Mi       199Gi

Repro Steps

  1. Start Lotus-miner
  2. See logs

Describe the Bug

After upgrading to lotus v1.26.3 from 1.25 seeing these WARN errors in miner logs continuously

Logging Information

2024-04-23T19:16:32.188Z    WARN    rpc go-jsonrpc@v0.3.1/websocket.go:676  frame executor queue is backlogged  {"queued": 129, "cap": 256}
2024-04-23T19:16:32.282Z    WARN    rpc go-jsonrpc@v0.3.1/websocket.go:676  frame executor queue is backlogged  {"queued": 129, "cap": 256}
2024-04-23T19:16:32.282Z    WARN    rpc go-jsonrpc@v0.3.1/websocket.go:676  frame executor queue is backlogged  {"queued": 129, "cap": 256}
2024-04-23T19:16:32.282Z    WARN    rpc go-jsonrpc@v0.3.1/websocket.go:676  frame executor queue is backlogged  {"queued": 129, "cap": 256}
2024-04-23T19:16:32.282Z    WARN    rpc go-jsonrpc@v0.3.1/websocket.go:676  frame executor queue is backlogged  {"queued": 130, "cap": 256}
2024-04-23T19:16:32.282Z    WARN    rpc go-jsonrpc@v0.3.1/websocket.go:676  frame executor queue is backlogged  {"queued": 131, "cap": 256}
2024-04-23T19:16:32.282Z    WARN    rpc go-jsonrpc@v0.3.1/websocket.go:676  frame executor queue is backlogged  {"queued": 132, "cap": 256}
2024-04-23T19:16:32.282Z    WARN    rpc go-jsonrpc@v0.3.1/websocket.go:676  frame executor queue is backlogged  {"queued": 133, "cap": 256}
2024-04-23T19:16:32.282Z    WARN    rpc go-jsonrpc@v0.3.1/websocket.go:676  frame executor queue is backlogged  {"queued": 134, "cap": 256}
2024-04-23T19:16:32.282Z    WARN    rpc go-jsonrpc@v0.3.1/websocket.go:676  frame executor queue is backlogged  {"queued": 134, "cap": 256}
2024-04-23T19:16:32.282Z    WARN    rpc go-jsonrpc@v0.3.1/websocket.go:676  frame executor queue is backlogged  {"queued": 132, "cap": 256}
2024-04-23T19:16:32.282Z    WARN    rpc go-jsonrpc@v0.3.1/websocket.go:676  frame executor queue is backlogged  {"queued": 130, "cap": 256}
2024-04-23T19:16:32.282Z    WARN    rpc go-jsonrpc@v0.3.1/websocket.go:676  frame executor queue is backlogged  {"queued": 129, "cap": 256}
2024-04-23T19:16:32.282Z    WARN    rpc go-jsonrpc@v0.3.1/websocket.go:676  frame executor queue is backlogged  {"queued": 129, "cap": 256}
2024-04-23T19:16:32.282Z    WARN    rpc go-jsonrpc@v0.3.1/websocket.go:676  frame executor queue is backlogged  {"queued": 130, "cap": 256}
2024-04-23T19:16:32.282Z    WARN    rpc go-jsonrpc@v0.3.1/websocket.go:676  frame executor queue is backlogged  {"queued": 131, "cap": 256}
2024-04-23T19:16:32.282Z    WARN    rpc go-jsonrpc@v0.3.1/websocket.go:676  frame executor queue is backlogged  {"queued": 132, "cap": 256}
2024-04-23T19:16:32.282Z    WARN    rpc go-jsonrpc@v0.3.1/websocket.go:676  frame executor queue is backlogged  {"queued": 133, "cap": 256}
2024-04-23T19:16:32.282Z    WARN    rpc go-jsonrpc@v0.3.1/websocket.go:676  frame executor queue is backlogged  {"queued": 134, "cap": 256}
2024-04-23T19:16:32.282Z    WARN    rpc go-jsonrpc@v0.3.1/websocket.go:676  frame executor queue is backlogged  {"queued": 134, "cap": 256}
2024-04-23T19:16:32.282Z    WARN    rpc go-jsonrpc@v0.3.1/websocket.go:676  frame executor queue is backlogged  {"queued": 131, "cap": 256}
2024-04-23T19:16:32.282Z    WARN    rpc go-jsonrpc@v0.3.1/websocket.go:676  frame executor queue is backlogged  {"queued": 129, "cap": 256}
rvagg commented 1 month ago

@stuberman @TippyFlitsUK does this mainly happen on startup, or is this a constant thing? Is there a particular time or focal event that this happens around, or just all the time? ISTM that the backlog is caused by the client not processing the messages as quick as the server sends them, but maybe that's just a quick burst of activity focused around startup or some other event?

stuberman commented 1 month ago

@rvagg this is happening all of the time, it is constant.

2024-06-04T17:25:19.958Z WARN rpc go-jsonrpc@v0.3.1/websocket.go:676 frame executor queue is backlogged {"queued": 136, "cap": 256} 2024-06-04T17:25:19.958Z WARN rpc go-jsonrpc@v0.3.1/websocket.go:676 frame executor queue is backlogged {"queued": 137, "cap": 256} 2024-06-04T17:25:19.958Z WARN rpc go-jsonrpc@v0.3.1/websocket.go:676 frame executor queue is backlogged {"queued": 138, "cap": 256} 2024-06-04T17:25:19.958Z WARN rpc go-jsonrpc@v0.3.1/websocket.go:676 frame executor queue is backlogged {"queued": 139, "cap": 256}

rvagg commented 1 month ago

I've put up an experiment that might help with debugging this @ https://github.com/filecoin-project/go-jsonrpc/pull/113, I don't really want to push that through to a release yet but I'd really like to see the output.

If someone could run their node/miner (whatever is producing these logs) with this version installed and then share logs it might be useful.

To install it, in a lotus repo checkout, on the tag corresponding to the version you're running, do this:

go get github.com/filecoin-project/go-jsonrpc@9e83a043c6959e53e8521d7d48c7c80e26da5695
go mod tidy

and rebuild

stuberman commented 1 month ago

I will give this a shot after my PoSTs later this morning.

stuberman commented 1 month ago

Not seeing the rpc warning now.

git show

commit 2baca01fd8d8cac837800fbd8dfd39d96a407783 (HEAD, tag: v1.27.0, origin/releases) Merge: 560826d5c 60a39dd1e Author: Phi-rjan orjan.roren@gmail.com Date: Tue May 28 14:17:05 2024 +0200

Merge pull request #12058 from filecoin-project/resolve-releasev1270-conflicts

chore: merge release/v1.27.0 branch into the releases branch

lotus-miner version

Daemon: 1.27.0+mainnet+git.2baca01fd.dirty+api1.5.0 Local: lotus-miner version 1.27.0+mainnet+git.2baca01fd.dirty

2024-06-05T13:04:09.076Z INFO miner miner/miner.go:505 completed mineOne {"tookMilliseconds": 11, "forRound": 3976209, "baseEpoch": 3976208, "baseDeltaSeconds": 9, "nullRounds": 0, "lateStart": false, "beaconEpoch": 8263092, "lookbackEpochs": 900, "networkPowerAtLookback": "26217018616095277056", "minerPowerAtLookback": "1372112684056576", "isEligible": true, "isWinner": false, "error": null} 2024-06-05T13:04:39.079Z INFO miner miner/miner.go:505 completed mineOne {"tookMilliseconds": 12, "forRound": 3976210, "baseEpoch": 3976209, "baseDeltaSeconds": 9, "nullRounds": 0, "lateStart": false, "beaconEpoch": 8263102, "lookbackEpochs": 900, "networkPowerAtLookback": "26217161243369242624", "minerPowerAtLookback": "1372112684056576", "isEligible": true, "isWinner": false, "error": null} 2024-06-05T13:05:09.337Z INFO miner miner/miner.go:505 completed mineOne {"tookMilliseconds": 11, "forRound": 3976211, "baseEpoch": 3976210, "baseDeltaSeconds": 9, "nullRounds": 0, "lateStart": false, "beaconEpoch": 8263112, "lookbackEpochs": 900, "networkPowerAtLookback": "26217161243369242624", "minerPowerAtLookback": "1372112684056576", "isEligible": true, "isWinner": false, "error": null} 2024-06-05T13:05:39.448Z INFO miner miner/miner.go:505 completed mineOne {"tookMilliseconds": 12, "forRound": 3976212, "baseEpoch": 3976211, "baseDeltaSeconds": 9, "nullRounds": 0, "lateStart": false, "beaconEpoch": 8263122, "lookbackEpochs": 900, "networkPowerAtLookback": "26217155745811103744", "minerPowerAtLookback": "1372112684056576", "isEligible": true, "isWinner": false, "error": null} 2024-06-05T13:06:09.310Z INFO miner miner/miner.go:505 completed mineOne {"tookMilliseconds": 11, "forRound": 3976213, "baseEpoch": 3976212, "baseDeltaSeconds": 9, "nullRounds": 0, "lateStart": false, "beaconEpoch": 8263132, "lookbackEpochs": 900, "networkPowerAtLookback": "26217178079641042944", "minerPowerAtLookback": "1372112684056576", "isEligible": true, "isWinner": false, "error": null} 2024-06-05T13:06:39.314Z INFO miner miner/miner.go:505 completed mineOne {"tookMilliseconds": 9, "forRound": 3976214, "baseEpoch": 3976213, "baseDeltaSeconds": 9, "nullRounds": 0, "lateStart": false, "beaconEpoch": 8263142, "lookbackEpochs": 900, "networkPowerAtLookback": "26217178079641042944", "minerPowerAtLookback": "1372112684056576", "isEligible": true, "isWinner": false, "error": null} 2024-06-05T13:07:09.316Z INFO miner miner/miner.go:505 completed mineOne {"tookMilliseconds": 12, "forRound": 3976215, "baseEpoch": 3976214, "baseDeltaSeconds": 9, "nullRounds": 0, "lateStart": false, "beaconEpoch": 8263152, "lookbackEpochs": 900,

stuberman commented 1 month ago

Here are all the events since I modified lotus 11 hours ago. Far less frequent.

cat /pool/logs/miner.log |grep backlogged f01278.txt

rvagg commented 1 month ago

hah, well that's very interesting, thanks @stuberman!

The primary reason for the lack of warnings is that I bumped the warning threshold from 1/2 to 2/3rds, which suggests that it's normally clearing them in good time, which is good, and means this warning is probably unnecessarily noisy. But perhaps also moving some of the processing earlier in the pipeline helps too.

The most interesting thing is the method<> in there, for all of them! I need to work out what that means, I'm obviously not clear on the details of the chatter but I'd assumed that each frame needed a method but I guess there's a bunch of meta frames in there I need to grok. There might be some unnecessary chatter going on. I'll investigate.

magik6k commented 1 month ago

Empty method likely means call result