etclabscore / core-geth

A highly configurable Go implementation of the Ethereum protocol.
https://etclabscore.github.io/core-geth
GNU Lesser General Public License v3.0
267 stars 147 forks source link

Node does not commit new sealing work when it should #586

Closed Ewemek closed 9 months ago

Ewemek commented 9 months ago

Hi, I'm a pool operator and we've just updated from v1.12.12 to v1.12.16. We now face a problem where the node commits new sealing work very rarely. The pool is calling getWork() very often but still gets the same work, until a new one is committed. As you can see in the logs, the node did not generate a work for heights 18,717,898 to 18,717,890 (included) although we did ask for a new job in the meantime.

We have enough peers (~170) and relevant options we use are: --mine --miner.threads 0. We recently dropped the --miner.notify maybe this is related.

But after investigation, I feel that our problem is related to this commit: https://github.com/etclabscore/core-geth/commit/6f51aed27141413b9a7b66145fa8e278fc3bff89. To be more specific, I have the impression that the mining process get stopped but never gets started back. In particular, the shouldStart is never actually used, I guess this is a regression but I can't know for sure it's related to my problem.

System information

Geth version: v1.12.16

Expected behaviour

New work should be committed at least when a block for a new height is processed.

Actual behaviour

Node is not committing new works when it should.

Steps to reproduce the behaviour

Run v1.12.16 with following options: --mine --miner.threads 0

Backtrace

INFO [11-14|17:21:04.354] Imported new chain segment               number=18,717,894 hash=fbe224..f95c04 blocks=1 txs=0   mgas=0.000 elapsed=2.961ms     mgasps=0.000   dirty=4.84MiB    af=true
INFO [11-14|17:21:18.530] Imported new chain segment               number=18,717,895 hash=38b2d9..7dd1c0 blocks=1 txs=29  mgas=0.610 elapsed=9.715ms     mgasps=62.791  dirty=4.93MiB    af=true
INFO [11-14|17:21:21.638] Imported new chain segment               number=18,717,896 hash=332502..f59f5b blocks=1 txs=0   mgas=0.000 elapsed=12.938ms    mgasps=0.000   dirty=4.89MiB    af=true
WARN [11-14|17:21:42.303] Synchronisation failed, dropping peer    peer=a6545b0537e3e2400880d59825a419bb3abd4a04d6055fb5c3c89d9815ceb676 err=timeout
INFO [11-14|17:21:42.304] Commit new sealing work                  number=18,717,897 sealhash=bd9d3a..043d3b uncles=0 txs=0   gas=0         fees=0            elapsed="739.113µs"
INFO [11-14|17:21:42.310] Commit new sealing work                  number=18,717,897 sealhash=1cf66e..942e20 uncles=0 txs=12  gas=252,000   fees=0.000735     elapsed=6.656ms
INFO [11-14|17:21:43.643] Mining aborted due to sync
INFO [11-14|17:21:47.682] Imported new chain segment               number=18,717,897 hash=954b52..47294d blocks=1 txs=0   mgas=0.000 elapsed=2.845ms     mgasps=0.000   dirty=4.85MiB    af=true
INFO [11-14|17:21:56.469] Imported new chain segment               number=18,717,898 hash=c4632f..4699ee blocks=1 txs=0   mgas=0.000 elapsed=2.819ms     mgasps=0.000   dirty=4.85MiB    af=true
INFO [11-14|17:22:04.066] Imported new chain segment               number=18,717,899 hash=c2fd74..daa781 blocks=1 txs=12  mgas=0.252 elapsed=17.999ms    mgasps=14.000  dirty=4.91MiB    af=true
INFO [11-14|17:22:28.543] Imported new chain segment               number=18,717,900 hash=3327db..e1053e blocks=1 txs=0   mgas=0.000 elapsed=13.850ms    mgasps=0.000   dirty=4.91MiB    af=true
WARN [11-14|17:22:43.644] Synchronisation failed, dropping peer    peer=ff16b1eae2bb305e9c96220c723ef8ff6bd341bba9ab3ca417bdd400fcd1d1f0 err=timeout
INFO [11-14|17:22:43.644] Commit new sealing work                  number=18,717,901 sealhash=7fad2e..ca6016 uncles=0 txs=0   gas=0         fees=0            elapsed="162.161µs"
INFO [11-14|17:22:43.647] Commit new sealing work                  number=18,717,901 sealhash=90c2ce..ccfbff uncles=0 txs=24  gas=1,146,810 fees=0.001879239  elapsed=3.463ms
INFO [11-14|17:22:46.465] Imported new chain segment               number=18,717,901 hash=dd31fe..e2c4cd blocks=1 txs=24  mgas=1.149 elapsed=10.907ms    mgasps=105.337 dirty=5.06MiB    af=true
INFO [11-14|17:22:46.466] Commit new sealing work                  number=18,717,902 sealhash=e8846d..b87950 uncles=0 txs=0   gas=0         fees=0            elapsed="364.916µs"
INFO [11-14|17:22:46.467] Commit new sealing work                  number=18,717,902 sealhash=71cf8f..e02d7e uncles=0 txs=3   gas=64312     fees=6.4312e-05   elapsed=1.555ms
INFO [11-14|17:22:47.105] Mining aborted due to sync
Ewemek commented 9 months ago

Well ! I've tried a few things, such as restoring --miner.notify and trying every version between v1.12.12 and v1.12.16. Ultimately I noticed that the problem did not happen when I waited a few minutes after stopping the node, before starting it again. The other notable change is that instead of having 0 peers, and struggling to get some (it took like 6+ hours to get only a few), now I have more than 50 in less than 1 minute after starting the node.

So now I feel that the problem is closely related to peering, maybe the rate limiting system needs tweaking ?

meowsbits commented 9 months ago

Thanks for the thoughtful report! I'll look at it right away and get back to you.

meowsbits commented 9 months ago

@Ewemek I've identified what I believe to be the problem and am drafting a patch.

Ultimately I noticed that the problem did not happen when I waited a few minutes after stopping the node, before starting it again.

This is the workaround I would have offered, and with it you shouldn't see any more spurious pauses. Only a minute's pause or so should do.

If you'd like, please email me at isaac@etccooperative.org and we can continue this conversation directly and in more depth.