bitcoin-core / bitcoincore.org

Bitcoin Core project website
https://bitcoincore.org/
MIT License
608 stars 474 forks source link

Bug "[net_processing.cpp:4381] [CheckForStaleTipAndEvictPeers] Potential stale tip detected" causing indexers to stall Bitcoin for extremely long time #893

Closed verdy-p closed 2 years ago

verdy-p commented 2 years ago

Stale chain tips can block the RPC and all other responses from the peer for very long time (more than 15 minutes, sometimes hours, or undefinetely). During that time I can see that my local node no longer replies to ANY other peer (e.g. to bitnode.io core). And in process viewers, I can see that there's a single thread blocking ALL other threads in a tight loop using 100% CPU in a tight loop.

I've seen that this is occurs in any db indexer (notably the chainstate indexer during IBD, or the txindexer, or the coinstat db indexer), notably in some critical steps where new blocks are added to the chain (once every about 15 minutes). This seems to happen when a new block contains a transaction validating a better chain, and invalidating other blocks from dead branches. In that case, the ongoin indexer will fail with unchecked conditions in its internal iterator, apparently because it expect the next block to be present and locks it incorrectly, and is not able to release it and retry if that next valid block has been replaced by a better block in the chain.

2022-08-21T19:44:15.875834Z [net_processing.cpp:2797] [ProcessMessage] New outbound peer connected: version: 70015, blocks=750474, peer=58, peeraddr=[2604:a880:cad:d0::d9e:f001]:8333 (block-relay-only)
2022-08-21T19:44:45.012022Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 330879
2022-08-21T19:45:16.005414Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 331746
2022-08-21T19:45:47.003036Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 332578
2022-08-21T19:46:18.039548Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 333378
2022-08-21T19:46:49.007385Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 334219
2022-08-21T19:47:20.013586Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 334916
2022-08-21T19:47:51.022263Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 335794
2022-08-21T19:48:22.007479Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 336769
2022-08-21T19:48:53.009529Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 337743
2022-08-21T19:49:24.039053Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 338611
2022-08-21T19:49:55.041510Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 339322
2022-08-21T19:50:26.010515Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 340016
2022-08-21T19:50:57.016185Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 340630
2022-08-21T19:51:28.055804Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 341499
2022-08-21T19:52:08.460256Z [validation.cpp:2441] [UpdateTipLog] UpdateTip: new best=0000000000000000000758f0830f8c7cf41c06740b2004b4186e439b0b989062 height=750475 version=0x2281c000 log2_work=93.689008 tx=758464817 date='2022-08-21T19:50:59Z' progress=1.000000 cache=3.7MiB(27192txo)
2022-08-21T19:52:10.247487Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 341803
2022-08-21T19:52:30.568835Z [validation.cpp:2441] [UpdateTipLog] UpdateTip: new best=00000000000000000005d0b2bbb26ce7bb0eef932eed58853ebe6525ca71c85a height=750476 version=0x20000004 log2_work=93.689019 tx=758465224 date='2022-08-21T19:52:12Z' progress=1.000000 cache=3.9MiB(28964txo)
2022-08-21T19:52:41.785630Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 341814
2022-08-21T19:53:12.001085Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 342683
2022-08-21T19:53:43.027307Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 343432
2022-08-21T19:54:14.012871Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 344012
2022-08-21T19:54:45.022063Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 344718
2022-08-21T19:54:59.410934Z [validation.cpp:2441] [UpdateTipLog] UpdateTip: new best=000000000000000000098f79590f23c8bb78c3d4d22d1cd4bab3a0e94fc7e6be height=750477 version=0x2ab96000 log2_work=93.689030 tx=758465722 date='2022-08-21T19:54:36Z' progress=1.000000 cache=4.1MiB(30966txo)
2022-08-21T19:55:16.033929Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 344913
2022-08-21T19:56:00.985269Z [mapport.cpp:204] [ProcessUpnp] UPnP Port Mapping successful.
2022-08-21T20:16:01.005268Z [mapport.cpp:204] [ProcessUpnp] UPnP Port Mapping successful.

Finally later it may eventually detect that and detect the inconsistency, by eviting some peers.

2022-08-21T20:26:10.234268Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 344915
2022-08-21T20:26:10.236286Z [net_processing.cpp:4381] [CheckForStaleTipAndEvictPeers] Potential stale tip detected, will try using extra outbound peer (last tip update: 1871 seconds ago)
2022-08-21T20:26:10.877268Z [net_processing.cpp:2797] [ProcessMessage] New outbound peer connected: version: 70016, blocks=750480, peer=194, peeraddr=78.108.108.38:8333 (outbound-full-relay)
2022-08-21T20:26:18.436270Z [net_processing.cpp:2797] [ProcessMessage] New outbound peer connected: version: 70016, blocks=750480, peer=195, peeraddr=208.104.92.74:8333 (outbound-full-relay)
2022-08-21T20:26:20.609320Z [net_processing.cpp:2797] [ProcessMessage] New outbound peer connected: version: 70016, blocks=750480, peer=196, peeraddr=[2a01:5200:6c:6162:6263:6173:6861:6263]:8333 (outbound-full-relay)
2022-08-21T20:26:22.546436Z [net_processing.cpp:2797] [ProcessMessage] New outbound peer connected: version: 70016, blocks=750480, peer=198, peeraddr=141.95.45.187:30034 (outbound-full-relay)
2022-08-21T20:27:18.146310Z [validation.cpp:2441] [UpdateTipLog] UpdateTip: new best=00000000000000000007cb8a63d4c0f24bf0e1c3e6bedbf0436ebab43a3ee690 height=750478 version=0x20800000 log2_work=93.689041 tx=758467003 date='2022-08-21T20:05:24Z' progress=0.999995 cache=5.5MiB(41861txo)
2022-08-21T20:27:20.065326Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 344926
2022-08-21T20:27:51.486742Z [validation.cpp:2441] [UpdateTipLog] UpdateTip: new best=00000000000000000006e1bb77ce90948b3050df8ee2143d1913fd583ce3c765 height=750479 version=0x20c00000 log2_work=93.689052 tx=758468307 date='2022-08-21T20:08:38Z' progress=0.999996 cache=6.6MiB(48588txo)
2022-08-21T20:28:03.488188Z [validation.cpp:2441] [UpdateTipLog] UpdateTip: new best=0000000000000000000171f39891841e7128e85f45f386ddde82f65c12dd5773 height=750480 version=0x20400000 log2_work=93.689063 tx=758468706 date='2022-08-21T20:10:40Z' progress=0.999996 cache=6.8MiB(50180txo)
2022-08-21T20:28:05.362170Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 344927
2022-08-21T20:28:05.364170Z [net_processing.cpp:2797] [ProcessMessage] New outbound peer connected: version: 70015, blocks=750480, peer=205, peeraddr=116.203.112.73:8333 (outbound-full-relay)
2022-08-21T20:28:05.718559Z [net_processing.cpp:2797] [ProcessMessage] New outbound peer connected: version: 70016, blocks=750480, peer=208, peeraddr=[2404:7a85:4161:2b00:49a1:427a:fac:3409]:8333 (outbound-full-relay)
2022-08-21T20:28:08.302936Z [net_processing.cpp:2797] [ProcessMessage] New outbound peer connected: version: 70016, blocks=750480, peer=211, peeraddr=185.167.113.59:8333 (outbound-full-relay)
2022-08-21T20:28:21.039390Z [net_processing.cpp:2797] [ProcessMessage] New outbound peer connected: version: 70016, blocks=750480, peer=213, peeraddr=[240b:252:9480:9d00:e65f:1ff:fe10:d767]:8333 (outbound-full-relay)

And here again we see the deadlock occuring for long time in the coindbindexer.

It seems then that there are malicious nodes on the P2P network sending bad blocks that "partially pass" the basic security, but can cause BitCoin Core to be blocked and become unresponsive. It can no longer respond even to a basic RPC such as "bitcoin -getinfo", or "bitcoin -netinfo"

It will not even repond to a "bitcoin-cli stop": some threads are stopped, but not all, and the indexer thread is still running in its tight loop; if you are in that situation, the only way to recover is to KILL the process entirely, and you'll get the related index to become invalid as it was not committed, it will have to be reconstructed from zero.

Most often, you have no way to recover (given that the RPC service is not responsive, and even the CTRL+C breaker, or a simple kill -hangup" will not work. You have to do an instant kill (kill -9 on Linux, or manual kill in the Process Viewer on Windows: nothing else will be synced to disk).

If the index was the IBD index, the chainstate can only be recovered by check level 4 (works often, but not necessarily always, but never with check level 3). If it does not work, then you have to rebuild the chainstate entirely.

But for any other index (such as the "txindex" or "coinstats" index, there's NO check at all with recovery at startup, so the only way to recover is to entirely delete the content of that index directory and rebuild it from zero (and this will take many hours, and may fail again in the middle, due to bad blocks sent by malicious peers and that may have still not been replaced by valid blocks in the main chain).

Apparently these bad blocks seem to come from unapproved forks (possibly implementing SegWit with too large block sizes?), which are not correctly detected and cause the indexer to fail or run into some infinite loops, or some bad signatures using unsupported/snon-standard bytecodes, taking extremely long time to validate or just discard rapidly as invalid.

Such events with bad blocks coming from malicious peers tend to be frequent now. Unfortunately it's not easy to track which nodes caused that bug as they may not even be connected since long: they have sent their garbage and have disconnected themselves as soon as done, just to crash specific versions of Bitcoin Core not detecting them.

(Note: this is once again occuring with the current unmodified release v23.0 from your official site; it affects both Linux and Windows versions).

fanquake commented 2 years ago

Why are you opening bug reports for bitcoin core on this repository.

verdy-p commented 2 years ago

Why are you opening bug reports for bitcoin core on this repository.

Because this is THIS repository for Bitcoin Core and there's no other support to signal such bug.

But I reproduce this bug again as well with the development version compiled from this Git repository !!!

Why don't you admit this IS a real bug that is occuring VERY often and is VERY EASY to reproduce by just using it normally (without any tweak)? May be it does not affect everyone, but specific nodes are targetted to stall/crash them.

fanquake commented 2 years ago

This is the repository for the website. Report Bitcoin Core bugs to https://github.com/bitcoin/bitcoin.