bitcoinknots / bitcoin

Bitcoin Knots enhanced Bitcoin node/wallet software
MIT License
200 stars 60 forks source link

Mempool file over 32 MB cannot be deserialized #30

Closed lontivero closed 2 years ago

lontivero commented 3 years ago

Expected behavior

Mempool should be deserialized without error.

Actual behavior

2021-02-24T15:18:39Z Failed to deserialize mempool data on disk: ReadCompactSize(): size too large: iostream error. Continuing anyway.

To reproduce

$ bitcoin-cli stop
... (wait)
$ pgrep bitcoind
$ bitcoind

It seems to happen always with no exception (100% reproducible in our environment)

Note

We didn't experienced this with Bitcoin Core.

System information

Bitcoin Knots version v0.21.0.0-gd4a64f61c1346b15ccb91eafbae16ffc8f52e577 (release build)

But this same problem was present in the previous released version v0.20.0

Ubuntu 18.04.4 LTS (no GUI)

Logs ``` 2021-02-24T15:15:27Z ThreadRPCServer method=stop user=bitcoinuser 2021-02-24T15:15:27Z Interrupting HTTP RPC server 2021-02-24T15:15:27Z Interrupting RPC 2021-02-24T15:15:27Z tor: Thread interrupt 2021-02-24T15:15:27Z Shutdown: In progress... 2021-02-24T15:15:27Z Stopping HTTP RPC server 2021-02-24T15:15:27Z addcon thread exit 2021-02-24T15:15:27Z opencon thread exit 2021-02-24T15:15:27Z RemoveLocal(7v4ytzubiaguuwvw.onion:8333) 2021-02-24T15:15:27Z Stopping RPC 2021-02-24T15:15:27Z torcontrol thread exit 2021-02-24T15:15:27Z RPC stopped. 2021-02-24T15:15:27Z net thread exit 2021-02-24T15:15:27Z msghand thread exit 2021-02-24T15:15:28Z scheduler thread exit 2021-02-24T15:15:30Z Dumped mempool: 0.19573s to copy, 2.15715s to dump 2021-02-24T15:15:30Z FlushStateToDisk: write coins cache to disk (959903 coins, 138092kB) started 2021-02-24T15:15:32Z FlushStateToDisk: write coins cache to disk (959903 coins, 138092kB) completed (2.33s) 2021-02-24T15:15:32Z FlushStateToDisk: write coins cache to disk (0 coins, 14660kB) started 2021-02-24T15:15:32Z FlushStateToDisk: write coins cache to disk (0 coins, 14660kB) completed (0.03s) 2021-02-24T15:15:33Z [default wallet] Releasing wallet 2021-02-24T15:15:33Z Shutdown: done 2021-02-24T15:18:32Z Bitcoin Knots version v0.21.0.0-gd4a64f61c1346b15ccb91eafbae16ffc8f52e577 (release build) 2021-02-24T15:18:32Z InitParameterInteraction: parameter interaction: -whitebind set -> setting -listen=1 2021-02-24T15:18:32Z Assuming ancestors of block 00000000000000000007888a9d01313d69d6335df46ea33e875ee6832670c596 have valid signatures. 2021-02-24T15:18:32Z Setting nMinimumChainWork=0000000000000000000000000000000000000000189ccba50031c45c16a96544 2021-02-24T15:18:32Z ParameterInteraction: parameter interaction: -walletimplicitsegwit=0 -> setting -addresstype=legacy 2021-02-24T15:18:32Z Using the 'sse4(1way),sse41(4way),avx2(8way)' SHA256 implementation 2021-02-24T15:18:32Z Using RdSeed as additional entropy source 2021-02-24T15:18:32Z Using RdRand as an additional entropy source 2021-02-24T15:18:32Z Default data directory /home/user/.bitcoin 2021-02-24T15:18:32Z Using data directory /home/user/.bitcoin 2021-02-24T15:18:32Z Config file: /home/user/.bitcoin/bitcoin.conf 2021-02-24T15:18:32Z R/W Config file: /home/user/.bitcoin/bitcoin_rw.conf (not found, skipping) 2021-02-24T15:18:32Z Config file arg: daemon="1" 2021-02-24T15:18:32Z Config file arg: debug="rpc" 2021-02-24T15:18:32Z Config file arg: mempoolreplacement="fee,optin" 2021-02-24T15:18:32Z Config file arg: rpcpassword=**** 2021-02-24T15:18:32Z Config file arg: rpcthreads="8" 2021-02-24T15:18:32Z Config file arg: rpcuser=**** 2021-02-24T15:18:32Z Config file arg: rpcworkqueue="256" 2021-02-24T15:18:32Z Config file arg: server="1" 2021-02-24T15:18:32Z Config file arg: testnet="0" 2021-02-24T15:18:32Z Config file arg: txindex="1" 2021-02-24T15:18:32Z Config file arg: whitebind="127.0.0.1:8333" 2021-02-24T15:18:32Z Using at most 125 automatic connections (16384 file descriptors available) 2021-02-24T15:18:32Z Using 16 MiB out of 32/2 requested for signature cache, able to store 524288 elements 2021-02-24T15:18:32Z Using 16 MiB out of 32/2 requested for script execution cache, able to store 524288 elements 2021-02-24T15:18:32Z Script verification uses 15 additional threads 2021-02-24T15:18:32Z scheduler thread start 2021-02-24T15:18:32Z HTTP: creating work queue of depth 256 2021-02-24T15:18:32Z Starting RPC 2021-02-24T15:18:32Z Starting HTTP RPC server 2021-02-24T15:18:32Z Config options rpcuser and rpcpassword will soon be deprecated. Locally-run instances may remove rpcuser to use cookie-based auth, or may be replaced with rpcauth. Please see share/rpcauth for rpcauth auth generation. 2021-02-24T15:18:32Z HTTP: starting 8 worker threads 2021-02-24T15:18:32Z Using wallet directory /home/user/.bitcoin 2021-02-24T15:18:32Z init message: Verifying wallet(s)... 2021-02-24T15:18:32Z Using BerkeleyDB version Berkeley DB 4.8.30: (April 9, 2010) 2021-02-24T15:18:32Z Using wallet /home/user/.bitcoin/wallet.dat 2021-02-24T15:18:32Z BerkeleyEnvironment::Open: LogDir=/home/user/.bitcoin/database ErrorFile=/home/user/.bitcoin/db.log 2021-02-24T15:18:33Z init message: Loading banlist... 2021-02-24T15:18:33Z SetNetworkActive: true 2021-02-24T15:18:33Z Using /16 prefix for IP bucketing 2021-02-24T15:18:33Z Cache configuration: 2021-02-24T15:18:33Z * Using 2.0 MiB for block index database 2021-02-24T15:18:33Z * Using 56.0 MiB for transaction index database 2021-02-24T15:18:33Z * Using 8.0 MiB for chain state database 2021-02-24T15:18:33Z * Using 384.0 MiB for in-memory UTXO set (plus up to 286.1 MiB of unused mempool space) 2021-02-24T15:18:33Z * Flushing caches if available system memory drops below 10 MiB 2021-02-24T15:18:33Z init message: Loading block index... 2021-02-24T15:18:33Z Switching active chainstate to Chainstate [ibd] @ height -1 (null) 2021-02-24T15:18:33Z Opening LevelDB in /home/user/.bitcoin/blocks/index 2021-02-24T15:18:33Z Opened LevelDB successfully 2021-02-24T15:18:33Z Using obfuscation key for /home/user/.bitcoin/blocks/index: 0000000000000000 2021-02-24T15:18:37Z LoadBlockIndexDB: last block file = 2465 2021-02-24T15:18:37Z LoadBlockIndexDB: last block file info: CBlockFileInfo(blocks=99, size=129829546, heights=671887...671985, time=2021-02-24...2021-02-24) 2021-02-24T15:18:37Z Checking all blk files are present... 2021-02-24T15:18:37Z Opening LevelDB in /home/user/.bitcoin/chainstate 2021-02-24T15:18:38Z Opened LevelDB successfully 2021-02-24T15:18:38Z Using obfuscation key for /home/user/.bitcoin/chainstate: c405746c298350d2 2021-02-24T15:18:38Z Loaded best chain: hashBestChain=00000000000000000004ca94ef1c162008391a09bc438ce4c6c702a040f28a84 height=671985 date=2021-02-24T15:07:50Z progress=0.999996 2021-02-24T15:18:38Z init message: Rewinding blocks... 2021-02-24T15:18:38Z FlushStateToDisk: write coins cache to disk (0 coins, 0kB) started 2021-02-24T15:18:38Z FlushStateToDisk: write coins cache to disk (0 coins, 0kB) completed (0.00s) 2021-02-24T15:18:38Z init message: Verifying blocks... 2021-02-24T15:18:38Z Verifying last 6 blocks at level 3 2021-02-24T15:18:38Z [0%]...[16%]...[33%]...[50%]...[66%]...[83%]...[99%]...[DONE]. 2021-02-24T15:18:39Z No coin database inconsistencies in last 6 blocks (14986 transactions) 2021-02-24T15:18:39Z block index 6548ms 2021-02-24T15:18:39Z Opening LevelDB in /home/user/.bitcoin/indexes/txindex 2021-02-24T15:18:39Z Opened LevelDB successfully 2021-02-24T15:18:39Z Using obfuscation key for /home/user/.bitcoin/indexes/txindex: 0000000000000000 2021-02-24T15:18:39Z txindex thread start 2021-02-24T15:18:39Z txindex is enabled at height 671985 2021-02-24T15:18:39Z txindex thread exit 2021-02-24T15:18:39Z init message: Loading wallet... 2021-02-24T15:18:39Z BerkeleyEnvironment::Open: LogDir=/home/user/.bitcoin/database ErrorFile=/home/user/.bitcoin/db.log 2021-02-24T15:18:39Z [default wallet] Wallet File Version = 159900 2021-02-24T15:18:39Z [default wallet] Keys: 2001 plaintext, 0 encrypted, 2001 w/ metadata, 2001 total. Unknown wallet records: 0 2021-02-24T15:18:39Z [default wallet] Wallet completed loading in 40ms 2021-02-24T15:18:39Z [default wallet] setKeyPool.size() = 2000 2021-02-24T15:18:39Z [default wallet] mapWallet.size() = 0 2021-02-24T15:18:39Z [default wallet] m_address_book.size() = 0 2021-02-24T15:18:39Z block tree size = 672043 2021-02-24T15:18:39Z nBestHeight = 671985 2021-02-24T15:18:39Z loadblk thread start 2021-02-24T15:18:39Z AddLocal(178.62.244.102:8333,1) 2021-02-24T15:18:39Z Discover: IPv4 eth0: 178.62.244.102 2021-02-24T15:18:39Z Failed to deserialize mempool data on disk: ReadCompactSize(): size too large: iostream error. Continuing anyway. ```
nopara73 commented 3 years ago

Reproduced it on Windows.

nopara73 commented 3 years ago

Bitcoin Knots version v0.20.1 was also doing this, but we thought it's just a fluke.

molnard commented 3 years ago

How to reproduce

Override mempool.dat here C:\Users\user\AppData\Roaming\Bitcoin https://drive.google.com/file/d/13PFtUE-ITsSBLq0rrb5uF6jQKqka3Xb7/view?usp=sharing

See an error in debug.log

2021-02-24T16:25:25Z Failed to deserialize mempool data on disk: ReadCompactSize(): size too large: iostream error. Continuing anyway.

After this mempool lost and the node starting build up the mempool from scratch.

Our backend is using Ubuntu, I reproduced it on Windows, both had the bug so it is platform-independent.

lontivero commented 3 years ago

FYI: test/functional/mempool_persist.py pass

2021-02-24T18:20:13.531856Z [httpworker.2] [default wallet] Submitting wtx 60179ab15ab27be9c3a42ca6dcaf53cf62466acce89ddf3980edf7cd831ed48c to mempool for relay
2021-02-24T18:20:13.579655Z [httpworker.0] [default wallet] Submitting wtx fc45c3df4ee7c5bb3c1a8b9be11bf9f7a5313c0fbe777745d1954a6ff2d82b0e to mempool for relay
2021-02-24T18:20:13.629336Z [httpworker.2] [default wallet] Submitting wtx 5dee1effb2e7b0b967a85629a225428df2dd897312f740a01daac8e8b02ca192 to mempool for relay
2021-02-24T18:20:13.679815Z [httpworker.0] [default wallet] Submitting wtx 2807b5bb86cc67de3038f2b69fa0d5bda3da7a1a3b3e870226d586a001c91cb3 to mempool for relay
2021-02-24T18:20:13.686547Z [httpworker.1] ThreadRPCServer method=getrawmempool user=__cookie__
2021-02-24T18:20:13.689265Z [httpworker.3] ThreadRPCServer method=getrawmempool user=__cookie__
2021-02-24T18:23:40.625955Z [shutoff] Dumped mempool: 7e-06s to copy, 0.001928s to dump
2021-02-24T18:23:40.625990Z [shutoff] Recorded 5 unconfirmed txs from mempool in 6e-06s
2021-02-24T18:24:08.402190Z [init] * Using 440.0 MiB for in-memory UTXO set (plus up to 286.1 MiB of unused mempool space)
2021-02-24T18:24:08.453175Z [init] [default wallet] Submitting wtx 934ef4ffe1aded1d28f57cc8c0130e91c0dcf0362df973087aa016746830aa8d to mempool for relay
2021-02-24T18:24:08.453507Z [init] [default wallet] Submitting wtx 60179ab15ab27be9c3a42ca6dcaf53cf62466acce89ddf3980edf7cd831ed48c to mempool for relay
2021-02-24T18:24:08.454012Z [init] [default wallet] Submitting wtx fc45c3df4ee7c5bb3c1a8b9be11bf9f7a5313c0fbe777745d1954a6ff2d82b0e to mempool for relay
2021-02-24T18:24:08.454530Z [loadblk] Imported mempool transactions from disk: 5 succeeded, 0 failed, 0 expired, 0 already there, 5 waiting for initial broadcast
2021-02-24T18:24:08.454617Z [init] [default wallet] Submitting wtx 5dee1effb2e7b0b967a85629a225428df2dd897312f740a01daac8e8b02ca192 to mempool for relay
2021-02-24T18:24:08.454710Z [init] [default wallet] Submitting wtx 2807b5bb86cc67de3038f2b69fa0d5bda3da7a1a3b3e870226d586a001c91cb3 to mempool for relay
2021-02-24T18:24:08.466766Z [httpworker.1] ThreadRPCServer method=getmempoolinfo user=__cookie__

Using the @molnard mempool.dat file it seems to fail while deserializing the "txs" entry.

lontivero commented 3 years ago

It seems the mempool.dat file ~is broken but I don't see how is that possible.~ contains transactions that sum a size bigger than the allowed.

 ~$ cat ~/.bitcoin/regtest/mempool.dat  | xxd | head -100
00000000: 0200 0000 0000 0000 0306 6465 6c74 6173  ..........deltas
00000010: 0100 066d 696e 6665 6511 7680 d2c5 c53b  ...minfee.v....;
00000020: bf40 0188 6d36 6000 0000 0003 7478 73fe  .@..m6`.....txs.
00000030: 79ee 9506 fece 3801 0002 00fd e501 0200  y.....8.........
molnard commented 3 years ago

Just mentioning, I saved the file on the fly - I did not stop bitcoind service, I was afraid of losing the mempool again.

nopara73 commented 3 years ago

Here's a mempool file created by Knots: if you load it with Bitcoin Core 21, then it's successful, if you do it with Bitcoin Knots 21 then it fails.

https://filebin.net/bp477ybe98z95bxy

nopara73 commented 3 years ago

FTR @luke-jr the reason why we're investigating it so fiercely because we cannot restart Knots on the server without the mempool corrupting.

MaxHillebrand commented 3 years ago

we cannot restart Knots on the server without the mempool corrupting.

And this takes away our most powerful skill of trouble shooting - trying to turn it off and on again... :cry:

lontivero commented 3 years ago

It seems it cannot load txs verctors larger than MAX_SIZE (33_554432) in the our case that vector is 110 489_ 209 long.

luke-jr commented 3 years ago

A workaround would probably be -maxmempool=32 I suppose.

I had planned to drop the custom mempool.dat format in 22.0 already (0.21 added compatibility code for Core's format), but sounds like it might be good to do it sooner.

molnard commented 3 years ago

A workaround would probably be -maxmempool=32 I suppose.

I had planned to drop the custom mempool.dat format in 22.0 already (0.21 added compatibility code for Core's format), but sounds like it might be good to do it sooner.

It is a priority issue for us. 32Mb mempool is a no-go, on the contrary, it should be increased more to avoid wallet state inconsistencies. All the clients (without the Full node), maintain their mempool according to the backend's mempool. Every time I need to restart the backend - for any reason mempool lost. It is like sitting on a bomb.

Is it possible to fix this ASAP in a hotfix release, under any conditions?

luke-jr commented 3 years ago

@molnard If you can apply a patch and build yourself, this one should write a Core mempool.dat instead (which is supported by Knots 0.21) https://dpaste.com/ELMEQ8ZUG

Does this hotfix work for you, or you need more?

lontivero commented 3 years ago

@luke-jr I am a bit blocked with this hotfix. I applied the patch and compiled it but I cannot make the test/functional/mempool_compatibility.py pass because it fails with:

FileNotFoundError: [Errno 2] No such file or directory: '/mnt/data/GitHub/bitcoin-knots/releases/v0.19.1/bin/bitcoind'
/tmp/bitcoin_func_test_seni8rds ``` 2021-02-27T18:24:07.926000Z TestFramework (DEBUG): PRNG seed is: 6820928144085222329 2021-02-27T18:24:07.926000Z TestFramework (DEBUG): Setting up network thread 2021-02-27T18:24:07.926000Z TestFramework (INFO): Initializing test directory /tmp/bitcoin_func_test_seni8rds 2021-02-27T18:24:07.926000Z TestFramework (DEBUG): Copy cache directory /mnt/data/GitHub/bitcoin-knots/test/cache/node0 to node 0 2021-02-27T18:24:07.940000Z TestFramework (DEBUG): Copy cache directory /mnt/data/GitHub/bitcoin-knots/test/cache/node0 to node 1 2021-02-27T18:24:07.955000Z TestFramework.node0 (DEBUG): bitcoind started, waiting for RPC to come up 2021-02-27T18:24:07.958000Z TestFramework.node1 (DEBUG): bitcoind started, waiting for RPC to come up 2021-02-27T18:24:08.210000Z TestFramework.node0 (DEBUG): Stopping node 2021-02-27T18:24:08.210000Z TestFramework (ERROR): Assertion failed Traceback (most recent call last): File "/mnt/data/GitHub/bitcoin-knots/test/functional/test_framework/test_framework.py", line 507, in start_nodes node.wait_for_rpc_connection() File "/mnt/data/GitHub/bitcoin-knots/test/functional/test_framework/test_node.py", line 227, in wait_for_rpc_connection raise FailedToStartError(self._node_msg( test_framework.test_node.FailedToStartError: [node 0] bitcoind exited with status 1 during initialization During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/mnt/data/GitHub/bitcoin-knots/test/functional/test_framework/test_framework.py", line 125, in main self.setup() File "/mnt/data/GitHub/bitcoin-knots/test/functional/test_framework/test_framework.py", line 262, in setup self.setup_network() File "./test/functional/mempool_compatibility.py", line 35, in setup_network self.start_nodes() File "/mnt/data/GitHub/bitcoin-knots/test/functional/test_framework/test_framework.py", line 510, in start_nodes self.stop_nodes() File "/mnt/data/GitHub/bitcoin-knots/test/functional/test_framework/test_framework.py", line 526, in stop_nodes node.stop_node(wait=wait) File "/mnt/data/GitHub/bitcoin-knots/test/functional/test_framework/test_node.py", line 326, in stop_node self.stop(wait=wait) File "/mnt/data/GitHub/bitcoin-knots/test/functional/test_framework/test_node.py", line 186, in __getattr__ assert self.rpc_connected and self.rpc is not None, self._node_msg("Error: no RPC connection") AssertionError: [node 0] Error: no RPC connection 2021-02-27T18:24:08.213000Z TestFramework (DEBUG): Closing down network thread 2021-02-27T18:24:08.265000Z TestFramework (INFO): Stopping nodes 2021-02-27T18:24:08.265000Z TestFramework.node0 (DEBUG): Stopping node ```

This is because knots-v0.19.1 says: This software is expired

/m/d/G/bitcoin-knots (0.21.x-knots) [1]> ./releases/v0.19.1/bin/bitcoind --version
Bitcoin Knots version v0.19.1.knots20200304
Copyright (C) 2009-2019 The Bitcoin Core developers

Please contribute if you find Bitcoin Knots useful. Visit
<http://bitcoinknots.org/> for further information about the software.
The source code is available from <https://github.com/bitcoinknots/bitcoin>.

This is experimental software.
Distributed under the MIT software license, see the accompanying file COPYING
or <https://opensource.org/licenses/MIT>

This product includes software developed by the OpenSSL Project for use in the
OpenSSL Toolkit <https://www.openssl.org> and cryptographic software written by
Eric Young and UPnP software written by Thomas Bernard.

/m/d/G/bitcoin-knots (0.21.x-knots)> ./releases/v0.19.1/bin/bitcoind
This software is expired, and may be out of consensus. You must choose to upgrade or override this expiration.

What am i doing wrong? How can I run the test/functional/mempool_compatibility.py?

Thanks for the hotfix.

luke-jr commented 3 years ago

It is expecting Core releases for the compatibility tests. (I do have a Knots compatibility variant locally, but it's a hack and I didn't adapt it to this hotfix)

lontivero commented 3 years ago

Tested. It is compatible with Core mempool.dat files in the two ways.

molnard commented 3 years ago

Here is the diff: https://github.com/lontivero/bitcoin/commit/14a6a1dd20fc4ded1aebd1faa5597e2862fef714

And the release https://github.com/lontivero/bitcoin/releases/tag/knots-0.21.x-mempool-hotfix

Awesome @lontivero @luke-jr 🙏

molnard commented 2 years ago

With this mempool file, you will end up the same as here.

2021-06-28T17:00:43Z Failed to deserialize mempool data on disk: ReadCompactSize(): size too large: iostream error. Continuing anyway.

https://github.com/molnard/bitcoin/releases/download/v0.21.0.0-14a6a1dd2/mempool.dat

luke-jr commented 2 years ago

This should be fixed in 0.21.1.knots20210629. Please close the issue once you've confirmed.

(Additional gitian signatures would be appreciated BTW)

lontivero commented 2 years ago

Fixed and deployed.