bigchaindb / bigchaindb

Meet BigchainDB. The blockchain database.
https://www.bigchaindb.com/
Apache License 2.0

AppHash does not match after upgrading to new release of BigchainDB #2472

Open charlespetchsy opened 5 years ago

charlespetchsy commented 5 years ago

Bug Report

I am running 4 instances of BigchainDB with MongoDB 3.6 and Tendermint 0.22.8. It is a stock version of BigchainDB, installed using pip3.

Every time I upgrade BigchainDB to a new release, I have to drop the database; otherwise Tendermint fails to start with a hash error.

The following log is from bigchaindb.log:

[2018-08-06 21:02:08] [WARNING] (bigchaindb.event_stream) WebSocket connection failed with exception Cannot connect to host localhost:26657 ssl:None [Connect call failed ('127.0.0.1', 26657)] (bigchaindb_ws_to_tendermint - pid: 10041)
[2018-08-06 21:02:11] [WARNING] (bigchaindb.event_stream) WebSocket connection failed with exception Cannot connect to host localhost:26657 ssl:None [Connect call failed ('127.0.0.1', 26657)] (bigchaindb_ws_to_tendermint - pid: 10041)
[2018-08-06 21:02:14] [WARNING] (bigchaindb.event_stream) WebSocket connection failed with exception Cannot connect to host localhost:26657 ssl:None [Connect call failed ('127.0.0.1', 26657)] (bigchaindb_ws_to_tendermint - pid: 10041)
[2018-08-06 21:02:17] [WARNING] (bigchaindb.event_stream) WebSocket connection failed with exception Cannot connect to host localhost:26657 ssl:None [Connect call failed ('127.0.0.1', 26657)] (bigchaindb_ws_to_tendermint - pid: 10041)
[2018-08-06 21:02:20] [WARNING] (bigchaindb.event_stream) WebSocket connection failed with exception Cannot connect to host localhost:26657 ssl:None [Connect call failed ('127.0.0.1', 26657)] (bigchaindb_ws_to_tendermint - pid: 10041)
[2018-08-06 21:02:23] [WARNING] (bigchaindb.event_stream) WebSocket connection failed with exception Cannot connect to host localhost:26657 ssl:None [Connect call failed ('127.0.0.1', 26657)] (bigchaindb_ws_to_tendermint - pid: 10041)
[2018-08-06 21:02:26] [WARNING] (bigchaindb.event_stream) WebSocket connection failed with exception Cannot connect to host localhost:26657 ssl:None [Connect call failed ('127.0.0.1', 26657)] (bigchaindb_ws_to_tendermint - pid: 10041)
[2018-08-06 21:02:29] [WARNING] (bigchaindb.event_stream) WebSocket connection failed with exception Cannot connect to host localhost:26657 ssl:None [Connect call failed ('127.0.0.1', 26657)] (bigchaindb_ws_to_tendermint - pid: 10041)
[2018-08-06 21:02:32] [WARNING] (bigchaindb.event_stream) WebSocket connection failed with exception Cannot connect to host localhost:26657 ssl:None [Connect call failed ('127.0.0.1', 26657)] (bigchaindb_ws_to_tendermint - pid: 10041)
[2018-08-06 21:02:35] [WARNING] (bigchaindb.event_stream) WebSocket connection failed with exception Cannot connect to host localhost:26657 ssl:None [Connect call failed ('127.0.0.1', 26657)] (bigchaindb_ws_to_tendermint - pid: 10041)
[2018-08-06 21:02:38] [WARNING] (bigchaindb.event_stream) WebSocket connection failed with exception Cannot connect to host localhost:26657 ssl:None [Connect call failed ('127.0.0.1', 26657)] (bigchaindb_ws_to_tendermint - pid: 10041)
[2018-08-06 21:02:41] [WARNING] (bigchaindb.event_stream) WebSocket connection failed with exception Cannot connect to host localhost:26657 ssl:None [Connect call failed ('127.0.0.1', 26657)] (bigchaindb_ws_to_tendermint - pid: 10041)

And the hash error is from Tendermint:

ABCI Replay Blocks                           module=consensus appHeight=98 storeHeight=0 stateHeight=0
panic: Tendermint state.AppHash does not match AppHash after replay. Got 33656534316538633932343064633838633532306536653864663438323137306335356166373461306638353631653164343030313637363139633530656462, expected

At this point, Tendermint is unable to run, so I have to clear Tendermint, drop the database via bigchaindb -y drop, and reload the assets.
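A note on reading the panic message: the "Got" value is 128 hex characters long, and decoding it once yields a 64-character string that is itself made of hex digits. This suggests (as an interpretation, not something confirmed in this thread) that the app hash is a hex digest that was hex-encoded a second time when logged. A minimal Python sketch of the decoding, using only the value copied from the log above:

```python
# "Got" value copied verbatim from the Tendermint panic message.
got = (
    "3365653431653863393234306463383863353230653665386466343832313730"
    "6335356166373461306638353631653164343030313637363139633530656462"
)

# Hex-decode once; the resulting bytes are ASCII characters.
inner = bytes.fromhex(got).decode("ascii")

print(len(got))    # length of the logged value
print(len(inner))  # length after one round of decoding
print(inner)       # the decoded string, itself a hex string
```

Comparing the decoded form rather than the raw logged form may make it easier to match the value against whatever hash BigchainDB reports on its side.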

My upgrade procedure for each node in the cluster:

1) sudo -H pip3 uninstall bigchaindb==2.0.0b4
2) sudo -H pip3 install bigchaindb==2.0.0b5

kansi commented 5 years ago

ABCI Replay Blocks                           module=consensus appHeight=98 storeHeight=0 stateHeight=0

Looking at the log line above, I would like to know how you are running the system (natively or in Docker). It seems that Tendermint's logs were lost somehow: storeHeight=0 and stateHeight=0, whereas appHeight=98, which is the number of blocks committed.
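The comparison described here can be sketched in Python. This is only an illustration of reading the three heights out of the handshake log line; the helper names parse_heights and app_is_ahead are hypothetical and not part of BigchainDB or Tendermint.

```python
import re

# Log line copied from the report above.
LINE = ("ABCI Replay Blocks                           "
        "module=consensus appHeight=98 storeHeight=0 stateHeight=0")


def parse_heights(line):
    """Return every '<name>Height=<n>' field in the line as an int."""
    return {key: int(value)
            for key, value in re.findall(r"(\w+Height)=(\d+)", line)}


def app_is_ahead(heights):
    """True when the application has committed more blocks than
    Tendermint's block store holds."""
    return heights["appHeight"] > heights["storeHeight"]


heights = parse_heights(LINE)
print(heights)
print(app_is_ahead(heights))
```

For the line above, the application reports 98 committed blocks while Tendermint's store and state are both at 0, which is the mismatch being pointed out.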

charlespetchsy commented 5 years ago

@kansi I had to clear the system and create a new setup. The output above is just a reproduction of the error on a fresh machine. I'm also running the system natively, without Docker.

ldmberman commented 5 years ago

@charlespetchsy how do you upgrade to a new BigchainDB version? Do you reset Tendermint (e.g. via tendermint_unsafe_reset_all)?

charlespetchsy commented 5 years ago

@ldmberman I upgrade BigchainDB using pip3, and yes, I reset Tendermint with tendermint_unsafe_reset_all. I only reset Tendermint when it no longer connects to the BigchainDB cluster. The upgrade process currently consists of resetting everything and re-creating each transaction.

ldmberman commented 5 years ago

@charlespetchsy right now we do not support the kind of replay needed when Tendermint is behind. So if you reset Tendermint, the node becomes non-operational.

I am investigating if we can introduce support for such replay right now, but in any case there is a question of why Tendermint did not connect to BigchainDB after the upgrade. Could you provide Tendermint and BigchainDB logs from the time they failed to connect?

ldmberman commented 5 years ago

@charlespetchsy it's actually impossible to replay the blocks if Tendermint falls behind; that is not supposed to happen.