MinaProtocol / mina

Mina is a cryptocurrency protocol with a constant size blockchain, improving scaling while maintaining decentralization and security.
https://minaprotocol.com
Apache License 2.0

Ledger mismatch on mainnet database #15138

Closed: dkijania closed this issue 6 months ago

dkijania commented 8 months ago

While attempting to repair nonces on mainnet, we found the issue in both the o1 and MF schemas. Reproduction steps:

repair_nonces.sh

#!/usr/bin/env bash
set -euo pipefail

# Tools used below (jq builds the replayer input config)
apt-get update
apt-get install -y wget jq

# Fetch the mainnet genesis ledger and wrap its accounts as the replayer's genesis_ledger
wget https://raw.githubusercontent.com/MinaProtocol/mina/compatible/genesis_ledgers/mainnet.json -O mainnet.json
jq '{genesis_ledger: {accounts: .ledger.accounts}}' mainnet.json > initial_config.json

# Replay the archive, repairing nonces and periodically writing checkpoint files
mina-replayer --archive-uri postgres://postgres:postgres@localhost:5432/o1_archive_balances_migrated --input-file initial_config.json --repair-nonces --checkpoint-interval 10000

Docker setup:

docker run --network host --entrypoint bash --volume /fixing_replayer/:/workdir --workdir=/workdir docker.io/minaprotocol/mina-rosetta:1.4.1beta1-6e8121c-focal /workdir/repair_nonces.sh

Error

{"timestamp":"2024-02-14 16:17:32.333756Z","level":"Info","source":{"module":"Dune__exe__Replayer","location":"File \"src/app/replayer/replayer.ml\", line 1261, characters 14-25"},"message":"Starting processing of commands in block with state_hash $state_hash at global slot since genesis 146032","metadata":{"pid":260,"state_hash":"3NLuwg65Sxyk9yBiVxsKeTYiTny8KPiVpJkrcDSxVHuWPNxB5YCk"}}
{"timestamp":"2024-02-14 16:17:32.333765Z","level":"Info","source":{"module":"Dune__exe__Replayer","location":"File \"src/app/replayer/replayer.ml\", line 790, characters 2-13"},"message":"Applying user command (delegation) with nonce 0, global slot since genesis 146032, and sequence number 0","metadata":{"pid":260}}
{"timestamp":"2024-02-14 16:17:32.338076Z","level":"Info","source":{"module":"Dune__exe__Replayer","location":"File \"src/app/replayer/replayer.ml\", line 790, characters 2-13"},"message":"Applying user command (payment) with nonce 137525, global slot since genesis 146032, and sequence number 1","metadata":{"pid":260}}
{"timestamp":"2024-02-14 16:17:32.343978Z","level":"Info","source":{"module":"Dune__exe__Replayer","location":"File \"src/app/replayer/replayer.ml\", line 580, characters 2-13"},"message":"Applying internal command (coinbase) with global slot since genesis 146032, sequence number 2, and secondary sequence number 0","metadata":{"pid":260}}
{"timestamp":"2024-02-14 16:17:32.346163Z","level":"Info","source":{"module":"Dune__exe__Replayer","location":"File \"src/app/replayer/replayer.ml\", line 580, characters 2-13"},"message":"Applying internal command (fee_transfer) with global slot since genesis 146032, sequence number 3, and secondary sequence number 0","metadata":{"pid":260}}
{"timestamp":"2024-02-14 16:17:32.348392Z","level":"Info","source":{"module":"Dune__exe__Replayer","location":"File \"src/app/replayer/replayer.ml\", line 1235, characters 16-27"},"message":"Applied all commands at global slot since genesis 146032, got expected ledger hash","metadata":{"ledger_hash":"jwiqcSGZ9MxDPPyGEeWMx8TBZc7bjiKReiJey4qHqdKdwuw6rgg","pid":260}}
{"timestamp":"2024-02-14 16:17:32.348421Z","level":"Info","source":{"module":"Dune__exe__Replayer","location":"File \"src/app/replayer/replayer.ml\", line 1261, characters 14-25"},"message":"Starting processing of commands in block with state_hash $state_hash at global slot since genesis 146033","metadata":{"pid":260,"state_hash":"3NKzdd7UjWLygUvgKfJwd3MVj3PD1GnXDHHGoTLyXinVvSoFRyX5"}}
{"timestamp":"2024-02-14 16:17:32.348430Z","level":"Info","source":{"module":"Dune__exe__Replayer","location":"File \"src/app/replayer/replayer.ml\", line 790, characters 2-13"},"message":"Applying user command (payment) with nonce 137526, global slot since genesis 146033, and sequence number 0","metadata":{"pid":260}}
{"timestamp":"2024-02-14 16:17:32.354279Z","level":"Info","source":{"module":"Dune__exe__Replayer","location":"File \"src/app/replayer/replayer.ml\", line 580, characters 2-13"},"message":"Applying internal command (coinbase) with global slot since genesis 146033, sequence number 1, and secondary sequence number 0","metadata":{"pid":260}}
{"timestamp":"2024-02-14 16:17:32.356364Z","level":"Info","source":{"module":"Dune__exe__Replayer","location":"File \"src/app/replayer/replayer.ml\", line 580, characters 2-13"},"message":"Applying internal command (fee_transfer) with global slot since genesis 146033, sequence number 2, and secondary sequence number 0","metadata":{"pid":260}}
{"timestamp":"2024-02-14 16:17:32.359190Z","level":"Error","source":{"module":"Dune__exe__Replayer","location":"File \"src/app/replayer/replayer.ml\", line 1242, characters 16-28"},"message":"Applied all commands at global slot since genesis 146033, ledger hash differs from expected ledger hash","metadata":{"expected_ledger_hash":"jwooq2BDZZuBFs9sMmv658FXxXoihHCHnF6r4eFSsPgd69wSLFw","ledger_hash":"jy1sumyqtt4GqdXeg5ECWWFdKbRBgV1uST34xNgKcDUFu5hgdDa","pid":260}}
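To pick mismatches like the one above out of a long replayer log, a small triage helper may be useful. The sketch below is hypothetical (report_ledger_mismatches is not part of mina-replayer). It assumes one JSON log object per line, formatted like the excerpt above, and matches on the message text with bash regular expressions rather than parsing JSON, printing the slot, expected hash, and actual hash for each mismatch.

```bash
#!/usr/bin/env bash
# Hypothetical triage helper, not part of mina-replayer.
# Assumes one JSON object per log line, as in the excerpt above. It matches
# on message text instead of parsing JSON, so treat it as a quick aid only.

# Print "<slot>\t<expected_ledger_hash>\t<ledger_hash>" for every
# "ledger hash differs" message in the given log file.
report_ledger_mismatches() {
  local log=$1 line slot expected actual
  local slot_re='global slot since genesis ([0-9]+), ledger hash differs'
  local expected_re='"expected_ledger_hash":"([^"]+)"'
  local actual_re='"ledger_hash":"([^"]+)"'
  # "|| [[ -n $line ]]" keeps a final line that lacks a trailing newline
  while IFS= read -r line || [[ -n $line ]]; do
    [[ $line =~ $slot_re ]] || continue
    slot=${BASH_REMATCH[1]}
    [[ $line =~ $expected_re ]] && expected=${BASH_REMATCH[1]} || expected="?"
    [[ $line =~ $actual_re ]] && actual=${BASH_REMATCH[1]} || actual="?"
    printf '%s\t%s\t%s\n' "$slot" "$expected" "$actual"
  done < "$log"
}
```

Run against a log containing the lines above (for example `report_ledger_mismatches replayer.log`, where the file name is only illustrative), it would print slot 146033 followed by the expected and actual hashes, tab-separated; the slot 146032 line is skipped because its hash matched.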
dkijania commented 8 months ago

We received a schema from a third-party partner. I will check whether I can reproduce the issue on their schema.

ghost-not-in-the-shell commented 8 months ago

Reproduced the same error without the --repair-nonces flag.

psteckler commented 8 months ago

When running the replayer, I observed a few ledger hash mismatches from around January 2022. As I recall, there was some code change that was eventually reverted. After those few mismatches, all the ledger hashes should be as expected.

ghost-not-in-the-shell commented 8 months ago

This wrong ledger hash corresponds to this orphaned block: https://minaexplorer.com/block/3NKm5VGDQXtekWf3erWfjcHciGzjAWB4u7WNE7gAHGYjvkFbF5xj. In our mainnet DB dump, the user_commands and internal_commands for those two blocks are the same.

psteckler commented 8 months ago

Yes, I recall that the mismatched hashes were associated with orphaned blocks.

If you run the replayer with --continue-on-error, you'll see a few of these.

ghost-not-in-the-shell commented 8 months ago

So this is caused by a minor bug that has already been fixed. When replaying slots 146033 to 146752, use the --continue-on-error flag together with --checkpoint-interval 1000. In the replayer log, expect the blocks at the following slots to report a "wrong" ledger hash: 146033, 146078, 146102, 146164, 146328, 146399, 146438, 146489, 146631, 146752. Once a new checkpoint file has been created around slot 147000, restart the replayer from that checkpoint file without the --continue-on-error flag.
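The workaround above can be sketched with two hypothetical helpers, neither of which is part of mina-replayer: classify_mismatches checks that every mismatch logged during a --continue-on-error run falls on one of the ten slots listed above, and find_checkpoint_after picks a checkpoint past the last affected slot to resume from. The checkpoint file name pattern replayer-checkpoint-<slot>.json is an assumption; adjust it to whatever your replayer version actually writes.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the workaround above; not part of mina-replayer.

# Slots whose ledger hash mismatch is expected (orphaned blocks), per this thread.
KNOWN_MISMATCH_SLOTS=(146033 146078 146102 146164 146328 146399 146438 146489 146631 146752)

# Print "known <slot>" or "UNEXPECTED <slot>" for each mismatch in a replayer log.
classify_mismatches() {
  local log=$1 line slot known s
  local re='global slot since genesis ([0-9]+), ledger hash differs'
  while IFS= read -r line || [[ -n $line ]]; do
    [[ $line =~ $re ]] || continue
    slot=${BASH_REMATCH[1]}
    known=no
    for s in "${KNOWN_MISMATCH_SLOTS[@]}"; do
      [[ $s == "$slot" ]] && known=yes
    done
    if [[ $known == yes ]]; then
      echo "known $slot"
    else
      echo "UNEXPECTED $slot"
    fi
  done < "$log"
}

# Print the checkpoint in DIR with the highest slot >= MIN_SLOT; return 1 if none.
# ASSUMPTION: checkpoint files are named replayer-checkpoint-<slot>.json.
find_checkpoint_after() {
  local dir=$1 min_slot=$2 best="" best_slot=-1 f slot
  for f in "$dir"/replayer-checkpoint-*.json; do
    [[ -e $f ]] || continue            # glob matched nothing
    slot=${f##*/replayer-checkpoint-}  # strip directory and prefix
    slot=${slot%.json}
    [[ $slot =~ ^[0-9]+$ ]] || continue
    if (( 10#$slot >= min_slot && 10#$slot > best_slot )); then
      best=$f
      best_slot=$((10#$slot))
    fi
  done
  [[ -n $best ]] && printf '%s\n' "$best"
}
```

In this sketch, you would stop the first run once `find_checkpoint_after . 146753` prints a path and `classify_mismatches` on its log reports no UNEXPECTED lines, then restart without --continue-on-error, passing that checkpoint to --input-file (assuming, as the comment above implies, that a checkpoint file is accepted as replayer input). Checking for unexpected slots first guards against resuming from a checkpoint that was written after a genuine, unrelated mismatch.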