IntersectMBO / cardano-node

The core component that is used to participate in a Cardano decentralised blockchain.
https://cardano.org
Apache License 2.0
3.06k stars 720 forks source link

[BUG] - Inconsistency in Ledger (Negative Coin value) #1312

Closed bergr01 closed 4 years ago

bergr01 commented 4 years ago

cardano-node 1.13.0 - darwin-x86_64 - ghc-8.6 git rev a1dd8ebb7b0a959619abceb34ef1044b9ea696b0

Private network (guild-operators, using settings referenced here - https://github.com/cardano-community/guild-operators/tree/master/files/ptn0/files)

Correctly stopped passive node that was exited from LiveView via 'Q' button press earlier in the day subsequently took about 50 minutes after launch to show non-zero epoch/slot/block number information.

Majority of this time was supposedly spent incrementally replaying blocks, e.g. starting with

{"at":"2020-06-20T06:03:18.43Z","env":"1.13.0:a1dd8","ns":["cardano.node.ChainDB"],"data":{"kind":"TraceLedgerReplayEvent.ReplayedBlock","slot":20,"tip":265848},"app":[],"msg":"","pid":"94674","loc":null,"host":"MYHOST","sev":"Info","thread":"32"}

right until

{"at":"2020-06-20T06:52:10.47Z","env":"1.13.0:a1dd8","ns":["cardano.node.ChainDB"],"data":{"kind":"TraceLedgerReplayEvent.ReplayedBlock","slot":265848,"tip":265848},"app":[],"msg":"","pid":"94674","loc":null,"host":"MYHOST","sev":"Info","thread":"32"}

after which the ledger DB opening commenced (which itself took about 4 minutes):

{"at":"2020-06-20T06:52:10.47Z","env":"1.13.0:a1dd8","ns":["cardano.node.ChainDB"],"data":{"kind":"TraceOpenEvent.OpenedLgrDB"},"app":[],"msg":"","pid":"94674","loc":null,"host":"YARRISMA","sev":"Info","thread":"32"}

Can you please shed some light on how to prevent this from happening and have node start in timely fashion?

Worth noting that the log file started with the following:

{"at":"2020-06-20T06:03:18.43Z","env":"1.13.0:a1dd8","ns":["cardano.node.ChainDB"],"data":{"kind":"TraceOpenEvent.OpenedVolDB"},"app":[],"msg":"","pid":"94674","loc":null,"host":"MYHOST","sev":"Info","thread":"32"} {"at":"2020-06-20T06:03:18.43Z","env":"1.13.0:a1dd8","ns":["cardano.node.ChainDB"],"data":{"snapshot":{"snapshot":"DiskSnapshot 70","kind":"snapshot"},"kind":"TraceLedgerEvent.InvalidSnapshot","failure":"InitFailureRead (ReadFailed (DeserialiseFailure 261 \"An error occured while decoding Negative Coin.\nError: -2800202946253107677\"))"},"app":[],"msg":"","pid":"94674","loc":null,"host":"MYHOST","sev":"Error","thread":"32"} {"at":"2020-06-20T06:03:18.43Z","env":"1.13.0:a1dd8","ns":["cardano.node.ChainDB"],"data":{"snapshot":{"snapshot":"DiskSnapshot 69","kind":"snapshot"},"kind":"TraceLedgerEvent.InvalidSnapshot","failure":"InitFailureRead (ReadFailed (DeserialiseFailure 262 \"An error occured while decoding Negative Coin.\nError: -2797174925787671926\"))"},"app":[],"msg":"","pid":"94674","loc":null,"host":"MYHOST","sev":"Error","thread":"32"} {"at":"2020-06-20T06:03:18.43Z","env":"1.13.0:a1dd8","ns":["cardano.node.ChainDB"],"data":{"kind":"TraceLedgerReplayEvent.ReplayFromGenesis"},"app":[],"msg":"","pid":"94674","loc":null,"host":"MYHOST","sev":"Info","thread":"32"}

nilaysaha commented 4 years ago

Solved. Had to update the config files. It is in all probability due to the wrong version of ff-genesis.json file. Works now ...

bergr01 commented 4 years ago

Solved. Had to update the config files. It is in all probability due to the wrong version of ff-genesis.json file. Works now ...

Please don't muddy up the waters, there's nothing solved about this issue that I reported...

nilaysaha commented 4 years ago

Solved. Had to update the config files. It is in all probability due to the wrong version of ff-genesis.json file. Works now ...

Please don't muddy up the waters, there's nothing solved about this issue that I reported...

Sorry...my fault. Mixed it up with my issues...

mrBliss commented 4 years ago

{"at":"2020-06-20T06:03:18.43Z","env":"1.13.0:a1dd8","ns":["cardano.node.ChainDB"],"data":{"snapshot":{"snapshot":"DiskSnapshot 70","kind":"snapshot"},"kind":"TraceLedgerEvent.InvalidSnapshot","failure":"InitFailureRead (ReadFailed (DeserialiseFailure 261 "An error occured while decoding Negative Coin.\nError: -2800202946253107677"))"},"app":[],"msg":"","pid":"94674","loc":null,"host":"MYHOST","sev":"Error","thread":"32"} {"at":"2020-06-20T06:03:18.43Z","env":"1.13.0:a1dd8","ns":["cardano.node.ChainDB"],"data":{"snapshot":{"snapshot":"DiskSnapshot 69","kind":"snapshot"},"kind":"TraceLedgerEvent.InvalidSnapshot","failure":"InitFailureRead (ReadFailed (DeserialiseFailure 262 "An error occured while decoding Negative Coin.\nError: -2797174925787671926"))"},"app":[],"msg":"","pid":"94674","loc":null,"host":"MYHOST","sev":"Error","thread":"32"} {"at":"2020-06-20T06:03:18.43Z","env":"1.13.0:a1dd8","ns":["cardano.node.ChainDB"],"data":{"kind":"TraceLedgerReplayEvent.ReplayFromGenesis"},"app":[],"msg":"","pid":"94674","loc":null,"host":"MYHOST","sev":"Info","thread":"32"}

This indicates the following happened: when you shut down the node the last time, its ledger contained a negative Coin somewhere. This should not happen, and is something for the ledger-specs team should look into, cc: @JaredCorduan (or maybe it has been fixed recently?).

A snapshot of the ledger (the UTxO and other state) is written to disk so that the next time you start the node, it can just read it and quickly continue working (with a little replay of blocks). That negative Coin was serialised to disk without causing errors (no validation should be needed, as it should be impossible to have a negative Coin), but when it is deserialised, it is rejected, as it should not be negative. That's what the trace messages above say: the ledger snapshots on disk (we take multiple) are both invalid because they can't be deserialised because of a negative Coin value. This means we have no choice but to reconstruct the ledger state by replaying all blocks in the database against the genesis UTxO. As this requires reading and validating all blocks in the database, this takes a while.

kevinhammond commented 4 years ago

This is not really a performance issue - the node is correctly validating the chain, as it should when it discovers an inconsistency (here a negative Coin value). We should make sure we understand why we have this inconsistency.

JaredCorduan commented 4 years ago

Thank you for reporting this @bergr01 ! I can't tell from the errors below where a negative coin is appearing, but I have a good guess. There is one place in the ledger state where negative coins are used, namely in reward updates (we record, for instance, that the reserves will be reduced by X lovelace when the update is applied, etc). We had not originally planned on this state ever being serialized.

I think that the correct thing to do is to create a new "coin difference" type to be used for this situation, which has a different serialization and allows for negatives.

JaredCorduan commented 4 years ago

would y'all be willing to try to recreate this issue using the 1.15.0 tag? the logs from cardano-cli shelley query ledger-state will now give us some very helpful new information.

kevinhammond commented 4 years ago

We have been unable to reproduce this issue. It may be related to some specific configuration settings? I am closing it for now.