Concordium / concordium-node

The main concordium node implementation.
GNU Affero General Public License v3.0
45 stars 22 forks source link

Transaction table invariant violation #246

Closed abizjak closed 2 years ago

abizjak commented 2 years ago

Bug Description

2022-02-11T16:02:43.148892757Z: INFO: Skov: Finalization statistics: finalizationCount=103773 lastFinalizedTime=1.644595363148873e9 finalizationPeriodEMA=Just 1.2405172532577067e-2 finalizationPeriodEMSD=Just 2.8057460559554676e-2
2022-02-11T16:02:43.154538375Z: INFO: Skov: Block d19d249c34a6b6e23847d2ff77d9e0d0fdf170dbb3e4ac0061a2ab17b6386e46 (1) arrived
2022-02-11T16:02:43.154582605Z: INFO: Skov: Arrive statistics: blocksVerifiedCount=119673 blockLastArrive=1.644595363154527e9 blockArriveLatencyEMA=4591511.344877137 blockArriveLatencyEMSD=96.00067331340259 blockArrivePeriodEMA=Just 1.0382203586782802e-2 blockArrivePeriodEMSD=Just 2.5764293574172176e-2 transactionsPerBlockEMA=0.12764863404940072 transactionsPerBlockEMSD=0.3336981574338146
2022-02-11T16:02:43.154985015Z: INFO: Skov: Receive statistics: blocksReceivedCount=119865 blockLastReceived=1.6445953631546094e9 blockReceiveLatencyEMA=4591497.168846339 blockReceiveLatencyEMSD=100.5041964156714 blockReceivePeriodEMA=Just 1.013045412350069e-2 blockReceivePeriodEMSD=Just 2.321211677717518e-2
2022-02-11T16:02:43.157341598Z: INFO: Skov: Block d19d249c34a6b6e23847d2ff77d9e0d0fdf170dbb3e4ac0061a2ab17b6386e46 is finalized at height 119673 with finalization delta=0
2022-02-11T16:02:43.157456969Z: INFO: Skov: Finalization statistics: finalizationCount=103774 lastFinalizedTime=1.6445953631573486e9 finalizationPeriodEMA=Just 1.201220737931936e-2 finalizationPeriodEMSD=Just 2.664373806827682e-2
concordium-node: Assertion failed
CallStack (from HasCallStack):
  assert, called at src/Concordium/GlobalState/TransactionTable.hs:305:21 in concordium-consensus-0.1.0.0-5rhBla5LijTGt2oMhAD7ZX:Concordium.GlobalState.TransactionTable
concordium-node: warning: too many hs_exit()s

Steps to Reproduce

I do not have easy steps yet, but it happens somewhat reliably during catchup, but it is not deterministic (that is, the height at which it happens varies). I believe it has to do with transaction purging since it would otherwise likely be deterministic.

This is likely introduced by the transaction verification PR #217

Expected Result

No crash.

Actual Result

Crash.

Versions

abizjak commented 2 years ago

Two other occurrences

2022-02-11T13:23:39.481622109Z: INFO: Skov: Arrive statistics: blocksVerifiedCount=94851 blockLastArrive=1.6445858194813533e9 blockArriveLatencyEMA=4837497.103971293 blockArriveLatencyEMSD=109.54009262034161 blockArrivePeriodEMA=Just 2.9720792218036647e-2 blockArrivePeriodEMSD=Just 6.718982595110196e-2 transactionsPerBlockEMA=0.12332977849745787 transactionsPerBlockEMSD=0.3288153649591604
2022-02-11T13:23:39.482438560Z: INFO: Skov: Receive statistics: blocksReceivedCount=94985 blockLastReceived=1.6445858194816823e9 blockReceiveLatencyEMA=4837485.8607291235 blockReceiveLatencyEMSD=109.24832340616304 blockReceivePeriodEMA=Just 2.744974308435898e-2 blockReceivePeriodEMSD=Just 6.388732849180923e-2
2022-02-11T13:23:39.487118882Z: INFO: Skov: Block 941eba7c902f512daaa684004e96de17fbef369c9e72e401d68b434b9d792455 is finalized at height 94851 with finalization delta=0
2022-02-11T13:23:39.487159003Z: INFO: Skov: Finalization statistics: finalizationCount=82224 lastFinalizedTime=1.6445858194871275e9 finalizationPeriodEMA=Just 3.513197978179265e-2 finalizationPeriodEMSD=Just 7.136268506706585e-2
concordium-node: Assertion failed
CallStack (from HasCallStack):
  assert, called at src/Concordium/GlobalState/TransactionTable.hs:305:21 in concordium-consensus-0.1.0.0-5rhBla5LijTGt2oMhAD7ZX:Concordium.GlobalState.TransactionTable
concordium-node: warning: too many hs_exit()s
--
2022-02-11T13:53:53.376072072Z: INFO: Skov: Arrive statistics: blocksVerifiedCount=33324 blockLastArrive=1.6445876333760169e9 blockArriveLatencyEMA=5460754.11620067 blockArriveLatencyEMSD=98.28661609919322 blockArrivePeriodEMA=Just 1.1626419570503296e-2 blockArrivePeriodEMSD=Just 2.3386648157922012e-2 transactionsPerBlockEMA=0.14813959944470176 transactionsPerBlockEMSD=0.3552383122934027
2022-02-11T13:53:53.376467523Z: INFO: Skov: Receive statistics: blocksReceivedCount=33406 blockLastReceived=1.644587633376092e9 blockReceiveLatencyEMA=5460742.784659926 blockReceiveLatencyEMSD=99.23815914042703 blockReceivePeriodEMA=Just 1.1307945317480982e-2 blockReceivePeriodEMSD=Just 2.2903230517558916e-2
2022-02-11T13:53:53.378745994Z: INFO: Skov: Block 479d7e0c2902cbb178e6141d33719502e6890b9a319a8bb40d77c269fa56a28b is finalized at height 33324 with finalization delta=0
2022-02-11T13:53:53.378768704Z: INFO: Skov: Finalization statistics: finalizationCount=28938 lastFinalizedTime=1.6445876333787513e9 finalizationPeriodEMA=Just 1.2129446999212555e-2 finalizationPeriodEMSD=Just 2.5405040183061927e-2
concordium-node: Assertion failed
CallStack (from HasCallStack):
  assert, called at src/Concordium/GlobalState/TransactionTable.hs:305:21 in concordium-consensus-0.1.0.0-5rhBla5LijTGt2oMhAD7ZX:Concordium.GlobalState.TransactionTable
concordium-node: warning: too many hs_exit()s
abizjak commented 2 years ago

I believe this is caused by the following sequence of events.

This is caused by this line https://github.com/Concordium/concordium-node/pull/217/files#diff-60ddc76f3fb6bdf7404447bd41cc0f74277f81f1651f80a315379a90038d1958R558 together with the following implementation of addPendingTable.

addPendingTransaction :: TransactionData t => Nonce -> t -> PendingTransactionTable -> PendingTransactionTable
addPendingTransaction nextNonce tx PTT{..} = assert (nextNonce <= nonce) $ let v = HM.alter f sender _pttWithSender in PTT{_pttWithSender = v, ..}
  where
        f Nothing = Just (nextNonce, nonce)
        f (Just (l, u)) = Just (l, max u nonce)
        nonce = transactionNonce tx
        sender = accountAddressEmbed (transactionSender tx)

The low nonce is never updated to m, so forwardPTT (part of updating the focus block) gets confused since it expects a transaction with nonce m, but the low nonce is recorded as n.