The problem
The "2 phase submit" refactor of the public transaction manager DB transaction model in #179 introduced a problem with nonce allocation. Specifically, on restart, when the DB holds a public transaction that has not yet been submitted to the mempool of the blockchain, nonce allocation attempts to re-use a nonce already allocated in the DB of the node.
Digging into this showed that we had a significant issue to resolve in the database transaction model, between the Transaction Manager/Private Transaction Manager submitting the transaction, and the Public Transaction Manager attempting to allocate a nonce.
Inherited from previous generations, we were assigning the nonce in-line with the submission, but this is problematic when we have batch database transactions. At that moment we need to know what other nonces are in-flight for that signing (from) address, across both the database and the result of the eth_getTransactionCount call against the blockchain.
This is further complicated by the fact that we might be submitting batches of public transactions covering multiple signing addresses. Two racing threads could hit A-B, B-A deadlocks if we took fine-grained locks on individual signing addresses, while coarse-grained locks would be very problematic for transaction submission performance.
The solution
This PR splits nonce assignment out from transaction submission.
A new localId (internally referred to as pub_txn_id in the DB) is assigned to every PublicTx
This is a numeric sequence managed by the DB
If two threads are racing with active DB transactions, then a lower number might be committed before a higher number
However, within one thread/DB-TX the numbers are assured to be increasing
Nonce allocation is deferred to the orchestrator
There is only one of these per signing address, so it is ideally positioned to do the work
This thread polls from the DB in ascending pub_txn_id order
It might rewind in the case of the racing DB TXs described above, so nonces are assured to be allocated in ascending order to txs committed in the same DB transaction, but are not assured to be in strictly ascending order compared to pub_txn_id
The previous signer_nonce string column is removed as the primary join between public TX tables
We can no longer count on having a nonce at creation time of the DB record
Instead we use the pub_txn_id as the join key
This required some quite complex DB migrations
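The deferred allocation described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `publicTx` and `orchestrator` types, the `assignNonces` and `initialNonce` functions, and the rule for seeding the next nonce (the larger of the highest DB nonce plus one and the chain's transaction count) are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// publicTx is a hypothetical stand-in for a public transaction row.
type publicTx struct {
	PubTxnID uint64  // DB-managed numeric sequence
	From     string  // signing address
	Nonce    *uint64 // nil until the orchestrator assigns it
}

// orchestrator is a hypothetical per-signing-address allocator.
// There is exactly one per address, so no cross-address locking is needed.
type orchestrator struct {
	from      string
	nextNonce uint64
}

// initialNonce seeds the allocator (assumed rule): take whichever is larger,
// the highest nonce already in the DB plus one, or the chain's
// eth_getTransactionCount result for the address.
func initialNonce(dbHighest *uint64, chainCount uint64) uint64 {
	if dbHighest != nil && *dbHighest+1 > chainCount {
		return *dbHighest + 1
	}
	return chainCount
}

// assignNonces allocates nonces to the unassigned transactions returned by
// one poll, in ascending pub_txn_id order.
func (o *orchestrator) assignNonces(polled []*publicTx) {
	sort.Slice(polled, func(i, j int) bool {
		return polled[i].PubTxnID < polled[j].PubTxnID
	})
	for _, tx := range polled {
		if tx.From != o.from || tx.Nonce != nil {
			continue // not ours, or already allocated
		}
		n := o.nextNonce
		tx.Nonce = &n
		o.nextNonce++
	}
}

func main() {
	o := &orchestrator{from: "0xaaa", nextNonce: initialNonce(nil, 10)}

	// Two DB transactions race: one inserts ids 3 and 4 but commits late,
	// so the first poll only sees ids 1, 2 and 5.
	txs := map[uint64]*publicTx{}
	for id := uint64(1); id <= 5; id++ {
		txs[id] = &publicTx{PubTxnID: id, From: "0xaaa"}
	}
	o.assignNonces([]*publicTx{txs[1], txs[2], txs[5]})
	// The second poll "rewinds" and picks up the late-committed ids 3 and 4.
	o.assignNonces([]*publicTx{txs[3], txs[4]})

	for id := uint64(1); id <= 5; id++ {
		fmt.Printf("pub_txn_id=%d nonce=%d\n", id, *txs[id].Nonce)
	}
}
```

In this run ids 1, 2 and 5 receive nonces 10, 11 and 12, and the late-committed ids 3 and 4 receive 13 and 14: ascending within each DB transaction, but not strictly ascending relative to pub_txn_id.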
DB Migration notes
Due to the early phase of the release cycle, I propose the following compromises to limit the complexity of the DB migrations:
There are no down migrations to re-instate the old signer_nonce behavior, for either DB
For PostgreSQL data is retained on the up migration
For SQLite data is not retained on the up migration
Other fixes
Remove the 30s delay in unit tests introduced in #403, which was caused by ethclient having an internal retry for availability of the node. This retry was moved to componentmgr, made configurable, and now uses the default retry package.
Set --tx-pool-limit-by-account-percentage to 1.0 in the default k8s Besu setup to avoid:
[2024-11-19T18:30:13.832Z] ERROR RPC[000000061] <-- ERROR: Transaction nonce is too distant from current sender nonce
Stop creating the unused database tables transaction_delegations and transaction_delegation_acknowledgements
Remove syncpoint code associated with said DB tables
Introduce a deterministic election of a coordinator for any endorsement set: sort the endorsement parties deterministically, reduce them to the unique nodes, and use a simple numeric hash to pick the coordinator.
~Resolve this situation where delegations cross in the post, by adding a blockHeight to delegations, and when an incoming delegation-request comes in at a higher block height than an outgoing delegation-request, override the delegation and become the coordinator.~ This fix is incomplete, as it resulted in follow-on issues, which @hosie will work on over the coming sprint.
/tmp/t1.txt:[2024-11-19T19:01:40.941Z] DEBUG SelectCoordinatorNode: selected coordinator node node2 using round robin algorithm for blockHeight: 6497 and rangeSize 100 pid=1 role=pctm-loop-0x89e639ba9426c6b1f12a4a52cc5d92f56bffbf02
/tmp/t1.txt:[2024-11-19T19:01:40.944Z] INFO transactionFlow:Action TransactionID='f352b82e-464c-4dc4-b112-f1a6f6150fb8' Status='delegating' LatestEvent='TransactionSubmittedEvent' LatestError='' : Delegating transaction to node2 pid=1 role=pctm-loop-0x89e639ba9426c6b1f12a4a52cc5d92f56bffbf02
/tmp/t3.txt:[2024-11-19T19:01:43.063Z] DEBUG SelectCoordinatorNode: selected coordinator node node2 using round robin algorithm for blockHeight: 6499 and rangeSize 100 pid=1 role=pctm-loop-0x89e639ba9426c6b1f12a4a52cc5d92f56bffbf02
/tmp/t3.txt:[2024-11-19T19:01:43.068Z] INFO transactionFlow:Action TransactionID='f352b82e-464c-4dc4-b112-f1a6f6150fb8' Status='delegating' LatestEvent='TransactionSwappedInEvent' LatestError='' : Delegating transaction to node2 pid=1 role=pctm-loop-0x89e639ba9426c6b1f12a4a52cc5d92f56bffbf02
/tmp/t2.txt:[2024-11-19T19:01:43.055Z] DEBUG SelectCoordinatorNode: selected coordinator node node3 using round robin algorithm for blockHeight: 6500 and rangeSize 100 pid=1 role=pctm-loop-0x89e639ba9426c6b1f12a4a52cc5d92f56bffbf02
/tmp/t2.txt:[2024-11-19T19:01:43.073Z] INFO transactionFlow:Action TransactionID='f352b82e-464c-4dc4-b112-f1a6f6150fb8' Status='delegating' LatestEvent='TransactionAssembledEvent' LatestError='' : Delegating transaction to node3 pid=1 role=pctm-loop-0x89e639ba9426c6b1f12a4a52cc5d92f56bffbf02
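For contrast with the round-robin selection in the logs above, the deterministic election listed under "Other fixes" could look something like the sketch below. It is illustrative only: the `name@node` party format, the `nodeOf` and `selectCoordinator` helpers, and the choice of FNV-1a over the sorted party list as the "simple numeric hash" are assumptions, not the implementation in this PR.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
	"strings"
)

// nodeOf extracts the node from a party identifier, assuming a
// hypothetical "name@node" format.
func nodeOf(party string) string {
	if i := strings.LastIndex(party, "@"); i >= 0 {
		return party[i+1:]
	}
	return party
}

// selectCoordinator picks one node for an endorsement set. Every node that
// runs this on the same set of parties gets the same answer, whatever order
// the parties arrive in and whatever the current block height is.
func selectCoordinator(parties []string) (string, error) {
	if len(parties) == 0 {
		return "", fmt.Errorf("no endorsement parties")
	}
	// Deterministic sort of a copy, so the caller's slice is untouched.
	sorted := append([]string(nil), parties...)
	sort.Strings(sorted)

	// Reduce to the unique nodes, kept in sorted order.
	seen := map[string]bool{}
	nodes := []string{}
	for _, p := range sorted {
		n := nodeOf(p)
		if !seen[n] {
			seen[n] = true
			nodes = append(nodes, n)
		}
	}
	sort.Strings(nodes)

	// Simple numeric hash (FNV-1a here, as an assumption) over the sorted
	// parties, reduced modulo the number of unique nodes.
	h := fnv.New32a()
	h.Write([]byte(strings.Join(sorted, ",")))
	return nodes[h.Sum32()%uint32(len(nodes))], nil
}

func main() {
	a, _ := selectCoordinator([]string{"alice@node1", "bob@node2", "carol@node3"})
	b, _ := selectCoordinator([]string{"carol@node3", "alice@node1", "bob@node2"})
	fmt.Println(a, b, a == b)
}
```

Because the inputs are only the endorsement set itself, two nodes evaluating the same transaction cannot select different coordinators the way the block-height-based round robin did in the logs.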