Closed stakeworks closed 4 months ago
We've noticed similar behaviour on our collators since updating to v5.38.0:
2024-05-11 13:57:18 [Parachain] 🙌 Starting consensus session on top of parent 0x822fac0ac72e21c0912b5b66a00893434d5061c4f7f5b99c6b74b6878d322a1a
2024-05-11 13:57:18 [Parachain] panicked at /home/builder/cargo/git/checkouts/polkadot-sdk-cff69157b985ed76/c8d2251/substrate/frame/aura/src/lib.rs:139:21:
Slot must increase
2024-05-11 13:57:18 [Parachain] Proposing failed: Import failed: Error at calling runtime api: Execution failed: Execution aborted due to trap: wasm trap: wasm `unreachable` instruction executed
WASM backtrace:
error while executing at wasm backtrace:
0: 0x525d73 - <unknown>!rust_begin_unwind
1: 0x448646 - <unknown>!core::panicking::panic_fmt::h7d22643b0becf577
2: 0x274fee - <unknown>!<pallet_aura::pallet::Pallet<T> as frame_support::traits::hooks::OnInitialize<<<<T as frame_system::pallet::Config>::Block as sp_runtime::traits::HeaderProvider>::HeaderT as sp_runtime::traits::Header>::Number>>::on_initialize::h707b0ad9b933902b
3: 0x2d2bdc - <unknown>!<(TupleElement0,TupleElement1,TupleElement2,TupleElement3,TupleElement4,TupleElement5,TupleElement6,TupleElement7,TupleElement8,TupleElement9,TupleElement10,TupleElement11,TupleElement12,TupleElement13,TupleElement14,TupleElement15,TupleElement16,TupleElement17,TupleElement18,TupleElement19,TupleElement20,TupleElement21,TupleElement22,TupleElement23,TupleElement24,TupleElement25,TupleElement26,TupleElement27,TupleElement28,TupleElement29,TupleElement30,TupleElement31) as frame_support::traits::hooks::OnInitialize<BlockNumber>>::on_initialize::h3a0ccdde4267b583
4: 0x262c47 - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPalletsWithSystem,COnRuntimeUpgrade>::initialize_block::h71f884d6dc98ee75
5: 0x1a2ca4 - <unknown>!Core_initialize_block
2024-05-11 13:57:22 [Relaychain] 💤 Idle (9 peers), best: #20726271 (0x90de…f0ba), finalized #20726269 (0x7e64…69a5), ⬇ 69.3kiB/s ⬆ 202.4kiB/s
@DanielMagro97 Can you please open a new issue and share your logs and start command?
@stakeworks Thanks for opening the issue. Here are my findings and the outcome. I will leave the issue open in case you have any questions, and close it in a few days.
What Happened?
In the logs shared in the issue, we observe the following block details:
Astar:
Block #6161187
Hash: 0x964bf1249fb797a476ef0baadb2a785c11ae31d6267986fc072dacf698f31739
Block #6161188
Hash: 0x66745dd79eecf69d58e3183266225e8ce422289e90bb1807df7581ac56e1067d
parentHash(): 0x964bf1249fb797a476ef0baadb2a785c11ae31d6267986fc072dacf698f31739
Block #6161189
Hash: 0x2b388fb198aa42b90da0182533da3cab38d60c4f85c1b8391249925e2b031f6b
parentHash(): 0x66745dd79eecf69d58e3183266225e8ce422289e90bb1807df7581ac56e1067d
Relay:
Block #20755840
Hash: 0x0dc0711f928d5e8da58122ec45a692ea2077b4325186fada45a4c29ff049bebd
para_head: 0x964bf1249fb797a476ef0baadb2a785c11ae31d6267986fc072dacf698f31739
Block #20755841
Hash: 0x96d7691546fcf495d796ac671828cc5ca0914987f82ddda627f31efe595ef9da
No para_head included (still 0x964bf1249fb797a476ef0baadb2a785c11ae31d6267986fc072dacf698f31739)
Block #20755842
Hash: 0x7b1bb0dd4d42aa05a6c62b4436ae67e7dd8b8ad3cfe33a83077faf3d116d143d
para_head: 0x66745dd79eecf69d58e3183266225e8ce422289e90bb1807df7581ac56e1067d
At the top of the screenshot, we see:
starting collation. relayparent: 0x96d7691546fcf495d796ac671828cc5ca0914987f82ddda627f31efe595ef9da
at: 0x66745dd79eecf69d58e3183266225e8ce422289e90bb1807df7581ac56e1067d
This indicates that we are attempting to propose Astar block #6161189 after importing Relay block #20755841. However, the para_head in that imported relay block is still 0x964bf1249fb797a476ef0baadb2a785c11ae31d6267986fc072dacf698f31739, while the parentHash of Astar block #6161189 is 0x66745dd79eecf69d58e3183266225e8ce422289e90bb1807df7581ac56e1067d, which causes the panic. Subsequently, the logs show repeated failures to propose the block until Relay block #20755842 is imported with the matching para_head and parentHash, allowing Astar block #6161189 to be proposed and pre-sealed.
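The failing condition boils down to an equality check between the para_head included in the relay block and the parentHash of the parachain candidate. The following is a minimal sketch of that check, using illustrative types and truncated hashes, not the actual Cumulus API:

```rust
// Sketch of the parent-inclusion check that gates block proposal.
// Types, names, and hash values are illustrative only.

#[derive(Clone, Copy, PartialEq, Debug)]
struct Hash(u64); // stand-in for a 32-byte block hash

struct RelayBlock {
    number: u64,
    para_head: Hash, // last parachain head included in this relay block
}

struct ParaCandidate {
    number: u64,
    parent_hash: Hash,
}

/// Proposal is only valid once the relay block already includes the
/// candidate's parent as para_head.
fn can_propose(relay: &RelayBlock, candidate: &ParaCandidate) -> bool {
    relay.para_head == candidate.parent_hash
}

fn main() {
    let head_6161187 = Hash(0x964b); // grandparent head
    let head_6161188 = Hash(0x6674); // parent head

    // Relay #20755841 still carries the stale para_head -> proposal fails.
    let relay_41 = RelayBlock { number: 20755841, para_head: head_6161187 };
    // Relay #20755842 includes parachain #6161188 -> proposal succeeds.
    let relay_42 = RelayBlock { number: 20755842, para_head: head_6161188 };

    let candidate = ParaCandidate { number: 6161189, parent_hash: head_6161188 };

    println!(
        "propose #{} on relay #{}: {}",
        candidate.number, relay_41.number, can_propose(&relay_41, &candidate)
    ); // false
    println!(
        "propose #{} on relay #{}: {}",
        candidate.number, relay_42.number, can_propose(&relay_42, &candidate)
    ); // true
}
```

This mirrors the timeline above: the same candidate is rejected against relay #20755841 and accepted once #20755842 is imported.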
Why Did This Happen?
A new consensus hook, ExpectParentIncluded, was introduced to facilitate the migration to async backing.
Mechanism:
The hook expects that the para_head of the relay chain block being built on contains the parentHash of the parachain block being proposed. However, it is possible for the para_head not to be updated in some relay chain blocks (as in #20755841), so the parachain block can only be proposed once the next relay chain block (#20755842) is imported.
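For reference, the panic itself surfaces as the slot-monotonicity assertion in pallet_aura's on_initialize (the lib.rs:139 frame in the backtrace). Below is a heavily simplified sketch of that check, not the actual pallet code: in the real pallet the new slot comes from the block's pre-runtime digest and the old one from the CurrentSlot storage item.

```rust
// Heavily simplified sketch of the check behind the "Slot must increase"
// panic in pallet_aura::on_initialize. Real slots are decoded from the
// block's pre-runtime digest; here we pass them in directly.

struct Aura {
    current_slot: u64, // stand-in for the CurrentSlot storage item
}

impl Aura {
    fn on_initialize(&mut self, new_slot: u64) {
        // The pallet asserts that the slot strictly increases between blocks;
        // re-initializing with a non-increasing slot trips this assert.
        assert!(self.current_slot < new_slot, "Slot must increase");
        self.current_slot = new_slot;
    }
}

fn main() {
    let mut aura = Aura { current_slot: 284_000_000 };
    aura.on_initialize(284_000_001); // ok: slot moved forward
    println!("current slot = {}", aura.current_slot);
    // aura.on_initialize(284_000_001); // would panic: "Slot must increase"
}
```

Once the mismatch is resolved by the next relay block, initialization proceeds with a strictly larger slot and the proposal goes through.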
A comment on the related Polkadot SDK PR explains this situation and even suggests removing the code that triggers the panic.
Outcome
Despite the panic, it doesn't seem critical, as the block is proposed on the next relay chain block. The panic effectively makes the collator wait for the next relay block, ensuring the included para_head matches the parent (n-1) parachain block. As this behaviour will be changed in upcoming releases, it can be ignored for now.
Criticality: Low/Informative
Description
Steps to Reproduce
Upgrade to v5.38.0 and restart the node.
Environment
astar-collator 5.38.0-4cbf1a2f085
Ubuntu 22.04.4 LTS
ExecStart=astar-collator \
  --collator \
  --base-path /path/ \
  --name 'StakeWorks | Astar | HNM04' \
  --chain astar \
  --telemetry-url 'wss://telemetry.polkadot.io/submit/ 0' \
  --trie-cache-size 268435456 \
  --max-parallel-downloads 6 \
  --no-private-ipv4 \
  --in-peers 35 \
  --out-peers 25 \
  --port 30313 \
  --rpc-port 9915 \
  --prometheus-port 19615 \
  --prometheus-external \
  -- \
  --base-path /path/ \
  --no-private-ipv4 \
  --port 30316 \
  --rpc-port 9918 \
  --in-peers 35 \
  --out-peers 25 \
  --sync warp \
  --state-pruning 100 \
  --blocks-pruning 100 \
  --no-prometheus