Phala-Network / phala-blockchain

The Phala Network Blockchain, pRuntime and the bridge.
https://phala.network
Apache License 2.0
333 stars 148 forks source link

Khala v0.1.7 node got occasionally full network stale #551

Open h4x3rotab opened 3 years ago

h4x3rotab commented 3 years ago

This can happen around once per day. Usually the entire network got stuck. A restart (or sometimes two) can have them move forward.

Symptom

The entire network got stuck at a certain block (include full nodes and collators, expect one collator, usually). The nodes got panic when trying to import blocks. When it occurs, each node will fall into one of the following errors, repeat the same message, and never recover until a full restart.

Records:

Type 1: Transaction will be valid in the future

Nov 01 08:46:50 khala-collators-asia collator_1.sh[23964]: 2021-11-01 08:46:50 [Parachain] ⚙️  Syncing  0.0 bps, target=#659935 (49 peers), best: #659927 (0x8b0f…f74d), finalized #659927 (0x8b0f…f74d), ⬇ 75.8kiB/s ⬆ 48.5kiB/s
Nov 01 08:46:51 khala-collators-asia collator_1.sh[23964]: 2021-11-01 08:46:51 [Parachain] panicked at 'Transaction will be valid in the future', /home/h4x/.cargo/git/checkouts/substrate-7e08433d4c370a21/420db18/frame/executive/src/lib.rs:363:17
Nov 01 08:46:51 khala-collators-asia collator_1.sh[23964]: 2021-11-01 08:46:51 [Parachain] Block prepare storage changes error:
Nov 01 08:46:51 khala-collators-asia collator_1.sh[23964]: RuntimeApiError(Application(Execution(Other("Wasm execution trapped: wasm trap: unreachable\nwasm backtrace:\n    0: 0x6458 - <unknown>!rust_begin_unwind\n    1: 0x1e62 - <unknown>!core::panicking::panic_fmt::h189230e171f1a9ba\n    2: 0x1568bf - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block::hbc8a99a38c425195\n    3: 0x17e7ea - <unknown>!Core_execute_block\n"))))
Nov 01 08:46:51 khala-collators-asia collator_1.sh[23964]: 2021-11-01 08:46:51 [Parachain] 💔 Error importing block 0x683c5aeafe84a0e9716ffab02b07fefa2856e68d426e5f59af9713bdd927bc64: Err(Other(ClientImport("Error at calling runtime api: Execution failed: Other(\"Wasm execution trapped: wasm trap: unreachable\\nwasm backtrace:\\n    0: 0x6458 - <unknown>!rust_begin_unwind\\n    1: 0x1e62 - <unknown>!core::panicking::panic_fmt::h189230e171f1a9ba\\n    2: 0x1568bf - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block::hbc8a99a38c425195\\n    3: 0x17e7ea - <unknown>!Core_execute_block\\n\")")))
Nov 01 08:46:51 khala-collators-asia collator_1.sh[23964]: 2021-11-01 08:46:51 [Relaychain] 💤 Idle (48 peers), best: #9904741 (0x81f7…bcdf), finalized #9904737 (0x59fc…0d0f), ⬇ 304.1kiB/s ⬆ 306.8kiB/s
Nov 01 08:46:54 khala-collators-asia collator_1.sh[23964]: 2021-11-01 08:46:54 [Parachain] panicked at 'Transaction will be valid in the future', /home/h4x/.cargo/git/checkouts/substrate-7e08433d4c370a21/420db18/frame/executive/src/lib.rs:363:17
Nov 01 08:46:54 khala-collators-asia collator_1.sh[23964]: 2021-11-01 08:46:54 [Parachain] Block prepare storage changes error:
Nov 01 08:46:54 khala-collators-asia collator_1.sh[23964]: RuntimeApiError(Application(Execution(Other("Wasm execution trapped: wasm trap: unreachable\nwasm backtrace:\n    0: 0x6458 - <unknown>!rust_begin_unwind\n    1: 0x1e62 - <unknown>!core::panicking::panic_fmt::h189230e171f1a9ba\n    2: 0x1568bf - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block::hbc8a99a38c425195\n    3: 0x17e7ea - <unknown>!Core_execute_block\n"))))
Nov 01 08:46:54 khala-collators-asia collator_1.sh[23964]: 2021-11-01 08:46:54 [Parachain] 💔 Error importing block 0x683c5aeafe84a0e9716ffab02b07fefa2856e68d426e5f59af9713bdd927bc64: Err(Other(ClientImport("Error at calling runtime api: Execution failed: Other(\"Wasm execution trapped: wasm trap: unreachable\\nwasm backtrace:\\n    0: 0x6458 - <unknown>!rust_begin_unwind\\n    1: 0x1e62 - <unknown>!core::panicking::panic_fmt::h189230e171f1a9ba\\n    2: 0x1568bf - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block::hbc8a99a38c425195\\n    3: 0x17e7ea - <unknown>!Core_execute_block\\n\")")))
Nov 01 08:46:54 khala-collators-asia collator_1.sh[23964]: 2021-11-01 08:46:54 [Relaychain] ✨ Imported #9904742 (0x0058…f3f6)
Nov 01 08:46:55 khala-collators-asia collator_1.sh[23964]: 2021-11-01 08:46:55 [Parachain] ⚙️  Syncing  0.0 bps, target=#659935 (49 peers), best: #659927 (0x8b0f…f74d), finalized #659927 (0x8b0f…f74d), ⬇ 45.5kiB/s ⬆ 3.7kiB/s
Nov 01 08:46:56 khala-collators-asia collator_1.sh[23964]: 2021-11-01 08:46:56 [Parachain] panicked at 'Transaction will be valid in the future', /home/h4x/.cargo/git/checkouts/substrate-7e08433d4c370a21/420db18/frame/executive/src/lib.rs:363:17
Nov 01 08:46:56 khala-collators-asia collator_1.sh[23964]: 2021-11-01 08:46:56 [Parachain] Block prepare storage changes error:
Nov 01 08:46:56 khala-collators-asia collator_1.sh[23964]: RuntimeApiError(Application(Execution(Other("Wasm execution trapped: wasm trap: unreachable\nwasm backtrace:\n    0: 0x6458 - <unknown>!rust_begin_unwind\n    1: 0x1e62 - <unknown>!core::panicking::panic_fmt::h189230e171f1a9ba\n    2: 0x1568bf - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block::hbc8a99a38c425195\n    3: 0x17e7ea - <unknown>!Core_execute_block\n"))))
Nov 01 08:46:56 khala-collators-asia collator_1.sh[23964]: 2021-11-01 08:46:56 [Parachain] 💔 Error importing block 0x683c5aeafe84a0e9716ffab02b07fefa2856e68d426e5f59af9713bdd927bc64: Err(Other(ClientImport("Error at calling runtime api: Execution failed: Other(\"Wasm execution trapped: wasm trap: unreachable\\nwasm backtrace:\\n    0: 0x6458 - <unknown>!rust_begin_unwind\\n    1: 0x1e62 - <unknown>!core::panicking::panic_fmt::h189230e171f1a9ba\\n    2: 0x1568bf - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block::hbc8a99a38c425195\\n    3: 0x17e7ea - <unknown>!Core_execute_block\\n\")")))

Type 2: Storage root must match that calculated

2021-11-02 11:50:45 [Parachain] panicked at 'Storage root must match that calculated.', /home/h4x/.cargo/git/checkouts/substrate-7e08433d4c370a21/420db18/frame/executive/src/lib.rs:473:9
2021-11-02 11:50:45 [Parachain] Block prepare storage changes error:
RuntimeApiError(Application(Execution(Other("Wasm execution trapped: wasm trap: unreachable\nwasm backtrace:\n    0: 0x6458 - <unknown>!rust_begin_unwind\n    1: 0x1e62 - <unknown>!core::panicking::panic_fmt::h189230e171f1a9ba\n    2: 0x153b - <unknown>!core::panicking::panic::h4aa999cda2b58621\n    3: 0x156951 - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block::hbc8a99a38c425195\n    4: 0x17e7ea - <unknown>!Core_execute_block\n"))))
2021-11-02 11:50:45 [Parachain] 💔 Error importing block 0x801c1798a33bd69a94fef636594bf16dc8c208cb893c3e3f37161fbf56635202: Err(Other(ClientImport("Error at calling runtime api: Execution failed: Other(\"Wasm execution trapped: wasm trap: unreachable\\nwasm backtrace:\\n    0: 0x6458 - <unknown>!rust_begin_unwind\\n    1: 0x1e62 - <unknown>!core::panicking::panic_fmt::h189230e171f1a9ba\\n    2: 0x153b - <unknown>!core::panicking::panic::h4aa999cda2b58621\\n    3: 0x156951 - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block::hbc8a99a38c425195\\n    4: 0x17e7ea - <unknown>!Core_execute_block\\n\")")))

Minor issue: NotInFinalizedChain

This one can happen after a restart of the node. It can also self recover after a few minutes.

2021-11-02 15:15:39 [Parachain] Block import error: NotInFinalizedChain
2021-11-02 15:15:39 [Parachain] 💔 Error importing block 0x0a1f6222b3a7235dd833038bf98151c01b8f8d99a7a332da3d0695f7646aab2d: Err(Other(ClientImport("Potential long-range attack: block not in finalized chain.")))

Other background information

We have encountered similar problems in the past when we were running our parachain testnet v1. It's suspicious that we were also running a parachain runtime based on a version of substrate (polkadot) older than the client's. After upgraded the runtime to match the substrate version of the client, we had never seen the problem again, until now.

coder-chenda commented 3 years ago

how to solve it?

h4x3rotab commented 3 years ago

@coder-chenda No solution yet. Now it's mitigated by restarting every time it got stuck. You can also try to add --state-cache-size 0 to the node cli. It's reported to be helpful but we cannot confirm.

yrong commented 3 years ago

same issue met, @h4x3rotab any progress?

2021-11-12 13:12:36 [Parachain🌈] 💤 Idle (17 peers), best: #737456 (0xccbf…34a7), finalized #737454 (0x5da9…457c), ⬇ 1.7kiB/s ⬆ 1.8kiB/s
2021-11-12 13:12:37 [Parachain🌈] panicked at 'Storage root must match that calculated.', /cargo-home/git/checkouts/substrate-7e08433d4c370a21/d76f399/frame/executive/src/lib.rs:503:9
2021-11-12 13:12:37 [Parachain🌈] Block prepare storage changes error:
RuntimeApiError(Application(Execution(Other("Wasm execution trapped: wasm trap: unreachable\nwasm backtrace:\n    0: 0x374566 - <unknown>!rust_begin_unwind\n    1: 0x2637 - <unknown>!core::panicking::panic_fmt::h3ab5417155b7ba3b\n    2: 0x1ccb - <unknown>!core::panicking::panic::h5bfdfaa3db9a4b4a\n    3: 0x24fadc - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block::h11a86dd3b827ed26\n    4: 0x26a629 - <unknown>!Core_execute_block\n"))))
2021-11-12 13:12:37 [Parachain🌈] 💔 Error importing block 0x230d09903695c483424b273c39f02a7bddff7ef419c1324448aca38b69d87d3a: Err(Other(ClientImport("Error at calling runtime api: Execution failed: Other(\"Wasm execution trapped: wasm trap: unreachable\\nwasm backtrace:\\n    0: 0x374566 - <unknown>!rust_begin_unwind\\n    1: 0x2637 - <unknown>!core::panicking::panic_fmt::h3ab5417155b7ba3b\\n    2: 0x1ccb - <unknown>!core::panicking::panic::h5bfdfaa3db9a4b4a\\n    3: 0x24fadc - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block::h11a86dd3b827ed26\\n    4: 0x26a629 - <unknown>!Core_execute_block\\n\")")))
2021-11-12 13:12:37 [Parachain🌈] panicked at 'Storage root must match that calculated.', /cargo-home/git/checkouts/substrate-7e08433d4c370a21/d76f399/frame/executive/src/lib.rs:503:9
2021-11-12 13:12:37 [Parachain🌈] Block prepare storage changes error:
RuntimeApiError(Application(Execution(Other("Wasm execution trapped: wasm trap: unreachable\nwasm backtrace:\n    0: 0x374566 - <unknown>!rust_begin_unwind\n    1: 0x2637 - <unknown>!core::panicking::panic_fmt::h3ab5417155b7ba3b\n    2: 0x1ccb - <unknown>!core::panicking::panic::h5bfdfaa3db9a4b4a\n    3: 0x24fadc - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block::h11a86dd3b827ed26\n    4: 0x26a629 - <unknown>!Core_execute_block\n"))))
2021-11-12 13:12:37 [Parachain🌈] 💔 Error importing block 0x230d09903695c483424b273c39f02a7bddff7ef419c1324448aca38b69d87d3a: Err(Other(ClientImport("Error at calling runtime api: Execution failed: Other(\"Wasm execution trapped: wasm trap: unreachable\\nwasm backtrace:\\n    0: 0x374566 - <unknown>!rust_begin_unwind\\n    1: 0x2637 - <unknown>!core::panicking::panic_fmt::h3ab5417155b7ba3b\\n    2: 0x1ccb - <unknown>!core::panicking::panic::h5bfdfaa3db9a4b4a\\n    3: 0x24fadc - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block::h11a86dd3b827ed26\\n    4: 0x26a629 - <unknown>!Core_execute_block\\n\")")))
2021-11-12 13:12:37 [Relaychain] 💤 Idle (8 peers), best: #10062243 (0x8298…e7e2), finalized #10062239 (0x7920…0f5e), ⬇ 270.1kiB/s ⬆ 114.8kiB/s
2021-11-12 13:12:38 [Parachain🌈] panicked at 'Storage root must match that calculated.', /cargo-home/git/checkouts/substrate-7e08433d4c370a21/d76f399/frame/executive/src/lib.rs:503:9
2021-11-12 13:12:38 [Parachain🌈] Block prepare storage changes error:
RuntimeApiError(Application(Execution(Other("Wasm execution trapped: wasm trap: unreachable\nwasm backtrace:\n    0: 0x374566 - <unknown>!rust_begin_unwind\n    1: 0x2637 - <unknown>!core::panicking::panic_fmt::h3ab5417155b7ba3b\n    2: 0x1ccb - <unknown>!core::panicking::panic::h5bfdfaa3db9a4b4a\n    3: 0x24fadc - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block::h11a86dd3b827ed26\n    4: 0x26a629 - <unknown>!Core_execute_block\n"))))
2021-11-12 13:12:38 [Parachain🌈] 💔 Error importing block 0x230d09903695c483424b273c39f02a7bddff7ef419c1324448aca38b69d87d3a: Err(Other(ClientImport("Error at calling runtime api: Execution failed: Other(\"Wasm execution trapped: wasm trap: unreachable\\nwasm backtrace:\\n    0: 0x374566 - <unknown>!rust_begin_unwind\\n    1: 0x2637 - <unknown>!core::panicking::panic_fmt::h3ab5417155b7ba3b\\n    2: 0x1ccb - <unknown>!core::panicking::panic::h5bfdfaa3db9a4b4a\\n    3: 0x24fadc - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block::h11a86dd3b827ed26\\n    4: 0x26a629 - <unknown>!Core_execute_block\\n\")")))
2021-11-12 13:12:38 [Parachain🌈] panicked at 'Storage root must match that calculated.', /cargo-home/git/checkouts/substrate-7e08433d4c370a21/d76f399/frame/executive/src/lib.rs:503:9
2021-11-12 13:12:38 [Parachain🌈] Block prepare storage changes error:
RuntimeApiError(Application(Execution(Other("Wasm execution trapped: wasm trap: unreachable\nwasm backtrace:\n    0: 0x374566 - <unknown>!rust_begin_unwind\n    1: 0x2637 - <unknown>!core::panicking::panic_fmt::h3ab5417155b7ba3b\n    2: 0x1ccb - <unknown>!core::panicking::panic::h5bfdfaa3db9a4b4a\n    3: 0x24fadc - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block::h11a86dd3b827ed26\n    4: 0x26a629 - <unknown>!Core_execute_block\n"))))
2021-11-12 13:12:38 [Parachain🌈] 💔 Error importing block 0x230d09903695c483424b273c39f02a7bddff7ef419c1324448aca38b69d87d3a: Err(Other(ClientImport("Error at calling runtime api: Execution failed: Other(\"Wasm execution trapped: wasm trap: unreachable\\nwasm backtrace:\\n    0: 0x374566 - <unknown>!rust_begin_unwind\\n    1: 0x2637 - <unknown>!core::panicking::panic_fmt::h3ab5417155b7ba3b\\n    2: 0x1ccb - <unknown>!core::panicking::panic::h5bfdfaa3db9a4b4a\\n    3: 0x24fadc - <unknown>!frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block::h11a86dd3b827ed26\\n    4: 0x26a629 - <unknown>!Core_execute_block\\n\")")))
2021-11-12 13:12:41 [Relaychain] 🔍 Discovered new external address for our node: /ip4/100.114.237.0/tcp/30334/ws/p2p/12D3KooWCoh5WZGgPxo3tRt4sERrAA3CaDd8gCmx8BCvruaLJkdS
h4x3rotab commented 3 years ago

@yrong No. According to the tracking issue https://github.com/paritytech/cumulus/issues/607, there's nothing on our end can be done. If you have the control over all the collators, you can add --state-cache-size 0 to disable the state cache, and it seems the problem can be mitigated a bit, but it's not 100% guaranteed.

yrong commented 3 years ago

@h4x3rotab thks will try and let you know

h4x3rotab commented 2 years ago

Update: the current walkaround is to add --state-cache-size 0 to half of the collators, and apply auto restart script to all the collators. The network has been running for nearly one month without full network stale.

We are still waiting for https://github.com/paritytech/cumulus/issues/607 as the final fix.