3Dpass / 3DP

The Implementation of The Ledger of Things Node. Layer 1 decentralized blockchain platform for the tokenization of objects. Proof of Scan protocol. Useful smart-contracts and dApps.
https://3dpass.org/
GNU General Public License v3.0
23 stars 17 forks source link

Validator node crash. #93

Closed xchainw closed 4 months ago

xchainw commented 4 months ago

Since upgrading to v24/v25, the validator nodes has encountered several crashes within a few days, and the relevant logs were found as follows. Before the update the validator nodes were running normally for months without crashing once. What is the cause of this problem?

2024-07-06 02:19:00 ?? Starting consensus session on top of parent 0x5424875a2afeedfbdf83bcb259e196c31cd16b753859c6b87e5a6af084031c1b
====================
Version: 3.1.0-5ea548d-x86_64-linux-gnu
   0: sp_panic_handler::set::{{closure}}
   1: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/alloc/src/boxed.rs:1999:9
      std::panicking::rust_panic_with_hook
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/std/src/panicking.rs:695:13
   2: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/std/src/panicking.rs:582:13
   3: std::sys_common::backtrace::__rust_end_short_backtrace
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/std/src/sys_common/backtrace.rs:151:18
   4: rust_begin_unwind
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/std/src/panicking.rs:578:5
   5: core::panicking::panic_fmt
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/core/src/panicking.rs:67:14
   6: core::result::unwrap_failed
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/core/src/result.rs:1651:5
   7: core::result::Result<T,E>::unwrap
   8: <sc_consensus_poscan::PowBlockImport<B,I,C,S,Algorithm,CAW,CIDP,AccountId,BlockNumber> as sc_consensus::block_import::BlockImport<B>>::import_block::{{closure}}
   9: <alloc::boxed::Box<dyn sc_consensus::block_import::BlockImport<B>+Error = sp_consensus::error::Error+Transaction = Transaction+core::marker::Sync+core::marker::Send> as sc_consensus::block_import::BlockImport<B>>::import_block::{{closure}}
  10: <&mut F as core::future::future::Future>::poll
  11: sc_consensus::import_queue::basic_queue::BlockImportWorker<B>::new::{{closure}}
  12: <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll
  13: <futures_util::future::select::Select<A,B> as core::future::future::Future>::poll
  14: <tracing_futures::Instrumented<T> as core::future::future::Future>::poll
  15: tokio::park::thread::CachedParkThread::block_on
  16: tokio::runtime::handle::Handle::block_on
  17: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
  18: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
  19: tokio::runtime::task::harness::Harness<T,S>::poll
  20: tokio::runtime::blocking::pool::Inner::run
  21: std::sys_common::backtrace::__rust_begin_short_backtrace
  22: core::ops::function::FnOnce::call_once{{vtable.shim}}
  23: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/alloc/src/boxed.rs:1985:9
      <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/alloc/src/boxed.rs:1985:9
      std::sys::unix::thread::Thread::new::thread_start
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/std/src/sys/unix/thread.rs:108:17
  24: <unknown>
  25: <unknown>
Thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: Application(UnknownBlock("State already discarded for BlockId::Hash(0x220d1d66679abdf1fda75ef093c631f79f700a85c03dac4e87f461f28001106c)"))', /home/runner/work/3DP/3DP/consensus/poscan/src/lib.rs:565
This is a bug. Please report it at:
    https://github.com/3Dpass/3DP/issues
skyours commented 4 months ago

I'm having the same problem.

xchainw commented 4 months ago

Another crash occurred about an hour ago, checking telemetry, a number of nodes in the validator set all crashed at the same time. Checking the logs it seems to be caused by an error when importing block #951440.

2024-07-06 17:12:20 💤 Idle (40 peers), best: #951438 (0x312e…050b), finalized #951435 (0xb16e…0385), ⬇ 284.8kiB/s ⬆ 805.7kiB/s
2024-07-06 17:12:25 💤 Idle (40 peers), best: #951438 (0x312e…050b), finalized #951436 (0x169f…8ad1), ⬇ 666.8kiB/s ⬆ 560.3kiB/s
2024-07-06 17:12:30 poscan: start validate
2024-07-06 17:12:30 >>> object is valid (runtime)
2024-07-06 17:12:30 >>> poscan: end validate
2024-07-06 17:12:30 >>> object is valid
2024-07-06 17:12:31 💤 Idle (40 peers), best: #951438 (0x312e…050b), finalized #951436 (0x169f…8ad1), ⬇ 535.7kiB/s ⬆ 624.4kiB/s
2024-07-06 17:12:32 ✨ Imported #951439 (0x0b68…7015)
2024-07-06 17:12:32 🙌 Starting consensus session on top of parent 0x0b688bc64f962f11caa031db782b847781dd0a338a60546d086f4f96d38a7015
2024-07-06 17:12:33 🎁 Prepared block for proposing at 951440 (518 ms) [hash: 0x8e877934a159151ce5d9822cfa9dd8442fd1ca18251c7ea663ba79f598d964b6; parent_hash: 0x0b68…7015; extrinsics (8): [0xe029…84fe, 0xbcd8…d2f0, 0x17e7…9117, 0x8734…7dd0, 0x7b1e…3160, 0xfac8…8358, 0xbf77…1bb6, 0xc14f…71a7]]
2024-07-06 17:12:35 poscan: start validate
2024-07-06 17:12:35 >>> object is valid (runtime)
2024-07-06 17:12:35 >>> poscan: end validate
2024-07-06 17:12:35 >>> object is valid
2024-07-06 17:12:36 💤 Idle (40 peers), best: #951439 (0x0b68…7015), finalized #951436 (0x169f…8ad1), ⬇ 471.6kiB/s ⬆ 477.6kiB/s
2024-07-06 17:12:37 >>> verify: seal.hist_hash != hist_hash
2024-07-06 17:12:37 >>> InvalidSeal after self.algorithm.verify
2024-07-06 17:12:37 💔 Error importing block 0x09309ed2ca24d7872337ebdc22fec4f68e0e8dc00f51809aa8b10dee98145655: consensus error: Import failed: PoScan validation error: invalid seal
====================
Version: 3.1.0-5ea548d-x86_64-linux-gnu
   0: sp_panic_handler::set::{{closure}}
   1: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/alloc/src/boxed.rs:1999:9
      std::panicking::rust_panic_with_hook
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/std/src/panicking.rs:695:13
   2: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/std/src/panicking.rs:582:13
   3: std::sys_common::backtrace::__rust_end_short_backtrace
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/std/src/sys_common/backtrace.rs:151:18
   4: rust_begin_unwind
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/std/src/panicking.rs:578:5
   5: core::panicking::panic_fmt
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/core/src/panicking.rs:67:14
   6: core::result::unwrap_failed
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/core/src/result.rs:1651:5
   7: core::result::Result<T,E>::unwrap
   8: <sc_consensus_poscan::PowBlockImport<B,I,C,S,Algorithm,CAW,CIDP,AccountId,BlockNumber> as sc_consensus::block_import::BlockImport<B>>::import_block::{{closure}}
   9: <alloc::boxed::Box<dyn sc_consensus::block_import::BlockImport<B>+Error = sp_consensus::error::Error+Transaction = Transaction+core::marker::Sync+core::marker::Send> as sc_consensus::block_import::BlockImport<B>>::import_block::{{closure}}
  10: <&mut F as core::future::future::Future>::poll
  11: sc_consensus::import_queue::basic_queue::BlockImportWorker<B>::new::{{closure}}
  12: <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll
  13: <futures_util::future::select::Select<A,B> as core::future::future::Future>::poll
  14: <tracing_futures::Instrumented<T> as core::future::future::Future>::poll
  15: tokio::park::thread::CachedParkThread::block_on
  16: tokio::runtime::handle::Handle::block_on
  17: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
  18: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
  19: tokio::runtime::task::harness::Harness<T,S>::poll
  20: tokio::runtime::blocking::pool::Inner::run
  21: std::sys_common::backtrace::__rust_begin_short_backtrace
  22: core::ops::function::FnOnce::call_once{{vtable.shim}}
  23: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/alloc/src/boxed.rs:1985:9
      <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/alloc/src/boxed.rs:1985:9
      std::sys::unix::thread::Thread::new::thread_start
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/std/src/sys/unix/thread.rs:108:17
  24: <unknown>
  25: <unknown>
Thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: Application(UnknownBlock("State already discarded for BlockId::Hash(0xead911c9af64012557626d8cc228d4a8e1c0f041f9b699edfab5a08b73ee0035)"))', /home/runner/work/3DP/3DP/consensus/poscan/src/lib.rs:565
This is a bug. Please report it at:
    https://github.com/3Dpass/3DP/issues
mishk commented 4 months ago

Will be fixed in 1-2 days

PaulSff commented 4 months ago

Here is the fix: https://github.com/3Dpass/3DP/commit/70205fc636cf029dc9b042886cd979441ffc7861

xchainw commented 3 months ago
2024-07-13 06:03:00 🎁 Prepared block for proposing at 960728 (426 ms) [hash: 0x27a81f71517c81177081c7198a7a8241cb0d113ab9ba44dd08bdf65c111ceef2; parent_hash: 0x37e9…65d9; extrinsics (2): [0xf6f7…0a75, 0xd7a2…3985]]
2024-07-13 06:03:01 💤 Idle (12 peers), best: #960727 (0x37e9…65d9), finalized #960725 (0x1d5d…cb0f), ⬇ 205.4kiB/s ⬆ 403.4kiB/s
2024-07-13 06:03:05 poscan: start validate
2024-07-13 06:03:05 >>> object is valid (runtime)
2024-07-13 06:03:05 >>> poscan: end validate
2024-07-13 06:03:05 >>> object is valid
2024-07-13 06:03:06 💤 Idle (12 peers), best: #960727 (0x37e9…65d9), finalized #960725 (0x1d5d…cb0f), ⬇ 290.2kiB/s ⬆ 444.2kiB/s
2024-07-13 06:03:10 ♻️  Reorg on #960727,0x37e9…65d9 to #960728,0x0a7e…9c56, common ancestor #960725,0x1d5d…cb0f
2024-07-13 06:03:10 ✨ Imported #960728 (0x0a7e…9c56)
2024-07-13 06:03:10 🙌 Starting consensus session on top of parent 0x0a7ecf7a550e1fc85521204bb408090bf3e6fa91bddcb42e6868798d2c369c56
2024-07-13 06:03:10 🎁 Prepared block for proposing at 960729 (346 ms) [hash: 0x3b54b5e7f904039f310bf935b98766c9c82b7bd4535bf429944baf0f7597ae31; parent_hash: 0x0a7e…9c56; extrinsics (1): [0xd74e…6de7]]
2024-07-13 06:03:11 💤 Idle (12 peers), best: #960728 (0x0a7e…9c56), finalized #960726 (0xad8e…569f), ⬇ 102.6kiB/s ⬆ 312.0kiB/s
====================
Version: 3.1.0-70205fc-x86_64-linux-gnu
   0: sp_panic_handler::set::{{closure}}
   1: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/alloc/src/boxed.rs:1999:9
      std::panicking::rust_panic_with_hook
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/std/src/panicking.rs:695:13
   2: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/std/src/panicking.rs:582:13
   3: std::sys_common::backtrace::__rust_end_short_backtrace
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/std/src/sys_common/backtrace.rs:151:18
   4: rust_begin_unwind
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/std/src/panicking.rs:578:5
   5: core::panicking::panic_fmt
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/core/src/panicking.rs:67:14
   6: core::result::unwrap_failed
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/core/src/result.rs:1651:5
   7: std::sys_common::backtrace::__rust_begin_short_backtrace
   8: core::ops::function::FnOnce::call_once{{vtable.shim}}
   9: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/alloc/src/boxed.rs:1985:9
      <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/alloc/src/boxed.rs:1985:9
      std::sys::unix::thread::Thread::new::thread_start
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/std/src/sys/unix/thread.rs:108:17
  10: start_thread
             at ./nptl/./nptl/pthread_create.c:442:8
  11: __GI___clone3
             at ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
Thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: UnknownBlock("State already discarded for BlockId::Hash(0x37e90df3ce53362b1e133d9eb07f1611ae2165e33042b56dcc67ffd1416365d9)")', nodes/poscan-consensus/src/service.rs:463
This is a bug. Please report it at:
    https://github.com/3Dpass/3DP/issues
====================
Version: 3.1.0-70205fc-x86_64-linux-gnu
   0: sp_panic_handler::set::{{closure}}
   1: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/alloc/src/boxed.rs:1999:9
      std::panicking::rust_panic_with_hook
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/std/src/panicking.rs:695:13
   2: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/std/src/panicking.rs:582:13
   3: std::sys_common::backtrace::__rust_end_short_backtrace
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/std/src/sys_common/backtrace.rs:151:18
   4: rust_begin_unwind
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/std/src/panicking.rs:578:5
   5: core::panicking::panic_fmt
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/core/src/panicking.rs:67:14
   6: core::result::unwrap_failed
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/core/src/result.rs:1651:5
   7: std::sys_common::backtrace::__rust_begin_short_backtrace
   8: core::ops::function::FnOnce::call_once{{vtable.shim}}
   9: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/alloc/src/boxed.rs:1985:9
      <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/alloc/src/boxed.rs:1985:9
      std::sys::unix::thread::Thread::new::thread_start
             at /rustc/521f4dae1bdf7fe9cf9436ecef9ee7c7442708bf/library/std/src/sys/unix/thread.rs:108:17
  10: start_thread
             at ./nptl/./nptl/pthread_create.c:442:8
  11: __GI___clone3
             at ./misc/../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
Thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: UnknownBlock("State already discarded for BlockId::Hash(0x37e90df3ce53362b1e133d9eb07f1611ae2165e33042b56dcc67ffd1416365d9)")', nodes/poscan-consensus/src/service.rs:463
This is a bug. Please report it at:
    https://github.com/3Dpass/3DP/issues
====================