chainflip-io / chainflip-backend

The Chainflip backend repo, including the Chainflip Node and CFE.

Substrate panics on state root mismatch #1074

Closed morelazers closed 2 years ago

morelazers commented 2 years ago

Description

Logging quickly as I'm mostly AFK today. Reported by a Discord member and I've seen something similar on Robbie's Validator.


chainflip-node      | Hash: given=d8ac79163cf001784382e2afe14ba1bcd2afd5611c32e5622e0bb0dd1dc662ac, expected=da2edd70de4c23ab78769900cd7ba8ec1c9b3083f5a84ecaa8009eedb33ba073
chainflip-node      | 
chainflip-node      | ====================
chainflip-node      | 
chainflip-node      | Version: 0.1.0-2b496be2-x86_64-linux-gnu
chainflip-node      | 
chainflip-node      |    0: sp_panic_handler::set::{{closure}}
chainflip-node      |    1: std::panicking::rust_panic_with_hook
chainflip-node      |              at rustc/b3d11f95cc5dd687fdd185ce91e02ebe40e6f46b/library/std/src/panicking.rs:626:17
chainflip-node      |    2: std::panicking::begin_panic::{{closure}}
chainflip-node      |    3: std::sys_common::backtrace::__rust_end_short_backtrace
chainflip-node      |    4: std::panicking::begin_panic
chainflip-node      |    5: frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::final_checks
chainflip-node      |    6: tracing::span::Span::in_scope
chainflip-node      |    7: frame_executive::Executive<System,Block,Context,UnsignedValidator,AllPallets,COnRuntimeUpgrade>::execute_block
chainflip-node      |    8: <state_chain_runtime::Runtime as sp_api::runtime_decl_for_Core::Core<sp_runtime::generic::block::Block<sp_runtime::generic::header::Header<u32,sp_runtime::traits::BlakeTwo256>,sp_runtime::generic::unchecked_extrinsic::UncheckedExtrinsic<sp_runtime::multiaddress::MultiAddress<<<sp_runtime::MultiSignature as sp_runtime::traits::Verify>::Signer as sp_runtime::traits::IdentifyAccount>::AccountId,()>,state_chain_runtime::Call,sp_runtime::MultiSignature,(frame_system::extensions::check_spec_version::CheckSpecVersion<state_chain_runtime::Runtime>,frame_system::extensions::check_tx_version::CheckTxVersion<state_chain_runtime::Runtime>,frame_system::extensions::check_genesis::CheckGenesis<state_chain_runtime::Runtime>,frame_system::extensions::check_mortality::CheckMortality<state_chain_runtime::Runtime>,frame_system::extensions::check_nonce::CheckNonce<state_chain_runtime::Runtime>,frame_system::extensions::check_weight::CheckWeight<state_chain_runtime::Runtime>)>>>>::execute_block
chainflip-node      |    9: std::panicking::try
chainflip-node      |   10: std::thread::local::LocalKey<T>::with
chainflip-node      |   11: sc_executor::native_executor::WasmExecutor::with_instance::{{closure}}
chainflip-node      |   12: sc_executor::wasm_runtime::RuntimeCache::with_instance
chainflip-node      |   13: sp_state_machine::execution::StateMachine<B,H,N,Exec>::execute_aux
chainflip-node      |   14: sp_state_machine::execution::StateMachine<B,H,N,Exec>::execute_using_consensus_failure_handler
chainflip-node      |   15: <sc_service::client::call_executor::LocalCallExecutor<Block,B,E> as sc_client_api::call_executor::CallExecutor<Block>>::contextual_call
chainflip-node      |   16: <sc_service::client::client::Client<B,E,Block,RA> as sp_api::CallApiAt<Block>>::call_api_at
chainflip-node      |   17: sp_api::runtime_decl_for_Core::execute_block_call_api_at
chainflip-node      |   18: <state_chain_runtime::RuntimeApiImpl<__SR_API_BLOCK__,RuntimeApiImplCall> as sp_api::Core<__SR_API_BLOCK__>>::Core_execute_block_runtime_api_impl  
chainflip-node      |   19: sp_api::Core::execute_block_with_context
chainflip-node      |   20: sc_service::client::client::Client<B,E,Block,RA>::prepare_block_storage_changes
chainflip-node      |   21: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
chainflip-node      |   22: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
chainflip-node      |   23: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
chainflip-node      |   24: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
chainflip-node      |   25: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
chainflip-node      |   26: <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll
chainflip-node      |   27: <sc_service::task_manager::prometheus_future::PrometheusFuture<T> as core::future::future::Future>::poll
chainflip-node      |   28: <futures_util::future::select::Select<A,B> as core::future::future::Future>::poll
chainflip-node      |   29: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
chainflip-node      |   30: <tracing_futures::Instrumented<T> as core::future::future::Future>::poll
chainflip-node      |   31: std::thread::local::LocalKey<T>::with
chainflip-node      |   32: futures_executor::local_pool::block_on
chainflip-node      |   33: tokio::runtime::task::core::CoreStage<T>::poll
chainflip-node      |   34: tokio::runtime::task::harness::Harness<T,S>::poll
chainflip-node      |   35: tokio::runtime::blocking::pool::Inner::run
chainflip-node      |   36: std::sys_common::backtrace::__rust_begin_short_backtrace
chainflip-node      |   37: core::ops::function::FnOnce::call_once{{vtable.shim}}
chainflip-node      |   38: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
chainflip-node      |              at rustc/b3d11f95cc5dd687fdd185ce91e02ebe40e6f46b/library/alloc/src/boxed.rs:1575:9
chainflip-node      |       <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
chainflip-node      |              at rustc/b3d11f95cc5dd687fdd185ce91e02ebe40e6f46b/library/alloc/src/boxed.rs:1575:9
chainflip-node      |       std::sys::unix::thread::Thread::new::thread_start
chainflip-node      |              at rustc/b3d11f95cc5dd687fdd185ce91e02ebe40e6f46b/library/std/src/sys/unix/thread.rs:72:17
chainflip-node      |   39: start_thread
chainflip-node      |   40: clone
chainflip-node      |
chainflip-node      |
chainflip-node      | Thread 'tokio-runtime-worker' panicked at 'Storage root must match that calculated.', /usr/local/cargo/git/checkouts/substrate-a7ad12d678bd31ac/e563465/frame/executive/src/lib.rs:472
chainflip-node      |
chainflip-node      | This is a bug. Please report it at:
chainflip-node      |
chainflip-node      |   chainflip.io
chainflip-node      |
dandanlen commented 2 years ago

Permalink to the substrate fn that is throwing this: https://github.com/paritytech/substrate/blob/20a9bbb1fe47fcd62fcd64b2fa32456b4f434aaf/frame/executive/src/lib.rs#L452-L478

Just to clarify: this is technically not a runtime panic (otherwise all nodes would panic). It seems to be a localised issue with those nodes, although given that it has happened twice now we should of course get to the bottom of it. It also seems to be panicking on an assertion, i.e. this is not some unexpected panic, it's a deliberate failsafe in the Substrate node.
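For anyone skimming, here is a rough sketch of the kind of check that trips here. It is a simplified stand-alone model, not the actual frame_executive code: `Header`, `to_hex` and the two-argument `final_checks` are hypothetical names for illustration, and I'm assuming "given" in the log is the root from the imported header and "expected" is the one the node calculated locally.

```rust
/// Simplified stand-in for a 32-byte state root hash.
type Hash = [u8; 32];

/// Hypothetical minimal header: only the field relevant to this check.
struct Header {
    state_root: Hash,
}

/// Render a hash as lowercase hex, similar to the log line in the report.
fn to_hex(hash: &Hash) -> String {
    hash.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Compare the root claimed by the imported header against the root the
/// node calculated itself after executing the block. A mismatch is treated
/// as fatal: the assertion panics on purpose rather than importing a block
/// whose resulting state the node cannot reproduce.
fn final_checks(header: &Header, calculated_root: &Hash) {
    if &header.state_root != calculated_root {
        println!(
            "Hash: given={}, expected={}",
            to_hex(&header.state_root),
            to_hex(calculated_root)
        );
    }
    assert!(
        &header.state_root == calculated_root,
        "Storage root must match that calculated."
    );
}

fn main() {
    // Matching roots: the check passes silently.
    let header = Header { state_root: [0xd8; 32] };
    final_checks(&header, &[0xd8; 32]);

    // Mismatching roots: the assertion fires, which we catch here only so
    // the example can finish running.
    let result = std::panic::catch_unwind(|| {
        final_checks(&Header { state_root: [0xd8; 32] }, &[0xda; 32])
    });
    assert!(result.is_err());
}
```

The point is that the panic is the check doing its job: the node's locally computed state disagrees with the block it was handed, so the interesting question is why the local state diverged on those nodes.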

kylezs commented 2 years ago

Just a thought: could this have something to do with runtime upgrades, with some nodes not having received the upgrade and then trying to connect? Or something of that nature?

Given we've not really seen this before, and we've only just done a runtime upgrade, it could be related (or I could be way off).

morelazers commented 2 years ago

Saw this on Robbie's Validator on the morning of Dec 17, before we pushed any runtime upgrades.

dandanlen commented 2 years ago

Got a response from Parity:

I think this may be related with this issue https://github.com/paritytech/substrate/issues/9697, in such case running your node with --state-cache-size 0 should avoid it for now

That issue was logged by Alain from Moonbeam and seems to be fairly widespread.

It's due to a caching bug and can be avoided by running the node with --state-cache-size 0 as above, apparently at the expense of slightly slower block imports.

morelazers commented 2 years ago

I'll chuck it in the docs and then close the issue. Cheers Dan.