aurora-is-near / borealis-engine-lib

Creative Commons Zero v1.0 Universal
11 stars 3 forks source link

The node suddenly crashes and reports an error #175

Open COLUD4 opened 1 week ago

COLUD4 commented 1 week ago

We started getting the error below on all of our refiner instances at the same time followed by a bunch of ERROR We're running the refiner v0.28.2-2.2.1

log: 2024-10-29T14:49:34.189809Z INFO db: Opened a new RocksDB instance. num_instances=1 2024-10-29T14:49:34.220203Z INFO db: Closed a RocksDB instance. num_instances=0 2024-10-29T14:49:34.220216Z INFO db_opener: The database exists. path=/mnt/auroramain/node/near/data 2024-10-29T14:49:34.220226Z INFO db: Opened a new RocksDB instance. num_instances=1 2024-10-29T14:49:34.492484Z INFO db: Closed a RocksDB instance. num_instances=0 2024-10-29T14:49:34.492515Z INFO db: Opened a new RocksDB instance. num_instances=1 2024-10-29T14:49:34.517457Z INFO db: Closed a RocksDB instance. num_instances=0 2024-10-29T14:49:34.517474Z INFO db: Opened a new RocksDB instance. num_instances=1 2024-10-29T14:49:34.726174Z INFO memtrie: Loading tries to memory for shards []... 2024-10-29T14:49:34.726657Z INFO memtrie: Memtries loading complete for shards [] 2024-10-29T14:49:34.726664Z INFO chain: Init: header head @ #131481606 9mJBsT9c57Hf2L4LssaAQuVVoMRnwVFtF1u2a3oLk93m; block head @ #131480858 G9EehazGkzP7je1ytKfb4KM4Fj4Mk2hRyVnzL3w1r2UJ 2024-10-29T14:49:34.739643Z INFO store: Read FlatState values inlining migration status status=Finished 2024-10-29T14:49:34.739944Z INFO network: Starting http server at 0.0.0.0:3030 2024-10-29T14:49:34.740032Z INFO bandwidth: Bandwidth stats total_bandwidth_used_by_all_peers=0 total_msg_received_count=0 2024-10-29T14:49:34.740194Z INFO actix_server::builder: starting 4 workers 2024-10-29T14:49:34.752519Z INFO stats: #131480858 Waiting for peers 0 peers ⬇ 0 B/s ⬆ 0 B/s NaN bps 0 gas/s CPU: 0%, Mem: 107 MB 2024-10-29T14:49:35.030281Z INFO network: failed to connect to ed25519:5f12iem6YtAtmvupyPB1JW7csHWPPZ21yANXhCpJZf3c@135.181.216.150:24567 err="PeerActor::spawn(): outbound not allowed: already started another outbound connection to this peer" 2024-10-29T14:49:35.073082Z INFO network: failed to connect to ed25519:5f12iem6YtAtmvupyPB1JW7csHWPPZ21yANXhCpJZf3c@135.181.216.150:24567 err="PeerActor::spawn(): outbound not allowed: already started another outbound connection to this peer" 2024-10-29T14:49:35.148178Z INFO network: failed to connect to ed25519:7XpMndfHKHyKTQSQ6mRChF1unmzftNMFE6hPENtoCEBB@141.95.173.109:24567 err="PeerActor::spawn(): outbound not allowed: already started another outbound connection to this peer" 2024-10-29T14:49:35.239869Z INFO network: failed to connect to ed25519:5f12iem6YtAtmvupyPB1JW7csHWPPZ21yANXhCpJZf3c@135.181.216.150:24567 err="PeerActor::spawn(): outbound not allowed: already started another outbound connection to this peer" 2024-10-29T14:49:35.302614Z INFO network: ed25519:HPrBScT6pj6Dsg7WRCQMQWjWrMe5pWBk2N4bGSc3Ytds: Peer Some(ed25519:H62grQCpzfiB145i7u4CMn8ayx6gpGbHr75NykcmoWs1@15.204.140.58:24567) disconnected, reason: handshake failed 2024-10-29T14:49:35.402832Z INFO indexer: Starting Streamer... 2024-10-29T14:49:35.457253Z INFO actix_server::server: Actix runtime found; starting in Actix runtime 2024-10-29T14:49:35.743156Z INFO network: Failed to connect to ed25519:BDpcMW9GM2HnHSCbZsYW3SnnqNnAmB4ben6f2ggxUEPs@176.9.63.6:24567 err="tcp::Stream::connect(): deadline has elapsed" 2024-10-29T14:49:35.743185Z INFO network: Failed to connect to ed25519:9zHP5vP1b35HxidyEuGEoYKXD1DSJKxrm4HNirU66Ps5@65.109.159.27:24567 err="tcp::Stream::connect(): deadline has elapsed" 2024-10-29T14:49:35.743195Z INFO network: Failed to connect to ed25519:BRQRrwjpssWkizg9VpdHCz7reTUkt4uezis2jyipDS1a@13.212.26.195:24567 err="tcp::Stream::connect(): deadline has elapsed" 2024-10-29T14:49:35.743206Z INFO network: Failed to connect to ed25519:7ZFx8931YVkPawmYmdyaLG8tvPYrSAer1RVLA3bx8pfU@15.204.64.35:24567 err="tcp::Stream::connect(): deadline has elapsed" 2024-10-29T14:49:35.757681Z INFO network: failed to connect to ed25519:2PRvU1k6mz3XPCCXNVe2UCfRjcnsLuXbNtY6AmEBYXWo@34.159.45.153:24567 err="tcp::Stream::connect(): deadline has elapsed" 2024-10-29T14:49:35.763781Z INFO network: failed to connect to ed25519:7fPwmfuDAAGLz8UG8oJC5Pv6zz2wAJ2RfVZTTFLV7aS8@65.21.14.20:24567 err="tcp::Stream::connect(): deadline has elapsed" 2024-10-29T14:49:35.806889Z INFO network: failed to connect to ed25519:9CfYdSfdPDnu11hQics2Ce4XyVQxAGejJiavkoA4DW2Y@37.27.55.251:24567 err="tcp::Stream::connect(): deadline has elapsed" 2024-10-29T14:49:35.824588Z INFO network: failed to connect to ed25519:3js6UvTT6vcFXgxVgVCtqJms6j8pbcQECSZXXHrYqBSr@165.227.129.110:24567 err="tcp::Stream::connect(): deadline has elapsed" 2024-10-29T14:49:35.865151Z INFO network: failed to connect to ed25519:EchQTy8KrP1BgaJjoDoL24NNt8GDu2xDVyxJEA2emBoh@5.9.153.244:24567 err="tcp::Stream::connect(): deadline has elapsed" 2024-10-29T14:49:35.910821Z INFO network: failed to connect to ed25519:3js6UvTT6vcFXgxVgVCtqJms6j8pbcQECSZXXHrYqBSr@165.227.129.110:24567 err="tcp::Stream::connect(): deadline has elapsed" 2024-10-29T14:49:35.938869Z INFO network: failed to connect to ed25519:3js6UvTT6vcFXgxVgVCtqJms6j8pbcQECSZXXHrYqBSr@165.227.129.110:24567 err="tcp::Stream::connect(): deadline has elapsed" 2024-10-29T14:49:36.002999Z INFO network: failed to connect to ed25519:6ybckHHwkwxTynYkgTn8S9mUkP39KsC8jiNqGs8i96A8@168.119.12.243:24567 err="tcp::Stream::connect(): deadline has elapsed" 2024-10-29T14:49:36.038204Z INFO network: failed to connect to ed25519:6Q5QPfPXFx6gkzuF5TjnhcAcRVafnYBkL4T2S7yfJ1Ej@65.109.38.34:24567 err="tcp::Stream::connect(): deadline has elapsed" 2024-10-29T14:49:36.191276Z WARN aurora_standalone_engine::sync: Incorrect result in processing receipt_id=0x8d73fbce7b59ff28eba99d185f49c018fb87be7a341d3f435693b4cadcb38c49 computed differed from expected 2024-10-29T14:49:36.191289Z WARN aurora_standalone_engine::sync: Receipt 0x8d73fbce7b59ff28eba99d185f49c018fb87be7a341d3f435693b4cadcb38c49 diff mismatch with computed diff 2024-10-29T14:49:36.192371Z WARN aurora_refiner_lib::refiner_inner: Mismatch between Near and Engine outputs. The internal Engine instance may not have the correct state. thread 'main' panicked at /usr/local/cargo/git/checkouts/aurora-engine-7e378a275ccc464a/28396c1/engine-precompiles/src/xcc.rs:262:16: ERR_MISSING_WNEAR_ADDRESS note: run with RUST_BACKTRACE=1 environment variable to display a backtrace thread 'thread 'actix-rt system:0 arbiter:1actix-rt system:0 arbiter:0' panicked at ' panicked at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.2/src/runtime/time/entry.rs/usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.39.2/src/runtime/time/entry.rs::568568::99:

A Tokio 1.x context was found, but it is being shutdown.A Tokio 1.x context was found, but it is being shutdown.

alexander-borodulya commented 1 week ago

Hey @COLUD4, thanks for bringing it up! I’m looking into a possible fix.

yuanwj210 commented 1 week ago

Hey @COLUD4, thanks for bringing it up! I’m looking into a possible fix.

version:0.28.2-2.3.0-rc.4 I also got an error using this version

srpc2-refiner  | starting refiner: [/usr/local/bin/aurora-refiner -c /config/refiner.json run --height 131480846]
srpc2-refiner  | 2024-10-30T02:25:41.118937Z  INFO indexer: Load config from /near...
srpc2-refiner  | 2024-10-30T02:25:41.119196Z  WARN neard: /near/config.json: encountered unrecognised fields: network.external_address, network.reconnect_delay, consensus.reduce_wait_for_missing_block, consensus.state_fetch_horizon
srpc2-refiner  | 2024-10-30T02:25:41.119208Z  INFO config: Validating Config, extracted from config.json...
srpc2-refiner  | 2024-10-30T02:25:41.121769Z  WARN genesis: Skipped genesis validation
srpc2-refiner  | 2024-10-30T02:25:41.121837Z  INFO config: Validating Genesis config and records. This could take a few minutes...
srpc2-refiner  | 2024-10-30T02:25:41.122082Z  INFO config: All validations have passed!
srpc2-refiner  | 2024-10-30T02:25:41.122391Z  INFO db_opener: Opening NodeStorage path="/near/data" cold_path="none"
srpc2-refiner  | 2024-10-30T02:25:41.122424Z  INFO db: Opened a new RocksDB instance. num_instances=1
srpc2-refiner  | 2024-10-30T02:25:41.163030Z  INFO db: Closed a RocksDB instance. num_instances=0
srpc2-refiner  | 2024-10-30T02:25:41.163040Z  INFO db_opener: The database exists. path=/near/data
srpc2-refiner  | 2024-10-30T02:25:41.163049Z  INFO db: Opened a new RocksDB instance. num_instances=1
srpc2-refiner  | 2024-10-30T02:25:41.514928Z  INFO db: Closed a RocksDB instance. num_instances=0
srpc2-refiner  | 2024-10-30T02:25:41.514954Z  INFO db: Opened a new RocksDB instance. num_instances=1
srpc2-refiner  | 2024-10-30T02:25:41.540580Z  INFO db: Closed a RocksDB instance. num_instances=0
srpc2-refiner  | 2024-10-30T02:25:41.540593Z  INFO db: Opened a new RocksDB instance. num_instances=1
srpc2-refiner  | 2024-10-30T02:25:41.828969Z  INFO memtrie: Loading tries to memory for shards []...
srpc2-refiner  | 2024-10-30T02:25:41.832258Z  INFO memtrie: Memtries loading complete for shards []
srpc2-refiner  | 2024-10-30T02:25:41.832265Z  INFO chain: Init: header head @ #131501015 4NH17kobYiY6Whd7uPMCrMTMMTEPpVcmbTxThm8bCmJE; block head @ #131480965 4WTaUH47DiwLG4oUXgbDaxqhMvTnYYS3aX6aYUUFEK6C
srpc2-refiner  | 2024-10-30T02:25:41.835610Z  INFO client: Starting validator node: node0fa16dd8b3426c06.near
srpc2-refiner  | 2024-10-30T02:25:41.835643Z  WARN client: The account 'node0fa16dd8b3426c06.near' is not a current validator, this node won't be validating in the current epoch
srpc2-refiner  | 2024-10-30T02:25:41.850759Z  INFO network: Starting http server at 0.0.0.0:3030
srpc2-refiner  | 2024-10-30T02:25:41.850853Z  INFO actix_server::builder: starting 4 workers
srpc2-refiner  | 2024-10-30T02:25:41.850864Z  INFO bandwidth: Bandwidth stats total_bandwidth_used_by_all_peers=0 total_msg_received_count=0
srpc2-refiner  | 2024-10-30T02:25:41.876469Z  INFO stats: #131480965 Waiting for peers 0 peers ⬇ 0 B/s ⬆ 0 B/s NaN bps 0 gas/s CPU: 0%, Mem: 167 MB
srpc2-refiner  | 2024-10-30T02:25:41.970687Z  INFO network: Closing connection to Some(ed25519:5pRiNCHbXFALoVqHJtqvUdJadT85r6L3GVSnyUeS3GJX@165.154.65.209:24567) err=Recv(IO(Kind(UnexpectedEof)))
srpc2-refiner  | 2024-10-30T02:25:42.817440Z  INFO indexer: Starting Streamer...
srpc2-refiner  | 2024-10-30T02:25:42.817440Z  INFO indexer: Starting Streamer...
srpc2-refiner  | 2024-10-30T02:25:42.824400Z  INFO actix_server::server: Actix runtime found; starting in Actix runtime
srpc2-refiner  | 2024-10-30T02:25:42.824412Z  INFO actix_server::server: starting service: "actix-web-service-0.0.0.0:3030", workers: 4, listening on: 0.0.0.0:3030
srpc2-refiner  | 2024-10-30T02:25:42.851503Z  INFO network: tier2 failed to connect to ed25519:CDQFcD9bHUWdc31rDfRi4ZrJczxg8derCzybcac142tK@35.196.209.192:24567 err="tcp::Stream::connect(): deadline has elapsed"
srpc2-refiner  | 2024-10-30T02:25:42.853444Z  INFO network: Failed to connect to ed25519:DUTxtvRwfBxs648W6TQjXH9dH6FhhvB9aVa8sLFyz1qX@34.90.162.215:24567 err="tcp::Stream::connect(): deadline has elapsed"
srpc2-refiner  | 2024-10-30T02:25:42.971576Z  INFO network: tier2 failed to connect to ed25519:86EtEy7epneKyrcJwSWP7zsisTkfDRH5CFVszt4qiQYw@35.195.32.249:24567 err="tcp::Stream::connect(): deadline has elapsed"
srpc2-refiner  | 2024-10-30T02:25:42.977517Z  INFO network: tier2 failed to connect to ed25519:BFB78VTDBBfCY4jCP99zWxhXUcFAZqR22oSx2KEr8UM1@35.229.222.235:24567 err="tcp::Stream::connect(): deadline has elapsed"
srpc2-refiner  | 2024-10-30T02:25:42.985454Z  INFO network: tier2 failed to connect to ed25519:Cw1YyiX9cybvz3yZcbYdG7oDV6D7Eihdfc8eM1e1KKoh@35.195.27.104:24567 err="tcp::Stream::connect(): deadline has elapsed"
srpc2-refiner  | 2024-10-30T02:25:43.000377Z  INFO network: tier2 failed to connect to ed25519:Cw1YyiX9cybvz3yZcbYdG7oDV6D7Eihdfc8eM1e1KKoh@35.195.27.104:24567 err="tcp::Stream::connect(): deadline has elapsed"
srpc2-refiner  | 2024-10-30T02:25:43.015848Z  INFO network: tier2 failed to connect to ed25519:Cw1YyiX9cybvz3yZcbYdG7oDV6D7Eihdfc8eM1e1KKoh@35.195.27.104:24567 err="tcp::Stream::connect(): deadline has elapsed"
srpc2-refiner  | 2024-10-30T02:25:43.364907Z  WARN telemetry: Failed to send telemetry data err=Connect(Io(Custom { kind: Other, error: "unexpected EOF" })) endpoint="https://explorer.mainnet.near.org/api/nodes"
srpc2-refiner  | 2024-10-30T02:25:44.197480Z  WARN aurora_standalone_engine::sync: Incorrect result in processing receipt_id=0x8d73fbce7b59ff28eba99d185f49c018fb87be7a341d3f435693b4cadcb38c49 computed differed from expected
srpc2-refiner  | 2024-10-30T02:25:44.197497Z  WARN aurora_standalone_engine::sync: Receipt 0x8d73fbce7b59ff28eba99d185f49c018fb87be7a341d3f435693b4cadcb38c49 diff mismatch with computed diff
srpc2-refiner  | 2024-10-30T02:25:44.198914Z  WARN aurora_refiner_lib::refiner_inner: Mismatch between Near and Engine outputs. The internal Engine instance may not have the correct state.
srpc2-refiner  | thread 'main' panicked at /usr/local/cargo/git/checkouts/aurora-engine-7e378a275ccc464a/338dd61/engine-precompiles/src/xcc.rs:262:16:
srpc2-refiner  | ERR_MISSING_WNEAR_ADDRESS
srpc2-refiner  | stack backtrace:
srpc2-refiner  |    0: rust_begin_unwind
srpc2-refiner  |    1: core::panicking::panic_fmt
srpc2-refiner  |    2: aurora_engine_precompiles::xcc::state::get_wnear_address
srpc2-refiner  |    3: aurora_engine::hashchain::with_logs_hashchain
srpc2-refiner  |    4: engine_standalone_storage::sync::execute_transaction
srpc2-refiner  |    5: engine_standalone_storage::Storage::with_engine_access
srpc2-refiner  |    6: engine_standalone_storage::sync::consume_message
srpc2-refiner  |    7: aurora_standalone_engine::sync::consume_near_block
srpc2-refiner  |    8: aurora_refiner::main::{{closure}}::{{closure}}
srpc2-refiner  |    9: <tokio::task::local::RunUntil<T> as core::future::future::Future>::poll
srpc2-refiner  |   10: tokio::runtime::scheduler::current_thread::Context::enter
srpc2-refiner  |   11: tokio::runtime::context::scoped::Scoped<T>::set
srpc2-refiner  |   12: aurora_refiner::main
srpc2-refiner  | note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
srpc2-refiner  | 2024-10-30T02:25:45.036032Z  INFO chain: Sync block headers num_headers=512 first_hash=3ommaz2TgHHY5VMPASvK6XdmsJuWmCXc13n6HqnNwr7Z first_height=131501016 last_hash=CqajosocWheAJtZDcSdjwZ3AzHSnYiEoSpoYkyAwLkD1 last_height=131501527
srpc2-refiner  | thread 'actix-rt|system:0|arbiter:0' panicked at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/time/entry.rs:568:9:
srpc2-refiner  | A Tokio 1.x context was found, but it is being shutdown.
srpc2-refiner  | stack backtrace:
srpc2-refiner  |    0: rust_begin_unwind
srpc2-refiner  |    1: core::panicking::panic_fmt
srpc2-refiner  |    2: tokio::runtime::time::entry::TimerEntry::poll_elapsed::panic_cold_display
srpc2-refiner  |    3: tokio::runtime::time::entry::TimerEntry::poll_elapsed
srpc2-refiner  |    4: <tokio::time::sleep::Sleep as core::future::future::Future>::poll
srpc2-refiner  |    5: tokio::time::interval::Interval::poll_tick
srpc2-refiner  |    6: near_store::metrics::spawn_db_metrics_loop::{{closure}}
srpc2-refiner  |    7: tokio::runtime::task::core::Core<T,S>::poll
srpc2-refiner  |    8: tokio::runtime::task::harness::Harness<T,S>::poll
srpc2-refiner  |    9: tokio::task::local::LocalSet::tick
srpc2-refiner  |   10: <tokio::task::local::RunUntil<T> as core::future::future::Future>::poll
srpc2-refiner  |   11: tokio::task::local::LocalSet::run_until::{{closure}}
srpc2-refiner  |   12: tokio::runtime::scheduler::current_thread::Context::enter
srpc2-refiner  |   13: tokio::runtime::context::scoped::Scoped<T>::set
srpc2-refiner  |   14: tokio::runtime::scheduler::current_thread::CoreGuard::block_on
srpc2-refiner  |   15: tokio::runtime::context::runtime::enter_runtime
srpc2-refiner  |   16: tokio::runtime::runtime::Runtime::block_on
srpc2-refiner  | note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
srpc2-refiner  | thread 'actix-rt|system:0|arbiter:1' panicked at /usr/local/cargo/registry/src/index.crates.io-6f17d22bba15001f/tokio-1.40.0/src/runtime/time/entry.rs:568:9:
aleksuss commented 1 week ago

@COLUD4, have you tried to invoke XCC calls? (EVM -> NEAR) ? If so, read here how to setup a contract for that. Brifly, you should have ERC-20 contract deployed via deploy_erc20_token transaction which corresponds wrap.near. After that, you need to deploy XCC factory via upgrade_factory transaction and set wNEAR ERC-20 address via factory_set_wnear_address.

YuXiaoCoder commented 1 week ago

Hey @COLUD4, thanks for bringing it up! I’m looking into a possible fix.

Hello, is there a good solution yet?

YuXiaoCoder commented 1 week ago

Hey @COLUD4, thanks for bringing it up! I’m looking into a possible fix.

@alexander-borodulya Hello, is there a good solution yet?

alexander-borodulya commented 1 week ago

@COLUD4 @YuXiaoCoder I apologize for the delay in reply

The error message ERR_MISSING_WNEAR_ADDRESS indicates that the refiner instance lacks certain state information that is necessary to process a transaction. More specifically, the error appeared because the refiner replays transactions locally and requires a complete history to function correctly.

A possible solution could be to run a refiner to process all historical blocks from where the Aurora Engine was first deployed. This complete processing will make the refiner build the required state and prevent such errors.

Unfortunately, a similar issue appeared some time ago, here are more details explained in this thread.