0glabs / 0g-storage-node

Apache License 2.0
Data Corruption Error in v0.3.3 0g-Storage-node #110

Open Josephtran102 opened 3 months ago

Josephtran102 commented 3 months ago

Description

I encountered a data corruption error while running the node.

⚡ root@vmi1797354 ~/0g-storage-node/run ◈ v0.3.3 ±

../target/release/zgs_node --config config.toml

The error details are as follows:

thread 'tokio-runtime-worker' panicked at node/sync/src/auto_sync/tx_store.rs:103:9:
data corruption
stack backtrace:
   0: rust_begin_unwind
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:645:5
   1: core::panicking::panic_fmt
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/panicking.rs:72:14
   2: sync::auto_sync::tx_store::TxStore::remove
   3: sync::service::SyncService::on_sync_msg::{{closure}}
   4: futures_util::future::future::FutureExt::poll_unpin
   5: <futures_util::future::select::Select<A,B> as core::future::future::Future>::poll
   6: <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll
   7: <futures_util::future::future::flatten::Flatten<Fut,<Fut as core::future::future::Future>::Output> as core::future::future::Future>::poll
   8: tokio::runtime::task::core::Core<T,S>::poll
   9: tokio::runtime::task::harness::Harness<T,S>::poll
  10: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
  11: tokio::runtime::scheduler::multi_thread::worker::Context::run
  12: tokio::runtime::context::scoped::Scoped<T>::set
  13: tokio::runtime::context::runtime::enter_runtime
  14: tokio::runtime::scheduler::multi_thread::worker::run
  15: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
  16: tokio::runtime::task::core::Core<T,S>::poll
  17: tokio::runtime::task::harness::Harness<T,S>::poll
  18: tokio::runtime::blocking::pool::Inner::run
note: Some details are omitted, run with RUST_BACKTRACE=full for a verbose backtrace.
Error: "Panic (fatal error)"
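
For readers unfamiliar with this kind of failure: the text "data corruption" above is the panic message raised inside `TxStore::remove`, which suggests an internal consistency check failed rather than proving that files on disk are damaged. The sketch below is a minimal, hypothetical illustration of such a check in a swap-remove store. `ToyTxStore`, its fields, and its methods are assumptions made up for this example and are not the actual code in `tx_store.rs`. It returns an error where a plain `assert!` would panic the whole task.

```rust
use std::collections::HashMap;

/// Message reused from the panic in this issue, for illustration only.
const CORRUPT: &str = "data corruption";

/// Toy, in-memory stand-in for an indexed transaction store.
/// Hypothetical: NOT the real `TxStore` from 0g-storage-node.
struct ToyTxStore {
    /// Number of stored entries, tracked separately from the maps.
    count: usize,
    /// tx_seq -> slot index
    index_of: HashMap<u64, usize>,
    /// slot index -> tx_seq
    tx_at: HashMap<usize, u64>,
}

impl ToyTxStore {
    fn new() -> Self {
        ToyTxStore { count: 0, index_of: HashMap::new(), tx_at: HashMap::new() }
    }

    /// Appends `tx_seq` in the next free slot; returns false if already present.
    fn insert(&mut self, tx_seq: u64) -> bool {
        if self.index_of.contains_key(&tx_seq) {
            return false;
        }
        let slot = self.count;
        self.index_of.insert(tx_seq, slot);
        self.tx_at.insert(slot, tx_seq);
        self.count += 1;
        true
    }

    /// Swap-remove: the last entry is moved into the freed slot.
    /// Ok(false) if `tx_seq` is absent, Err if the bookkeeping is inconsistent.
    fn remove(&mut self, tx_seq: u64) -> Result<bool, String> {
        let slot = match self.index_of.get(&tx_seq).copied() {
            Some(s) => s,
            None => return Ok(false),
        };
        // An `assert!(self.count > 0, "data corruption")` at this point would
        // panic instead; here the inconsistency is reported to the caller.
        if self.count == 0 || slot >= self.count {
            return Err(CORRUPT.to_string());
        }
        let last = self.count - 1;
        let last_tx = self
            .tx_at
            .get(&last)
            .copied()
            .ok_or_else(|| CORRUPT.to_string())?;

        self.index_of.remove(&tx_seq);
        self.tx_at.remove(&last);
        if slot != last {
            // Move the former last entry into the freed slot.
            self.tx_at.insert(slot, last_tx);
            self.index_of.insert(last_tx, slot);
        }
        self.count = last;
        Ok(true)
    }
}

fn main() {
    let mut store = ToyTxStore::new();
    store.insert(7);
    store.insert(9);
    println!("remove(7) -> {:?}", store.remove(7));

    // Simulate inconsistent state: the counter says empty, the index disagrees.
    store.count = 0;
    println!("remove(9) -> {:?}", store.remove(9));
}
```

If the real check is of this kind, the stored index and the stored entries disagree with each other, which is one way a fresh rebuild of the node could clear the error.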

Environment:

Operating System: Linux (Ubuntu 22.04.4 LTS)
Go version: go1.22.0 linux/amd64
Storage Node version: v0.3.3

papadritta commented 3 months ago

I hit the same critical issue with storage node v0.3.3: the sync task panics with a "data corruption" message, which triggers a fatal-error shutdown. The service then restarts and hits the same panic again, so it restarts continuously.

Description:

Below are the relevant log details:

2024-07-08T07:29:00.552815Z  INFO zgs_node::client::environment: Internal shutdown received reason="Panic (fatal error)"
2024-07-08T07:29:00.552824Z  INFO zgs_node: Shutting down... reason=Failure("Panic (fatal error)")
2024-07-08T07:29:00.552839Z DEBUG task_executor: Async task shutdown, exit received task="log_reload"
2024-07-08T07:29:10.787034Z  INFO zgs_node: Starting services...
2024-07-08T04:43:28.351045Z ERROR task_executor: Task panic. This is a bug! task_name="sync" data corruption advice="Please check above for a backtrace and notify the developers"
2024-07-08T04:43:28.351069Z  INFO zgs_node::client::environment: Internal shutdown received reason="Panic (fatal error)"
2024-07-08T04:43:28.351078Z  INFO zgs_node: Shutting down... reason=Failure("Panic (fatal error)")
2024-07-08T04:43:28.351094Z DEBUG task_executor: Async task shutdown, exit received task="log_reload"
2024-07-08T04:43:38.537647Z  INFO zgs_node: Starting services...
thread 'tokio-runtime-worker' panicked at node/sync/src/auto_sync/tx_store.rs:103:9:
data corruption
stack backtrace:
   0: rust_begin_unwind
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:645:5
   1: core::panicking::panic_fmt
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/panicking.rs:72:14
   2: sync::auto_sync::tx_store::TxStore::remove
   3: sync::service::SyncService::on_sync_msg::{{closure}}
   4: futures_util::future::future::FutureExt::poll_unpin
   5: <futures_util::future::select::Select<A,B> as core::future::future::Future>::poll
   6: <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll
   7: <futures_util::future::future::flatten::Flatten<Fut,<Fut as core::future::future::Future>::Output> as core::future::future::Future>::poll
   8: tokio::runtime::task::core::Core<T,S>::poll
   9: tokio::runtime::task::harness::Harness<T,S>::poll
  10: tokio::runtime::scheduler::multi_thread::worker::Context::run_task
  11: tokio::runtime::scheduler::multi_thread::worker::Context::run
  12: tokio::runtime::context::scoped::Scoped<T>::set
  13: tokio::runtime::context::runtime::enter_runtime
  14: tokio::runtime::scheduler::multi_thread::worker::run
  15: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
  16: tokio::runtime::task::core::Core<T,S>::poll
  17: tokio::runtime::task::harness::Harness<T,S>::poll
  18: tokio::runtime::blocking::pool::Inner::run
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Error: "Panic (fatal error)"
zgs.service: Main process exited, code=exited, status=1/FAILURE
zgs.service: Failed with result 'exit-code'.
zgs.service: Consumed 3.025s CPU time.

papadritta commented 3 months ago

I rebuilt the node and that resolved the issue; the error is gone.