eclipse-zenoh / zenoh

zenoh unifies data in motion, data in-use, data at rest and computations. It carefully blends traditional pub/sub with geo-distributed storages, queries and computations, while retaining a level of time and space efficiency that is well beyond any of the mainstream stacks.
https://zenoh.io
Other
1.42k stars 147 forks source link

Application panicked in zenoh code: registry/src/index.crates.io-6f17d22bba15001f/zenoh-transport-0.10.1-rc/src/manager.rs:335:18 #701

Open vulogov opened 7 months ago

vulogov commented 7 months ago

Describe the bug

Application is panicked in zenoh code with

thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" }', /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/zenoh-transport-0.10.1-rc/src/manager.rs:335:18

To reproduce

  1. Zenohd started as
zenohd --adminspace-permissions=rw --cfg='plugins/storage_manager/storages/demo:{key_expr:"zbus/**",volume:"memory"}'
  1. Publisher
fn run_history(content: String, c: cmd::Cli, stream_cmd: cmd::Stream, zc: Config) {
    let stream = Deserializer::from_str(&content).into_iter::<Value>();

    match zenoh::open(zc).res() {
        Ok(session) => {
            for value in stream {
                match value {
                    Ok(zjson) => {
                        if stream_cmd.stdout {
                            match serde_json::to_string_pretty(&zjson) {
                                Ok(val) => {
                                    println!("{}", &val);
                                }
                                Err(err) => {
                                    log::error!("Error converting JSON for stdout: {:?}", err);
                                }
                            }
                        }
                        if stream_cmd.bus {
                            match cmd::zenoh_lib::get_key_from_metadata(c.platform_name.clone(), "*".to_string(), zjson["itemid"].to_string(), &session) {
                                Some(key) => {
                                    match cmd::zbus_export_zabbix::convert_zabbix_export_payload_to_zbus(key, c.platform_name.clone(), zjson.clone()) {
                                        Some(payload) => {
                                            let store_key = match zjson["type"].as_i64() {
                                                Some(2) => format!("log/metric/{}/{}/{}{}", &c.protocol_version, &c.platform_name, &payload["src"].as_str().unwrap(), &payload["key"].as_str().unwrap()),
                                                _ => format!("zbus/metric/{}/{}/{}{}", &c.protocol_version, &c.platform_name, &payload["src"].as_str().unwrap(), &payload["key"].as_str().unwrap()),
                                            };
                                            match session.put(store_key.clone(), payload.clone()).encoding(KnownEncoding::AppJson).res() {
                                                Ok(_) => log::info!("ZBX catcher->ZBUS: {}", &store_key),
                                                Err(err) => log::error!("Error ingesting {} {:?}: {:?}", &payload["key"], &payload, err),
                                            }
                                        }
                                        None => continue,
                                    }
                                }
                                None => continue,
                            }
                        }
                    }
                    Err(err) => log::error!("zabbix streamer catch an error during JSON processing: {:?}", err),
                }
            }
            match session.close().res() {
                Ok(_) => {},
                Err(err) => log::error!("Error closing Zenoh session: {:?}", err),
            }
        }
        Err(err) => {
            log::error!("Error connecting to the bus: {:?}", err);
        }
    }
}

System info

Mallets commented 7 months ago

It looks like the OS returned Resource Temporarily Unavailable (error 35) while spawning new threads for the async executor. Is the issue above always happening on your machine or was it a sporadic failure? Would you mind providing more info on your system/hardware configuration in addition to the kernel version? Did you have many other applications running in parallel?

566 is currently reworking the async runtime and is expected to land relatively soon.

vulogov commented 7 months ago

1) I am using a sync client to publish/subscribe messages on Zenoh

2) Handling messages and establish connections/push/close happens in pre-spawned threads from me code. But there is only one feeder. Although I do accept that the sync calls could stuck and all pre-spawned threads may become wasted :)

3) Hardware is a Mac mini, with Intel architecture. Nothing extraordinary I can say about apps. Standard Darwin stack, Cocoa, few terminal sessions, virtualbox with one vm, atom editor

4) For me, switching to async in my project will require a significant code change on my end.

Mallets commented 7 months ago

Thanks for the up-to-date information. Response below:

1) Ok, nothing weird here.

2) How many threads are you spawning?

3) Nothing extremely weird going on.

4) Moving to async/sync will not change the much the end result unless you are spawning too many threads.

Can you please monitor CPU/RAM consumption on that? From the actual panic message really seems that you are running out of resources in the system...

vulogov commented 7 months ago
  1. I've tried to spawn 1,2,3,4,10 http threads and result is the same. There is a single thread on sender that shall consume a single http thread.
  2. Agree
  3. Thank you for clarification
vulogov commented 7 months ago

Here is common situation with CPU/Memory and other utilization

Screenshot 2024-02-02 at 13 45 12

vulogov commented 7 months ago

I am running my app with extended backtrace. I will have some results in an hour or two.

vulogov commented 7 months ago
[2024-02-03T00:18:06Z INFO  zbus::cmd::zbus_export_stream] Request is served by thread #0
[2024-02-03T00:18:06Z INFO  zenoh::net::runtime] Using PID: 706b64c5d3a618ca4ea57962e9558259
[2024-02-03T00:18:06Z INFO  zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/[fe80::1]:64017
[2024-02-03T00:18:06Z INFO  zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/[fe80::aede:48ff:fe00:1122]:64017
[2024-02-03T00:18:06Z INFO  zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/[fe80::1c66:20b3:ef39:9909]:64017
[2024-02-03T00:18:06Z INFO  zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/[fe80::f484:9eff:fed9:8232]:64017
[2024-02-03T00:18:06Z INFO  zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/[fe80::f484:9eff:fed9:8232]:64017
[2024-02-03T00:18:06Z INFO  zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/[fe80::1d93:51a3:319e:1aa5]:64017
[2024-02-03T00:18:06Z INFO  zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/[fe80::832e:42c4:32ca:940e]:64017
[2024-02-03T00:18:06Z INFO  zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/[fe80::ce81:b1c:bd2c:69e]:64017
[2024-02-03T00:18:06Z INFO  zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/[fe80::68ce:1db7:bb03:abc]:64017
[2024-02-03T00:18:06Z INFO  zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/[fe80::abaa:328c:3fc6:2b79]:64017
[2024-02-03T00:18:06Z INFO  zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/192.168.86.228:64017
[2024-02-03T00:18:06Z INFO  zenoh::net::runtime::orchestrator] zenohd listening scout messages on 224.0.0.224:7446
[2024-02-03T00:18:07Z INFO  zbus::cmd::zbus_export_stream] Initiate delivery to bus
[2024-02-03T00:18:07Z INFO  zbus::cmd::zbus_export_stream] ZBX catcher->ZBUS: zbus/metric/v1/local/Zabbix server/vm.memory.size/total
[2024-02-03T00:18:07Z INFO  zbus::cmd::zbus_export_stream] Closing connection to bus
[2024-02-03T00:18:07Z INFO  zbus::cmd::zbus_export_stream] Request is served by thread #0
[2024-02-03T00:18:07Z INFO  zenoh::net::runtime] Using PID: 706b64c5d3a618ca4ea57962e9558259
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" }', /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/zenoh-transport-0.10.1-rc/src/manager.rs:335:18
stack backtrace:
   0: _rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::result::unwrap_failed
   3: core::result::Result<T,E>::unwrap
             at /private/tmp/rust-20230923-7199-1y02uxq/rustc-1.72.1-src/library/core/src/result.rs:1076:23
   4: zenoh_transport::manager::TransportExecutor::new
             at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/zenoh-transport-0.10.1-rc/src/manager.rs:332:13
   5: zenoh_transport::manager::TransportManager::new
             at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/zenoh-transport-0.10.1-rc/src/manager.rs:383:26
   6: zenoh_transport::manager::TransportManagerBuilder::build
             at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/zenoh-transport-0.10.1-rc/src/manager.rs:291:12
   7: zenoh::net::runtime::Runtime::init::{{closure}}
             at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/zenoh-0.10.1-rc/src/net/runtime/mod.rs:128:33
   8: zenoh::session::Session::new::{{closure}}
             at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/zenoh-0.10.1-rc/src/session.rs:849:41
   9: <async_std::task::builder::SupportTaskLocals<F> as core::future::future::Future>::poll::{{closure}}
             at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-std-1.12.0/src/task/builder.rs:199:17
  10: async_std::task::task_locals_wrapper::TaskLocalsWrapper::set_current::{{closure}}
             at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-std-1.12.0/src/task/task_locals_wrapper.rs:60:13
  11: std::thread::local::LocalKey<T>::try_with
             at /private/tmp/rust-20230923-7199-1y02uxq/rustc-1.72.1-src/library/std/src/thread/local.rs:270:16
  12: std::thread::local::LocalKey<T>::with
             at /private/tmp/rust-20230923-7199-1y02uxq/rustc-1.72.1-src/library/std/src/thread/local.rs:246:9
  13: async_std::task::task_locals_wrapper::TaskLocalsWrapper::set_current
             at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-std-1.12.0/src/task/task_locals_wrapper.rs:55:9
  14: <async_std::task::builder::SupportTaskLocals<F> as core::future::future::Future>::poll
             at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-std-1.12.0/src/task/builder.rs:197:13
  15: <futures_lite::future::Or<F1,F2> as core::future::future::Future>::poll
             at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/futures-lite-2.2.0/src/future.rs:446:33
  16: async_executor::Executor::run::{{closure}}
             at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-executor-1.8.0/src/lib.rs:261:32
  17: async_executor::LocalExecutor::run::{{closure}}
             at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-executor-1.8.0/src/lib.rs:480:34
  18: async_io::driver::block_on::{{closure}}
             at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-io-2.3.1/src/driver.rs:199:37
  19: std::thread::local::LocalKey<T>::try_with
             at /private/tmp/rust-20230923-7199-1y02uxq/rustc-1.72.1-src/library/std/src/thread/local.rs:270:16
  20: std::thread::local::LocalKey<T>::with
             at /private/tmp/rust-20230923-7199-1y02uxq/rustc-1.72.1-src/library/std/src/thread/local.rs:246:9
  21: async_io::driver::block_on
             at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-io-2.3.1/src/driver.rs:175:5
  22: async_global_executor::reactor::block_on::{{closure}}
             at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-global-executor-2.4.1/src/reactor.rs:3:18
  23: async_global_executor::reactor::block_on
             at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-global-executor-2.4.1/src/reactor.rs:12:5
  24: async_global_executor::executor::block_on::{{closure}}
             at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-global-executor-2.4.1/src/executor.rs:26:36
  25: std::thread::local::LocalKey<T>::try_with
             at /private/tmp/rust-20230923-7199-1y02uxq/rustc-1.72.1-src/library/std/src/thread/local.rs:270:16
  26: std::thread::local::LocalKey<T>::with
             at /private/tmp/rust-20230923-7199-1y02uxq/rustc-1.72.1-src/library/std/src/thread/local.rs:246:9
  27: async_global_executor::executor::block_on
             at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-global-executor-2.4.1/src/executor.rs:26:5
  28: async_std::task::builder::Builder::blocking::{{closure}}::{{closure}}
             at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-std-1.12.0/src/task/builder.rs:171:25
  29: async_std::task::task_locals_wrapper::TaskLocalsWrapper::set_current::{{closure}}
             at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-std-1.12.0/src/task/task_locals_wrapper.rs:60:13
  30: std::thread::local::LocalKey<T>::try_with
             at /private/tmp/rust-20230923-7199-1y02uxq/rustc-1.72.1-src/library/std/src/thread/local.rs:270:16
  31: std::thread::local::LocalKey<T>::with
             at /private/tmp/rust-20230923-7199-1y02uxq/rustc-1.72.1-src/library/std/src/thread/local.rs:246:9
  32: async_std::task::task_locals_wrapper::TaskLocalsWrapper::set_current
             at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-std-1.12.0/src/task/task_locals_wrapper.rs:55:9
  33: async_std::task::builder::Builder::blocking::{{closure}}
             at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-std-1.12.0/src/task/builder.rs:168:17
  34: std::thread::local::LocalKey<T>::try_with
             at /private/tmp/rust-20230923-7199-1y02uxq/rustc-1.72.1-src/library/std/src/thread/local.rs:270:16
  35: std::thread::local::LocalKey<T>::with
             at /private/tmp/rust-20230923-7199-1y02uxq/rustc-1.72.1-src/library/std/src/thread/local.rs:246:9
  36: async_std::task::builder::Builder::blocking
             at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-std-1.12.0/src/task/builder.rs:161:9
  37: async_std::task::block_on::block_on
             at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-std-1.12.0/src/task/block_on.rs:33:5
  38: <zenoh_core::ResolveFuture<F,To> as zenoh_core::SyncResolve>::res_sync
             at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/zenoh-core-0.10.1-rc/src/lib.rs:166:9
  39: <zenoh::OpenBuilder<TryIntoConfig> as zenoh_core::SyncResolve>::res_sync
             at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/zenoh-0.10.1-rc/src/lib.rs:305:9
  40: zenoh_core::SyncResolve::res
             at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/zenoh-core-0.10.1-rc/src/lib.rs:57:9
  41: zbus::cmd::zbus_export_stream::run_history
             at ./src/cmd/zbus_export_stream.rs:28:11
  42: zbus::cmd::zbus_export_stream::run::{{closure}}
             at ./src/cmd/zbus_export_stream.rs:107:57
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
[2024-02-03T00:18:07Z ERROR zbus::cmd::zbus_export_stream] Zabbix catcher error in joining the thread: Any { .. }
vulogov commented 7 months ago

Each pre-spaned thread performing this:

  1. Establish connection with Zenoh
  2. Parse received JSON's
  3. Publish succesfully parsed JSON
  4. Close connection with Zenoh

If I am to exhaust any limit, wanna know which one. And zbus_export_stream.rs:28 is at zenoh.open().res()

[2024-02-03T00:18:06Z INFO  zbus::cmd::zbus_export_stream] Request is served by thread #0
[2024-02-03T00:18:06Z INFO  zenoh::net::runtime] Using PID: 706b64c5d3a618ca4ea57962e9558259
[2024-02-03T00:18:06Z INFO  zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/[fe80::1]:64017
[2024-02-03T00:18:06Z INFO  zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/[fe80::aede:48ff:fe00:1122]:64017
[2024-02-03T00:18:06Z INFO  zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/[fe80::1c66:20b3:ef39:9909]:64017
[2024-02-03T00:18:06Z INFO  zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/[fe80::f484:9eff:fed9:8232]:64017
[2024-02-03T00:18:06Z INFO  zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/[fe80::f484:9eff:fed9:8232]:64017
[2024-02-03T00:18:06Z INFO  zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/[fe80::1d93:51a3:319e:1aa5]:64017
[2024-02-03T00:18:06Z INFO  zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/[fe80::832e:42c4:32ca:940e]:64017
[2024-02-03T00:18:06Z INFO  zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/[fe80::ce81:b1c:bd2c:69e]:64017
[2024-02-03T00:18:06Z INFO  zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/[fe80::68ce:1db7:bb03:abc]:64017
[2024-02-03T00:18:06Z INFO  zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/[fe80::abaa:328c:3fc6:2b79]:64017
[2024-02-03T00:18:06Z INFO  zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/192.168.86.228:64017
[2024-02-03T00:18:06Z INFO  zenoh::net::runtime::orchestrator] zenohd listening scout messages on 224.0.0.224:7446
[2024-02-03T00:18:07Z INFO  zbus::cmd::zbus_export_stream] Initiate delivery to bus
[2024-02-03T00:18:07Z INFO  zbus::cmd::zbus_export_stream] ZBX catcher->ZBUS: zbus/metric/v1/local/Zabbix server/vm.memory.size/total
[2024-02-03T00:18:07Z INFO  zbus::cmd::zbus_export_stream] Closing connection to bus
Mallets commented 7 months ago

I am running my app with extended backtrace. I will have some results in an hour or two.

Based on the screenshot it seems your system is overloaded. I can see ~10000 and only 35MB of free RAM out of 32GB.

vulogov commented 7 months ago

Sure, I exhaust the RAM, but I do have swapins and swapouts going. Even if that, shall app crash in library code if something like this occurs ?

Mallets commented 7 months ago

Error management could always be improved. But in this particular case Zenoh needs to spawn some threads in order to communicate on the network.

Assuming the error is somehow bubbled up, how would your application react? I'm trying to understand the potential usage pattern here...

vulogov commented 7 months ago

One of my points, while I do exhaust the RAM, I do not exhaust VM, so as far as I concern, I do have resources. Although I will eventually get a more in-depth understanding of the mechanics of the Zenoh library, my general expectation is that if any problem happens inside the library call, the one shall return either a result or an error, which I shall handle. And if unwrap() in the library throws unhandled exception, there is not something I can do on my side of the code.