Open vulogov opened 7 months ago
It looks like the OS returned Resource Temporarily Unavailable (error 35)
while spawning new threads for the async
executor. Is the issue above always happening on your machine or was it a sporadic failure?
Would you mind providing more info on your system/hardware configuration in addition to the kernel version?
Did you have many other applications running in parallel?
async
runtime and is expected to land relatively soon.1) I am using a sync client to publish/subscribe messages on Zenoh
2) Handling messages and establish connections/push/close happens in pre-spawned threads from me code. But there is only one feeder. Although I do accept that the sync calls could stuck and all pre-spawned threads may become wasted :)
3) Hardware is a Mac mini, with Intel architecture. Nothing extraordinary I can say about apps. Standard Darwin stack, Cocoa, few terminal sessions, virtualbox with one vm, atom editor
4) For me, switching to async in my project will require a significant code change on my end.
Thanks for the up-to-date information. Response below:
1) Ok, nothing weird here.
2) How many threads are you spawning?
3) Nothing extremely weird going on.
4) Moving to async/sync
will not change the much the end result unless you are spawning too many threads.
Can you please monitor CPU/RAM consumption on that? From the actual panic message really seems that you are running out of resources in the system...
Here is common situation with CPU/Memory and other utilization
I am running my app with extended backtrace. I will have some results in an hour or two.
[2024-02-03T00:18:06Z INFO zbus::cmd::zbus_export_stream] Request is served by thread #0
[2024-02-03T00:18:06Z INFO zenoh::net::runtime] Using PID: 706b64c5d3a618ca4ea57962e9558259
[2024-02-03T00:18:06Z INFO zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/[fe80::1]:64017
[2024-02-03T00:18:06Z INFO zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/[fe80::aede:48ff:fe00:1122]:64017
[2024-02-03T00:18:06Z INFO zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/[fe80::1c66:20b3:ef39:9909]:64017
[2024-02-03T00:18:06Z INFO zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/[fe80::f484:9eff:fed9:8232]:64017
[2024-02-03T00:18:06Z INFO zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/[fe80::f484:9eff:fed9:8232]:64017
[2024-02-03T00:18:06Z INFO zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/[fe80::1d93:51a3:319e:1aa5]:64017
[2024-02-03T00:18:06Z INFO zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/[fe80::832e:42c4:32ca:940e]:64017
[2024-02-03T00:18:06Z INFO zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/[fe80::ce81:b1c:bd2c:69e]:64017
[2024-02-03T00:18:06Z INFO zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/[fe80::68ce:1db7:bb03:abc]:64017
[2024-02-03T00:18:06Z INFO zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/[fe80::abaa:328c:3fc6:2b79]:64017
[2024-02-03T00:18:06Z INFO zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/192.168.86.228:64017
[2024-02-03T00:18:06Z INFO zenoh::net::runtime::orchestrator] zenohd listening scout messages on 224.0.0.224:7446
[2024-02-03T00:18:07Z INFO zbus::cmd::zbus_export_stream] Initiate delivery to bus
[2024-02-03T00:18:07Z INFO zbus::cmd::zbus_export_stream] ZBX catcher->ZBUS: zbus/metric/v1/local/Zabbix server/vm.memory.size/total
[2024-02-03T00:18:07Z INFO zbus::cmd::zbus_export_stream] Closing connection to bus
[2024-02-03T00:18:07Z INFO zbus::cmd::zbus_export_stream] Request is served by thread #0
[2024-02-03T00:18:07Z INFO zenoh::net::runtime] Using PID: 706b64c5d3a618ca4ea57962e9558259
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 35, kind: WouldBlock, message: "Resource temporarily unavailable" }', /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/zenoh-transport-0.10.1-rc/src/manager.rs:335:18
stack backtrace:
0: _rust_begin_unwind
1: core::panicking::panic_fmt
2: core::result::unwrap_failed
3: core::result::Result<T,E>::unwrap
at /private/tmp/rust-20230923-7199-1y02uxq/rustc-1.72.1-src/library/core/src/result.rs:1076:23
4: zenoh_transport::manager::TransportExecutor::new
at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/zenoh-transport-0.10.1-rc/src/manager.rs:332:13
5: zenoh_transport::manager::TransportManager::new
at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/zenoh-transport-0.10.1-rc/src/manager.rs:383:26
6: zenoh_transport::manager::TransportManagerBuilder::build
at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/zenoh-transport-0.10.1-rc/src/manager.rs:291:12
7: zenoh::net::runtime::Runtime::init::{{closure}}
at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/zenoh-0.10.1-rc/src/net/runtime/mod.rs:128:33
8: zenoh::session::Session::new::{{closure}}
at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/zenoh-0.10.1-rc/src/session.rs:849:41
9: <async_std::task::builder::SupportTaskLocals<F> as core::future::future::Future>::poll::{{closure}}
at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-std-1.12.0/src/task/builder.rs:199:17
10: async_std::task::task_locals_wrapper::TaskLocalsWrapper::set_current::{{closure}}
at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-std-1.12.0/src/task/task_locals_wrapper.rs:60:13
11: std::thread::local::LocalKey<T>::try_with
at /private/tmp/rust-20230923-7199-1y02uxq/rustc-1.72.1-src/library/std/src/thread/local.rs:270:16
12: std::thread::local::LocalKey<T>::with
at /private/tmp/rust-20230923-7199-1y02uxq/rustc-1.72.1-src/library/std/src/thread/local.rs:246:9
13: async_std::task::task_locals_wrapper::TaskLocalsWrapper::set_current
at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-std-1.12.0/src/task/task_locals_wrapper.rs:55:9
14: <async_std::task::builder::SupportTaskLocals<F> as core::future::future::Future>::poll
at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-std-1.12.0/src/task/builder.rs:197:13
15: <futures_lite::future::Or<F1,F2> as core::future::future::Future>::poll
at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/futures-lite-2.2.0/src/future.rs:446:33
16: async_executor::Executor::run::{{closure}}
at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-executor-1.8.0/src/lib.rs:261:32
17: async_executor::LocalExecutor::run::{{closure}}
at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-executor-1.8.0/src/lib.rs:480:34
18: async_io::driver::block_on::{{closure}}
at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-io-2.3.1/src/driver.rs:199:37
19: std::thread::local::LocalKey<T>::try_with
at /private/tmp/rust-20230923-7199-1y02uxq/rustc-1.72.1-src/library/std/src/thread/local.rs:270:16
20: std::thread::local::LocalKey<T>::with
at /private/tmp/rust-20230923-7199-1y02uxq/rustc-1.72.1-src/library/std/src/thread/local.rs:246:9
21: async_io::driver::block_on
at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-io-2.3.1/src/driver.rs:175:5
22: async_global_executor::reactor::block_on::{{closure}}
at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-global-executor-2.4.1/src/reactor.rs:3:18
23: async_global_executor::reactor::block_on
at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-global-executor-2.4.1/src/reactor.rs:12:5
24: async_global_executor::executor::block_on::{{closure}}
at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-global-executor-2.4.1/src/executor.rs:26:36
25: std::thread::local::LocalKey<T>::try_with
at /private/tmp/rust-20230923-7199-1y02uxq/rustc-1.72.1-src/library/std/src/thread/local.rs:270:16
26: std::thread::local::LocalKey<T>::with
at /private/tmp/rust-20230923-7199-1y02uxq/rustc-1.72.1-src/library/std/src/thread/local.rs:246:9
27: async_global_executor::executor::block_on
at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-global-executor-2.4.1/src/executor.rs:26:5
28: async_std::task::builder::Builder::blocking::{{closure}}::{{closure}}
at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-std-1.12.0/src/task/builder.rs:171:25
29: async_std::task::task_locals_wrapper::TaskLocalsWrapper::set_current::{{closure}}
at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-std-1.12.0/src/task/task_locals_wrapper.rs:60:13
30: std::thread::local::LocalKey<T>::try_with
at /private/tmp/rust-20230923-7199-1y02uxq/rustc-1.72.1-src/library/std/src/thread/local.rs:270:16
31: std::thread::local::LocalKey<T>::with
at /private/tmp/rust-20230923-7199-1y02uxq/rustc-1.72.1-src/library/std/src/thread/local.rs:246:9
32: async_std::task::task_locals_wrapper::TaskLocalsWrapper::set_current
at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-std-1.12.0/src/task/task_locals_wrapper.rs:55:9
33: async_std::task::builder::Builder::blocking::{{closure}}
at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-std-1.12.0/src/task/builder.rs:168:17
34: std::thread::local::LocalKey<T>::try_with
at /private/tmp/rust-20230923-7199-1y02uxq/rustc-1.72.1-src/library/std/src/thread/local.rs:270:16
35: std::thread::local::LocalKey<T>::with
at /private/tmp/rust-20230923-7199-1y02uxq/rustc-1.72.1-src/library/std/src/thread/local.rs:246:9
36: async_std::task::builder::Builder::blocking
at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-std-1.12.0/src/task/builder.rs:161:9
37: async_std::task::block_on::block_on
at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/async-std-1.12.0/src/task/block_on.rs:33:5
38: <zenoh_core::ResolveFuture<F,To> as zenoh_core::SyncResolve>::res_sync
at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/zenoh-core-0.10.1-rc/src/lib.rs:166:9
39: <zenoh::OpenBuilder<TryIntoConfig> as zenoh_core::SyncResolve>::res_sync
at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/zenoh-0.10.1-rc/src/lib.rs:305:9
40: zenoh_core::SyncResolve::res
at /Users/gandalf/.cargo/registry/src/index.crates.io-6f17d22bba15001f/zenoh-core-0.10.1-rc/src/lib.rs:57:9
41: zbus::cmd::zbus_export_stream::run_history
at ./src/cmd/zbus_export_stream.rs:28:11
42: zbus::cmd::zbus_export_stream::run::{{closure}}
at ./src/cmd/zbus_export_stream.rs:107:57
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
[2024-02-03T00:18:07Z ERROR zbus::cmd::zbus_export_stream] Zabbix catcher error in joining the thread: Any { .. }
Each pre-spaned thread performing this:
If I am to exhaust any limit, wanna know which one. And zbus_export_stream.rs:28 is at zenoh.open().res()
[2024-02-03T00:18:06Z INFO zbus::cmd::zbus_export_stream] Request is served by thread #0
[2024-02-03T00:18:06Z INFO zenoh::net::runtime] Using PID: 706b64c5d3a618ca4ea57962e9558259
[2024-02-03T00:18:06Z INFO zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/[fe80::1]:64017
[2024-02-03T00:18:06Z INFO zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/[fe80::aede:48ff:fe00:1122]:64017
[2024-02-03T00:18:06Z INFO zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/[fe80::1c66:20b3:ef39:9909]:64017
[2024-02-03T00:18:06Z INFO zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/[fe80::f484:9eff:fed9:8232]:64017
[2024-02-03T00:18:06Z INFO zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/[fe80::f484:9eff:fed9:8232]:64017
[2024-02-03T00:18:06Z INFO zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/[fe80::1d93:51a3:319e:1aa5]:64017
[2024-02-03T00:18:06Z INFO zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/[fe80::832e:42c4:32ca:940e]:64017
[2024-02-03T00:18:06Z INFO zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/[fe80::ce81:b1c:bd2c:69e]:64017
[2024-02-03T00:18:06Z INFO zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/[fe80::68ce:1db7:bb03:abc]:64017
[2024-02-03T00:18:06Z INFO zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/[fe80::abaa:328c:3fc6:2b79]:64017
[2024-02-03T00:18:06Z INFO zenoh::net::runtime::orchestrator] Zenoh can be reached at: tcp/192.168.86.228:64017
[2024-02-03T00:18:06Z INFO zenoh::net::runtime::orchestrator] zenohd listening scout messages on 224.0.0.224:7446
[2024-02-03T00:18:07Z INFO zbus::cmd::zbus_export_stream] Initiate delivery to bus
[2024-02-03T00:18:07Z INFO zbus::cmd::zbus_export_stream] ZBX catcher->ZBUS: zbus/metric/v1/local/Zabbix server/vm.memory.size/total
[2024-02-03T00:18:07Z INFO zbus::cmd::zbus_export_stream] Closing connection to bus
I am running my app with extended backtrace. I will have some results in an hour or two.
Based on the screenshot it seems your system is overloaded. I can see ~10000
and only 35MB
of free RAM out of 32GB
.
Sure, I exhaust the RAM, but I do have swapins and swapouts going. Even if that, shall app crash in library code if something like this occurs ?
Error management could always be improved. But in this particular case Zenoh
needs to spawn some threads in order to communicate on the network.
Assuming the error is somehow bubbled up, how would your application react? I'm trying to understand the potential usage pattern here...
One of my points, while I do exhaust the RAM, I do not exhaust VM, so as far as I concern, I do have resources. Although I will eventually get a more in-depth understanding of the mechanics of the Zenoh library, my general expectation is that if any problem happens inside the library call, the one shall return either a result or an error, which I shall handle. And if unwrap() in the library throws unhandled exception, there is not something I can do on my side of the code.
Describe the bug
Application is panicked in zenoh code with
To reproduce
System info