Yoric / timer.rs

Simple implementation of a timer for Rust
Mozilla Public License 2.0

Panic in unwrap, timer-0.2.0/src/lib.rs:290:9 #16

Open · jjpepper opened this issue 4 years ago

jjpepper commented 4 years ago

I've received a stack trace for a crash I can't reproduce myself. The stack trace starts by pointing at line 290 in lib.rs of your timer crate. Is this a known issue? The unwrap error has kind WouldBlock and the message "Resource temporarily unavailable". Any suggestions for things I can try?

Looking at the code, line 290 at the time of the 0.2.0 release seems to be the following:

        // Spawn a second thread, in charge of scheduling.
        thread::Builder::new().name("Timer thread".to_owned()).spawn(move || {
            let mut scheduler = Scheduler::with_capacity(waiter_recv, executor, capacity);
            scheduler.run()
        }).unwrap();

The documentation for std::thread::Builder::spawn implies this might mean that a thread couldn't be created. Is that how you read it?

thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }', /home/u/.cargo/registry/src/github.com-1ecc6299db9ec823/timer-0.2.0/src/lib.rs:290:9
stack backtrace:
   0:     0x5565c5289e64 - backtrace::backtrace::libunwind::trace::ha716b483344307d4
                               at /usr/src/rustc-1.43.0/vendor/backtrace/src/backtrace/libunwind.rs:86
   1:     0x5565c5289e64 - backtrace::backtrace::trace_unsynchronized::h73191aaca03de050
                               at /usr/src/rustc-1.43.0/vendor/backtrace/src/backtrace/mod.rs:66
   2:     0x5565c5289e64 - std::sys_common::backtrace::_print_fmt::h515542a6f0149e5c
                               at src/libstd/sys_common/backtrace.rs:78
   3:     0x5565c5289e64 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hbd3babec5fee3351
                               at src/libstd/sys_common/backtrace.rs:59
   4:     0x5565c52a311c - core::fmt::write::ha5eb378c8b683563
                               at src/libcore/fmt/mod.rs:1063
   5:     0x5565c5289ba7 - std::io::Write::write_fmt::h7525b2825ed7383d
                               at src/libstd/io/mod.rs:1426
   6:     0x5565c5281465 - std::sys_common::backtrace::_print::hfec26b72b492a55c
                               at src/libstd/sys_common/backtrace.rs:62
   7:     0x5565c5281465 - std::sys_common::backtrace::print::h05570acc7a135366
                               at src/libstd/sys_common/backtrace.rs:49
   8:     0x5565c5281465 - std::panicking::default_hook::{{closure}}::h9278410915688d73
jjpepper commented 4 years ago

I've analysed this a bit further... I didn't realise that each call to schedule_with_delay spawns a thread; the documentation should warn about that too. Wouldn't it be better to keep the scheduled timers in a container, ordered by the wall-clock time at which they should run, with a single executor thread? That thread would sleep until the first entry is due, wake up, execute and remove it, then sleep until the next entry is due.

The panic I got was actually because it couldn't spawn a thread and therefore the unwrap failed. I'm scheduling a lot of timers, and found my process had 30k threads.

0xpr03 commented 4 years ago

Could you try out the current master? There are some fixes from #15 pending release, related to #14, so your debugging could be outdated.

0xpr03 commented 4 years ago

    [dependencies]
    timer = { git = "https://github.com/Yoric/timer.rs", branch = "master" }
jjpepper commented 4 years ago

I just tried your suggestion and saw it download from GitHub etc. I'm still getting the same behaviour. I'm dropping a lot of guards, but the very idea of spawning a thread for every call to schedule_with_delay seems wrong to me. Are you expecting the code on master not to spawn a thread per call to schedule_with_delay?


0xpr03 commented 4 years ago

Hi, no I'm not expecting that, I just wanted to suggest trying out the new code regarding your panics to get stacktraces for the current code version.

Edit: just realized that it's thread-creation related. Yes, this is due to the process running out of its maximum allowed threads.

Are you creating a Timer per job you're scheduling? Or do you have one Timer instance and call schedule_with_delay repeatedly?

jjpepper commented 4 years ago

Oh, good point. I have essentially one timer per job. Each job might get 100 calls to schedule_with_delay, but on each call the previous guard is dropped. Should I have a single timer?

jjpepper commented 4 years ago

You were right, I was creating many timers, which I didn't need to do. Fixed now. Runs faster and...


Yoric commented 4 years ago

I seem to be a bit late to the party but:

thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }', /home/u/.cargo/registry/src/github.com-1ecc6299db9ec823/timer-0.2.0/src/lib.rs:290:9

when spawning a thread indeed means that the process already has too many threads.

You were right, I was creating many timers, which I didn't need to do. Fixed now. Runs faster and...

Yes, while nothing prevents you from spawning many timers, the idea is to have only a single timer per application. Do you think we should alter the documentation to make this clearer?

jjpepper commented 4 years ago

Yeah, it might be good to document that intention; in hindsight, it would probably be hard to make multiple timers share a thread anyway. Perhaps also point out that calls to schedule_with_delay can come from multiple threads simultaneously. I assume that is the case, right?

Yoric commented 4 years ago

They can, but they will be executed on the Scheduler's thread.

I wonder if an API based on Future would somehow be useful and/or clarify things.

0xpr03 commented 4 years ago

@Yoric what do you mean by "based on Future"? I wouldn't try to replicate https://docs.rs/tokio/0.3.0/tokio/time/index.html, and we'd also need a future reactor (external or internal).

@jjpepper multiple timers can be the right solution, depending on how fine-grained your different tasks are. For example, if one task (a) runs at a very high tick rate and another task (b) takes at least one of a's tick intervals to execute, I would not run them on the same timer.

tehsunnliu commented 1 year ago

Hi, I'm facing the same problem within Intel SGX enclaves. Since I'm very limited in the number of threads I can run inside an enclave, this error happens more often, even though I'm using a single Timer instance.

0xpr03 commented 1 year ago

very limited in the number of threads

But at that point this crate is probably not for you: it relies on a scheduler and a runner thread. Even if this crate returned an error instead of unwrapping, you wouldn't have gained anything; you still couldn't run your code. If you can, I'd recommend trying the tokio runtime. It is async, but it only needs a fixed number of threads to run all the timers you give it, and you can use async/await to drive it as a kind of timer system. (I wouldn't worry too much about blocking for now, as long as you have more than one thread for tokio to use.)

tehsunnliu commented 1 year ago

Hello, thank you for your response. I was able to fix the problem by increasing the number of threads for the enclave. By default, the enclave was assigned 10 threads, and all 10 were consumed. The first Timer initialization raises the required thread count to 12, preventing any further thread from being spawned; hence calls to schedule_with_* fail their unwrap with an SGX_ERROR_OUT_OF_TCS error.

I'm using the SGX SDK, which supports only a no_std environment for writing enclaves, and currently the tokio crate doesn't support no_std. So adding no_std support to this library was less work, and it's working fine now. Thank you once again.