SyncHistogram::refresh() freezes

ghost commented 1 year ago

hdrhistogram = { version = "7.5.2", default-features = false, features = ["sync"] }

I'm getting recorders within a tokio::spawn closure:

time_per_one_recorder
    .get_or(|| {
        let time_per_one = time_per_one.lock().expect("time_per_one lock on updater");
        std::cell::RefCell::from(time_per_one.recorder())
    })
    .borrow_mut()
    .record(duration_ms)
    .expect("record to histogram");

When all tasks (1 million of them, i.e. that many record() calls) were processed and there's nothing else to record, I'm doing a refresh:

let mut time_per_one = time_per_one.lock().unwrap();
println!("=> Merging histograms...");
time_per_one.refresh();
println!("=> Merging done");

My program hangs forever on the refresh(), because the merging message is the last thing I'm seeing on stdout. According to top, my process doesn't do anything. I never tried debugging (as in gdb) for Rust programs so can't tell more at this point, will try it later.

By the way, everything used to be working fine whenever I called refresh() roughly once a second during the task processing. The freeze started happening when I decided one big merge when everything's done is better because it stops blocking recorders amid task processing.

Is this a crate bug, or maybe I'm using the crate wrong?

Thanks.

jonhoo commented 1 year ago

Ah, so, this should probably be documented better. refresh blocks until it has heard from every single recorder. But if the recorders aren't recording any more, it'll wait forever (and thus hang). You can use refresh_timeout here, or you can call Recorder::idle/Recorder::into_idle. If you get it working and have some time, I'd love a PR to improve the docs!

ghost commented 1 year ago

I'm sorry if I misunderstand how this works. I tried refresh_timeout and it timeouts, and it's like merge never happened because all values are 0s.

I store recorders in thread local, call record() on them. When all tasks are done, I call refresh on the histogram. Maybe that's an incorrect approach? I think maybe those threads, along with their threadlocal recorders, die before I have a chance to call refresh. Would that explain empty/nonworking merge?

jonhoo commented 1 year ago

Hmm.. Could you try augmenting the code where you call record so that it occasionally calls .idle()? I'm just curious to see if that helps.

vartiait2 commented 1 year ago

I tried refresh_timeout and it timeouts, and it's like merge never happened because all values are 0s.

Could that be a result of a time budget consumed?

https://github.com/HdrHistogram/HdrHistogram_rust/blob/main/src/sync/mod.rs#L332

HdrHistogram / HdrHistogram_rust

SyncHistogram::refresh() freezes #120