Fix stall detector to be correct threshold

DataDog / glommio

Glommio is a thread-per-core crate that makes writing highly parallel asynchronous applications in a thread-per-core architecture easier for rustaceans.

Other

3.12k stars 164 forks source link

Fix stall detector to be correct threshold #628

Closed tontinton closed 12 months ago

tontinton commented 1 year ago

The final expected runtime, is supposed to be whatever we get from the threshold call.

If we add the max_expected_time again, it will be wrong.

Whoever CRs this, I have a question: Do we want the threshold method to return the time to add to max_expected_time or do we want the user to completely control the amount, like I did in this PR?

I think it is better to let the user always control, he can decide whether to use max_expected_time or not.

glommer commented 1 year ago

max_expected_runtime is passed by the reactor, because the time may be different for different task queues.

The threshold() default implementation on line 65 seems to be the thing that's wrong, and it should be returning just 10ms.

tontinton commented 1 year ago

You don't think the stall detector implementer should decide whether he wants a stall detection to happen even sooner than max_expected_runtime? I see it as a legit use case.

If not, yeah I'll simply fix the default stall detector to return 10ms.

glommer commented 12 months ago

No, if the task queue has a quota to run for x ms, running for less than that is not a stall, technically speaking.

tontinton commented 12 months ago

Got it, fixed