DataDog / glommio

Glommio is a thread-per-core crate that makes writing highly parallel asynchronous applications in a thread-per-core architecture easier for rustaceans.
Other
3.06k stars 162 forks source link

Fix first Reactor sleep taking at least 30-50ms #653

Closed vlovich closed 5 months ago

vlovich commented 5 months ago

Noticed this problem in unit tests where the very first timer sleep would take a very long time because the membarrier was being registered and tests have more than 1 thread.

Initializing the membarrier strategy on Reactor::new seems like a good idea because it's highly likely any use of the Reactor will involve going to sleep and I can't imagine a use-case where that's not the case unless you're not interacting with anything within Glommio in the first place.

With this change we see that the very first timer now completes within 11ms (I added a grace window of an extra ms in case of CI).

What does this PR do?

The first construction of the IO uring reactor now registers the membarrier.

Motivation

I had tests that were flakily failing because a very short sleep was taking an almost unbounded amount of time (observed up to 60ms even for a 5ms sleep). I think it's cleaner if the membarrier initialization is front-loaded into Reactor construction vs paying that penalty on the very first .await later where it's a bit more non-obvious and surprising.

Related issues

Fixes #652

Additional Notes

I believe the cost of registration when there's > 1 thread running got worse sometime after Linux 6.6 because tests in my project that do something similar started regularly taking >30ms after upgrading to 6.8 whereas before they were mostly succeeding < 30ms (it was my grace window for how long a 10ms sleep could take).

Checklist

[X] I have added unit tests to the code I am submitting [X] My unit tests cover both failure and success scenarios [] If applicable, I have discussed my architecture

vlovich commented 5 months ago

Ping on this in case you missed it in your review batch @glommer

glommer commented 5 months ago

I had missed it, indeed.

glommer commented 5 months ago

Can we add a comment on the code explaining why we're doing this, so nobody else removes that in the future?

vlovich commented 5 months ago

Good note. I've added much more extensive documentation for the subtleties. I also realized that it's placement was incorrect as it needed to preceed the construction of threads in BlockingThreadPool. I also realized it's necessary as an external API (or we can add a dependency on the ctor crate to do this transparently) for 2 reasons:

  1. The user may start their own threads before glommio
  2. The user may sandbox their app before glommio & thus glommio will get a privilege denial for trying to register the membarrier

I don't know the policy on adding dependency on 3p crates so I went with an explicit API that users can call if it's needed for them (most don't) but ctor magic would be more user-friendly although portability may be a concern (I think it's well supported on tier 1 platforms but I'm sure there's always subtleties that can appear with that kind of low-level stuff). If you'd prefer magic, I'm happy to change the PR & add a dependency on ctor. Let me know.

glommer commented 5 months ago

As I get older, I dislike magic more and more.