caio / foca

mirror of https://caio.co/de/foca/
https://caio.co/de/foca/
Mozilla Public License 2.0
147 stars 8 forks source link

Implement a simple and more readable event scheduler #36

Open Manphil opened 2 months ago

Manphil commented 2 months ago

The new event scheduler uses FuturesUnordered to schedule the timer event, the code is simpler and easy to read.

caio commented 2 months ago

Sweet! Thanks for the contribution!

I agree it's more readable. Not that it's simpler: now we have to know how FuturesUnordered work and trust a dependency :) I'm willing to be convinced that it's an improvement overall, but I've got questions.

I'm not familiar with FuturesUnordered, but glancing at the docs I'd assume that the benefits of using it vs simply tokio::task::spawn a wait_for_event future (what the original example code used to do) is:

  1. You get control over the futures, so easy cancellation comes back into play; And
  2. You somewhat limit the impact that handling events has on the runtime (by reducing the number of futures that need to be polled by the main scheduler)

If my assumptions are correct, I believe this would be a regression and bring back the possibility of events arriving out of order.

If you haven't stumbled upon it already, check this issue where one of my tripwires caught a "events out of order" problem - It's a bit confusing to follow as we're debugging and conjecturing asynchronously but in a gist this is what happened: A combination of large cluster size + misconfiguration + using tasks for handling events + live restarts led to a large number of futures that needed to be polled; The runtime lagged for whatever reason and then foca detected the events out of order.

I think that this new code would lead to the same problem, just a little harder to trigger. Whereas with the existing code there's no way events will come out of order (the TimerQueue forces ordering) and it's lighter on the runtime: the number of futures doesn't grow based on the number of events.

Again, this is all conjecture since I'm not familiar with FuturesUnordered. What do you think?

Manphil commented 2 months ago

I think the FuturesUnordered behaves like TimerQueue, the earliest completed future will be return first by FuturesUnordered::next(). But indeed, FuturesUnordered is more complicated, we are not familiar with it, using TimerQueue is better.

The runtime lagged for whatever reason and then foca detected the events out of order.

I think when the runtime is heavy load, some future tasks may be polled slowly when tokio::task::spawn a wait_for_event future. So the events may out of order. FuturesUnordered is polled one by one, so the futures inside it is return in the order of completion. So the events will not out of order? It's my guess.

Anyway, keeping TimerQueue is a better choice.

caio commented 2 months ago

Agreed!

I do think that making the example shorter and easier to read is an excellent goal so I'll welcome any attempts at achieving that. Maybe renaming things and getting rid of super lazy macro_rules thing there helps?

Previously, the example had its own implementation of a Runtime which I moved into foca. I'm open for doing something similar with TimerQueue, but it's a very specialized thing and I fear that making it flexible and with a nice api will force us to sacrifice readability

I think when the runtime is heavy load, some future tasks may be polled slowly when tokio::task::spawn a wait_for_event future. So the events may out of order. FuturesUnordered is polled one by one, so the futures inside it is return in the order of completion. So the events will not out of order? It's my guess.

I took another quick look at tokio::time and FuturesUnordered: My understanding is that FuturesUnordered would indeed yield futures in order of completion. But since all we're doing is hook on tokio::time::Sleep for the delay, it's tokio's timer which will decide which future actually completes first - that's where things can go bad.

I couldn't find docs that specify/guarantees ordering of wake-after-sleep events so it wouldn't be a good idea to rely on Runtime-specific behaviour; I would guess that for efficiency's sake the multi-threaded runtime has a timer per worker thread so if one sleep is scheduled on thread 0 and another on thread 1 there's already a chance of them being fired out of order.

Manphil commented 2 months ago

I do think that making the example shorter and easier to read is an excellent goal so I'll welcome any attempts at achieving that. Maybe renaming things and getting rid of super lazy macro_rules thing there helps?

Maybe replacing the marco with a function like submit_event(Option<Instant>, Timer) makes it more readable. And removing the label 'handler is better.

caio commented 2 months ago

Yup, thanks a lot for the input! If you don't want to do it (or don't have time), I'll take a stab at it within the next few weeks before I bake v0.18

caio commented 1 month ago

I've finally got around to cleaning up the scheduler code. I've simply extracted its async loop into a method and got rid of the ugly macro:

https://github.com/caio/foca/blob/main/examples/foca_insecure_udp_agent.rs#L447 (commit https://github.com/caio/foca/commit/2256fb70cd416ef38da79a717c453bc9303fd044)

Wdyt? Feel free to send comments / questions and or update this pr :) I intend of releasing 0.18 next weekend