Xudong-Huang / may

rust stackful coroutine library
Apache License 2.0
1.93k stars, 80 forks

A variety of changes #108

Closed: pyprogrammer closed this 1 month ago

pyprogrammer commented 6 months ago

Hi Xudong, I've been using May for some of our simulation research, and I have a few changes to May that I think would be generally useful. I'd be happy to split them into separate pull requests if that makes review easier, but the overall changes are quite short.

  1. Made timeout a config option
  2. Made core pinning a config option
  3. Added spawn_builder to scopes
  4. Switched to using Crossbeam for work stealing
  5. Rewrote the work-stealing scheduler to be cleaner, and added randomized work-stealing instead of always checking the next few queues.
Xudong-Huang commented 6 months ago

Thanks for the changes! I used crossbeam for work stealing before, but found it somewhat slow, so I switched to my own may_queue. Have you benchmarked the performance impact?

Also, adding random work stealing may impact performance, but I don't have numbers for it.

The other changes look good to me.

pyprogrammer commented 6 months ago

I'll revisit the performance difference between Crossbeam and may_queue. I remember seeing small performance improvements, but I'll rerun the benchmarks and gather the numbers.

Re: randomized work-stealing: the existing scheduler only scans the next few queues before exiting the loop (and then blocks on the next epoll). If at some point in the program only a few coroutines are runnable, many workers can go to sleep. Only the workers adjacent to a busy queue can steal from it; once they have work, the workers before them can steal in turn, and so on, so work spreads one hop at a time along a steal-chain. Randomized work-stealing pays a small overhead for the RNG, but better avoids these pathological conditions.
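A minimal, dependency-free sketch of the difference between the two victim-selection policies described above. The function names, the fixed neighbour window `k`, and the xorshift RNG are illustrative assumptions, not May's actual scheduler code. The randomized variant here simply tries every other worker once, starting from a random offset; that is one simple way to randomize, not necessarily what this PR implements.

```rust
// Illustrative sketch only: these names are hypothetical, not May's API.

/// Tiny xorshift64 PRNG so the sketch needs no external crates.
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // xorshift gets stuck at zero, so replace a zero seed with a constant.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Fixed policy: worker `me` (of `n`) only looks at its next `k` neighbours.
pub fn neighbour_victims(me: usize, n: usize, k: usize) -> Vec<usize> {
    if n <= 1 {
        return Vec::new();
    }
    (1..=k.min(n - 1)).map(|d| (me + d) % n).collect()
}

/// Randomized policy: try every other worker once, starting at a random offset.
pub fn random_victims(me: usize, n: usize, rng: &mut XorShift64) -> Vec<usize> {
    if n <= 1 {
        return Vec::new();
    }
    let others = n - 1;
    // Random starting offset in 0..others, then walk around the ring,
    // skipping `me` itself (offsets 1..=others relative to `me`).
    let start = (rng.next_u64() % others as u64) as usize;
    (0..others)
        .map(|i| (me + 1 + (start + i) % others) % n)
        .collect()
}
```

With 8 workers and a window of 2, worker 0 only ever checks workers 1 and 2, so runnable coroutines sitting on worker 5 are invisible to it and it goes back to sleep; the randomized order reaches worker 5 directly instead of waiting for work to propagate along the chain.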

This could also become a configuration option or a feature flag: workloads of long-lived coroutines that frequently wait will likely benefit from randomized work-stealing, while short-lived coroutines may prefer the lower overhead of the current scheme. I implemented randomized work-stealing because my applications were hitting the pathological conditions described above, and Go seems to have had success with a randomized work-stealer.

Xudong-Huang commented 6 months ago

I like the idea of a feature gate for random work stealing. Also, let's keep using may_queue; if you really need crossbeam for work stealing, please put that behind a feature gate as well.

pyprogrammer commented 5 months ago

Hi Xudong, just checking in to let you know that I'm still working on this. I've been swamped with camera-ready and artifact evaluation for the tool we built on top of May.

pyprogrammer commented 5 months ago

Sorry for the delay.

I've implemented crossbeam-queue and randomized work-stealing as two separate features and run all four feature-flag combinations on my local machine. Based on these results, rand-steal and plain work-stealing perform about the same, while crossbeam-queue performs worse than the current implementation.
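A minimal sketch of how two independent Cargo features could select the steal policy and queue backend, giving the four build combinations mentioned above. The feature names `rand_steal` and `crossbeam_queue` and both enums are hypothetical; the PR's actual names and structure may differ. It assumes neither feature is enabled by default, so may_queue and the existing stealing remain the default.

```rust
// Hypothetical feature names; a crate would declare them in Cargo.toml, e.g.
//   [features]
//   rand_steal = []
//   crossbeam_queue = []
// Neither is enabled by default in this sketch.

/// Steal policy compiled into this build.
#[derive(Debug, PartialEq)]
pub enum StealPolicy {
    Neighbours,
    Randomized,
}

/// Run-queue implementation compiled into this build.
#[derive(Debug, PartialEq)]
pub enum QueueBackend {
    MayQueue,
    Crossbeam,
}

pub fn steal_policy() -> StealPolicy {
    // `cfg!` expands to a compile-time `true`/`false` literal.
    if cfg!(feature = "rand_steal") {
        StealPolicy::Randomized
    } else {
        StealPolicy::Neighbours
    }
}

pub fn queue_backend() -> QueueBackend {
    if cfg!(feature = "crossbeam_queue") {
        QueueBackend::Crossbeam
    } else {
        QueueBackend::MayQueue
    }
}
```

Each combination is then a separate build, e.g. `cargo bench --features rand_steal,crossbeam_queue`. `cfg!` works here because both branches only return an enum value; when the branches need different types or optional dependencies (as a crossbeam-backed queue would), putting `#[cfg(feature = "...")]` on the items themselves is the usual choice.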

Regarding crossbeam-queue, would you prefer to merge it behind its feature flag, or to remove it altogether?

Attachments: crossbeam.txt, crossbeam_and_rand.txt, rand_steal.txt, current.txt

pyprogrammer commented 1 month ago

@Xudong-Huang Hi Xudong, have you had time to take a look recently?

pyprogrammer commented 1 month ago

@Xudong-Huang Hi Xudong, checking in. Do you have time to take a look at the PR?

Xudong-Huang commented 1 month ago

Thanks for this change.