brianwatling / libfiber

A User Space Threading Library Supporting Multi-Core Systems
MIT License

[event-priority] Do not starve fibers blocked on events #5

Open ndragazis opened 4 years ago

ndragazis commented 4 years ago

Hello,

I have been experimenting with this library recently and I have run into the following issue:

Currently, neither the maintenance fiber nor fiber_manager_yield() polls for events if there are fibers in the scheduler's queue. This means that fibers blocked on events can be starved if there are always fibers ready to run.
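
The starvation described above can be reproduced with a toy model. The sketch below is not libfiber code: `starve_runq_t`, `starve_yield` and `starve_polls_after` are hypothetical names, and the only behavior taken from the issue is the assumption that a yield skips the event poll whenever another fiber is ready to run.

```c
#include <assert.h>
#include <stddef.h>

#define STARVE_MAX 16

/* Hypothetical single run queue: a FIFO ring of runnable fiber ids. */
typedef struct {
    int ready[STARVE_MAX];
    size_t head;
    size_t count;
    int polls; /* number of event polls performed so far */
} starve_runq_t;

void starve_push(starve_runq_t* q, int id) {
    assert(q->count < STARVE_MAX);
    q->ready[(q->head + q->count) % STARVE_MAX] = id;
    q->count++;
}

int starve_pop(starve_runq_t* q) {
    assert(q->count > 0);
    int id = q->ready[q->head];
    q->head = (q->head + 1) % STARVE_MAX;
    q->count--;
    return id;
}

/* Assumed yield policy: events are polled only when no other fiber is
 * ready. Returns the id of the fiber that runs next. */
int starve_yield(starve_runq_t* q, int current) {
    if (q->count == 0) {
        q->polls++; /* only an otherwise idle scheduler reaches the poll */
        return current;
    }
    int next = starve_pop(q);
    starve_push(q, current);
    return next;
}

/* Runs `busy_fibers` always-ready fibers (1..STARVE_MAX) for `yields`
 * yields and reports how many times events were polled. */
int starve_polls_after(int busy_fibers, int yields) {
    starve_runq_t q = {0};
    for (int id = 1; id < busy_fibers; id++) {
        starve_push(&q, id);
    }
    int current = 0;
    for (int i = 0; i < yields; i++) {
        current = starve_yield(&q, current);
    }
    return q.polls;
}
```

In this model a single busy fiber polls on every yield, but as soon as two fibers are always ready the poll count stays at zero, so a fiber waiting on an event would never be woken.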

brianwatling commented 4 years ago

Hi Nikos,

Glad you found this library interesting!

The issue you point out is definitely valid. I think there are arguments both ways: ideally fibers should be cooperative and eventually reach an idle state, but practically speaking there are use cases where some fibers always have work to do and should therefore yield periodically to others (including checking IO readiness).

Years ago I rewrote this library in C++ with better algorithms. I solved this issue by having two queues, a primary and a secondary. Any time a fiber yields, it pulls the next fiber from the primary queue and puts itself into the secondary queue. If the primary queue is empty, events are polled (with timeout = 0, since the yielding fiber is still ready to run) and the queues are swapped, so the fibers that yielded get another chance to run.
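
A minimal sketch of the two-queue scheme, written in C to match this repo. It is not the code from the private rewrite: every `toy_` name is hypothetical, the event poll is a placeholder, and placing woken fibers on the secondary queue just before the swap is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_FIBERS 64

/* Hypothetical stand-in for a fiber; only an id is needed here. */
typedef struct { int id; } toy_fiber_t;

/* Fixed-capacity FIFO ring buffer of fibers. */
typedef struct {
    toy_fiber_t* items[TOY_MAX_FIBERS];
    size_t head;
    size_t count;
} toy_queue_t;

typedef struct {
    toy_queue_t queues[2];
    int primary;                /* index of the primary queue; the other is secondary */
    toy_fiber_t* pending_event; /* a blocked fiber whose event is ready, or NULL */
    int poll_count;             /* how many times events were polled */
} toy_sched_t;

void toy_queue_push(toy_queue_t* q, toy_fiber_t* f) {
    assert(q->count < TOY_MAX_FIBERS);
    q->items[(q->head + q->count) % TOY_MAX_FIBERS] = f;
    q->count++;
}

toy_fiber_t* toy_queue_pop(toy_queue_t* q) {
    if (q->count == 0) {
        return NULL;
    }
    toy_fiber_t* f = q->items[q->head];
    q->head = (q->head + 1) % TOY_MAX_FIBERS;
    q->count--;
    return f;
}

/* Placeholder for a non-blocking (timeout = 0) event poll. A real
 * scheduler would ask the OS here; this one only wakes pending_event.
 * Assumption: woken fibers join the secondary queue. */
void toy_poll_events(toy_sched_t* s) {
    s->poll_count++;
    if (s->pending_event != NULL) {
        toy_queue_push(&s->queues[1 - s->primary], s->pending_event);
        s->pending_event = NULL;
    }
}

/* Called by the running fiber; returns the fiber that runs next. */
toy_fiber_t* toy_yield(toy_sched_t* s, toy_fiber_t* current) {
    /* The yielding fiber is still runnable, so it waits in the secondary queue. */
    toy_queue_push(&s->queues[1 - s->primary], current);
    toy_fiber_t* next = toy_queue_pop(&s->queues[s->primary]);
    if (next == NULL) {
        /* Primary drained: poll without blocking, then swap the queues so
         * every fiber that yielded (and any woken fiber) gets a turn. */
        toy_poll_events(s);
        s->primary = 1 - s->primary;
        next = toy_queue_pop(&s->queues[s->primary]);
    }
    return next; /* never NULL: after a swap the queue holds at least `current` */
}
```

Because the primary queue only shrinks between swaps, each round is bounded by the number of fibers that were runnable when it started, so events are polled at least once per round even if some fibers never go idle.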

That rewrite isn't public for various reasons, but I'd happily accept a pull request to this repo which implements something similar.

That rewrite also used split stacks - the same thing Golang used (uses?) - which was kind of cool!

I am somewhat curious if you plan to use this in a production capacity. While the IO shimming is neat and generally works well, I don't think I can recommend using it with libraries you don't control or know very well. For example, back in 2011/2012 when I was writing this library I experimented with shimming network calls to IBM's DB2 database - IIRC it actually did kind of work, but only after fixing crazy crashes, and honestly I wouldn't have trusted it. IIRC the biggest problem is thread-local variables that you don't control.

I suggest looking at folly::Fiber (https://github.com/facebook/folly/tree/master/folly/fibers) which has a lot of features and works well (runs or at least ran major workloads that I know of). I contributed some stuff to it too in the past :)

Brian

ndragazis commented 4 years ago

Hi Brian,

Thanks for all the pointers/suggestions! Some comments inline:

> That rewrite isn't public for various reasons, but I'd happily accept a pull request to this repo which implements something similar.

It would be my pleasure to contribute.

> I am somewhat curious if you plan to use this in a production capacity. While the IO shimming is neat and generally works well, I don't think I can recommend using it with libraries you don't control or know very well.

Well, we were exploring the possibility of migrating from threads to fibers in our application. I came across your library while searching for open source userspace threading libraries (great work btw). I tested the context switch latency with a simple test program and the results look really promising. So, I'd like to give it a try. My experience in this area is quite limited and, as you say, I will probably run into many bugs. I will certainly keep you posted on my findings!

Nikos