Open ndragazis opened 4 years ago
Hi Nikos,
Agreed the current implementation is not ideal for all cases. It's a tradeoff game - when there are many fibers it makes sense to steal a larger batch (stealing is not cheap, at least compared to popping from the local queue), but stealing too many when there are few fibers doesn't balance the work well. Another issue is that even maintaining a global count of runnable fibers slows down the scheduler.
In a real world workload the overhead of stealing or global counts probably won't be as visible though. Also, fibers could inspect the size of the source queue before stealing relatively cheaply.
In the C++ rewrite I mentioned in https://github.com/brianwatling/libfiber/issues/5#issuecomment-604776866 I used 4 for the number of fibers to steal. I still didn't implement a smarter heuristic for how many fibers to steal.
I'd be happy to accept a pull request to tweak the number of fibers stolen (even 1 would probably work well enough if the fibers are doing any non trivial amount of work), make it configurable, or add other smarts to do better.
Brian
The current implementation of load balancing in the fifo scheduler is that it steals up to 16 fibers from the other schedulers. The fact that it uses the fixed value of 16 instead of taking into account the current overall load could lead one scheduler take all the load from another scheduler.