ggPeti opened 10 years ago
OMG I'm so sorry because of all @houndci's trolling :(
This is as awesome as huge :). I'll try to have a look at the implementation tonight, but deadlock detection is definitely an improvement over what we had before.
Out of curiosity: were you having a lot of deadlock situations? Could you give me an example? I actually haven't used futuroscope much, let alone in production environments.
Yes, we have run into deadlocks! That was the motivation for this PR. We use futures all around, and sometimes a situation emerges similar to the example under the headline Prioritized futures, just with more threads and futures in the pool. Basically, it's a situation where a future that was pushed early pushes something at the end of the queue, which also happens to be a common dependency of a lot of other futures coming later in the queue. The concrete example:
There is a search provider module which takes phone numbers, email addresses etc. It runs all searches in futures, but also caches them: if we later run a search with a phone number that was run earlier, it will just give that future back. If a lot of other futures run searches against the same phone number, the actual search can be stuck outside the pool's reach, locking down everything, as the ones inside are waiting for something that won't be allocated a worker before they themselves get resolved.
Hope this makes sense.
Oh, and I also noticed that the build fails on rbx, jruby and 1.9.3 because there's no `Thread.handle_interrupt` on them :( I can fix 2.0.0 however, by replacing `to_h` with `Hash[]`. What do you think the best course of action is? I realize that rbx and jruby are the only versions without a GIL, so it may be awkward for futuroscope to not support those. On the other hand, for IO-heavy use cases (such as ours) it really shines on MRI too.
@ggPeti this is.... amazing. I say we drop support for 1.9.3 and only support 2+, the latest Rbx (compatible with 2+), and in the future JRuby 9000. Mind changing the CI settings to make your PR pass? I'll merge as soon as it's green. Thank you SO much for all these features! :)
Thanks for reviewing it! I will get back on this in the following days. Rbx didn't have asynchronous thread messaging implemented at the time of my PR; I will check whether they have implemented it since. If not, I will see if I can implement it myself. This PR makes heavy use of `handle_interrupt`, so we definitely need it.
Apologies for the giant PR, but one thing led to another, and it didn't make sense to implement one without another (well, logging could have been separated, but it was very useful for debugging while implementing the rest).
The features implemented are:
Deadlock detection
There are 2 kinds of deadlocks that we can detect.
1. Circular dependency
When futures depend on each other in a circle, Futuroscope will detect that and send each thread involved a `DeadlockError` with a message describing the situation. For example:
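Conceptually, the detection amounts to following the waits-for edges until a future repeats. A minimal sketch (names and data shapes are assumptions for illustration, not futuroscope's actual internals):

```ruby
# Illustrative waits-for cycle walk (assumed names, not futuroscope's real
# code). `deps` maps a future's id to the id of the future it is currently
# blocked on, or nil if it isn't blocked on anything.
def find_cycle(deps, start)
  seen = []
  node = start
  until node.nil? || seen.include?(node)
    seen << node
    node = deps[node]
  end
  return nil if node.nil?      # walked off the graph: no circular dependency
  seen.drop(seen.index(node))  # the futures forming the circle
end
```

When such a walk finds a non-nil circle, each involved thread can be sent a `DeadlockError` naming those futures.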
2. Pool size too low
When the pool is full but all futures are waiting for another future that doesn't have a worker yet, Futuroscope will fail a future that does not depend on any other future (choosing the lowest-priority one among those). (Note: this selection could be optimized to pick the root of the smallest unary branch of the dependency forest instead of the root of the smallest tree, but this is a good enough solution for a situation that rarely occurs.) For example:
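The victim selection described above can be sketched like this (an illustrative sketch with assumed data shapes, not futuroscope's actual code):

```ruby
# Pick the future to fail when the pool is saturated (illustrative only).
# `priorities` maps future ids to priorities; `deps` maps a future's id to
# the id of the future it depends on (absent key = depends on nothing).
def pick_victim(priorities, deps)
  independent = priorities.keys.reject { |id| deps.key?(id) }
  independent.min_by { |id| priorities[id] }  # lowest priority among them
end
```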
Prioritized futures
Instead of a queue, Futuroscope now uses something resembling a priority queue, where the priority is determined by how many threads are directly or indirectly blocking on the future's value. This can avoid deadlock situations, for example:
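A minimal sketch of how blocking can raise priorities through the dependency chain (assumed names and data shapes, not futuroscope's actual code):

```ruby
# When a thread blocks on a future, that future and everything it
# transitively depends on gain priority, so they are handed workers first
# (illustrative sketch only).
def bump_priority(priorities, deps, id)
  while id
    priorities[id] = (priorities[id] || 0) + 1
    id = deps[id]  # follow the dependency chain (assumed acyclic here)
  end
  priorities
end
```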
In the old implementation, this would raise a fatal error, because `f1` starts to block on `f4` before it gets to the queue, then `f3` starts to block on `f4` as well after `f2` is done, so the pool gets full of blocking futures before `f4` has a chance to be evaluated.

Smart up/down spinning of workers
Previously, with the push of every new future, there was a 50% chance that a new worker would be spun up if the limit hadn't been reached yet. Now the workers keep track of whether they are free or busy, and the pool only spins up a new worker if there are more futures without workers than free workers.
Also, previously every thread quit immediately if it was over the minimum thread count. Now, if the pool has more workers than the minimum, the workers have a 2-second linger period in which they will pick up new work if it's available, and only if no new work comes in will they die. This minimizes the cost of spinning threads up and down when work comes in close spikes.
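The linger behaviour can be sketched roughly like this (a simplified illustration with assumed names, not futuroscope's actual worker code):

```ruby
# Simplified worker with a 2-second linger period (illustrative only).
# It keeps taking jobs; once the queue stays empty for LINGER seconds,
# it lets its thread die instead of waiting forever.
class LingerWorker
  LINGER = 2 # seconds

  def initialize(jobs, mutex, cond)
    @jobs, @mutex, @cond = jobs, mutex, cond
  end

  def run
    loop do
      job = nil
      @mutex.synchronize do
        deadline = Time.now + LINGER
        while @jobs.empty?
          remaining = deadline - Time.now
          return if remaining <= 0      # linger expired: let the thread die
          @cond.wait(@mutex, remaining) # woken early when a job is pushed
        end
        job = @jobs.shift
      end
      job.call
    end
  end
end
```

A pool pushing work would signal `cond` after appending to `jobs` under `mutex`, waking a lingering worker instead of spinning up a new thread.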
This also leads to better parallelization in some cases. Old version:
New version:
Logging
Futuroscope now supports logging. If you put loggers into `Futuroscope.loggers` (which is a simple array), they will get all log messages from inside. The loggers are expected to conform to Ruby's core `Logger`, in that they respond to these messages: `debug`, `info`, `warn`, `error`, `fatal`. Example:
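As an illustration of that duck type, here is a minimal in-memory logger (a hypothetical example; the registration line assumes the `Futuroscope.loggers` array described above):

```ruby
# A minimal logger satisfying the expected interface: it responds to
# debug, info, warn, error and fatal (illustrative example).
class MemoryLogger
  attr_reader :messages

  def initialize
    @messages = []
  end

  %i[debug info warn error fatal].each do |level|
    define_method(level) { |msg| @messages << [level, msg] }
  end
end

# With futuroscope loaded, registration would look like:
#   Futuroscope.loggers << MemoryLogger.new
```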
I'd also like to note that this PR comes with multiple engineering decisions that are worth discussing. For example, I've moved away from using a `Queue` in both the pool and the future; instead, I'm now making use of `ConditionVariable`s (which `Queue` uses internally as well).

Another one is that the pool keeps track of the `__id__`s of the futures as hash keys. Why is that, and why am I using two hashes instead of one: one with `future.__id__`s pointing to priority values, and one with `future.__id__`s pointing to the actual futures? Because if you want to use a `Delegator` as a hash key, Ruby will call `#hash` on the object, which gets forwarded to the wrapped object, creating a deadlock. Not forwarding `#hash` is no solution: it would make the futures non-transparent; `hsh[:key]` would not be the same as `hsh[future { :key }]`.

Please discuss any parts that you disagree with; I'm happy to elaborate, and of course it's always possible that I missed something.
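The two-hash bookkeeping might look roughly like this (an illustrative sketch with assumed names, using plain objects in place of real futures):

```ruby
# Track futures by __id__ so the future object itself is never used as a
# hash key (which would forward #hash through the Delegator and deadlock).
# Illustrative sketch, not futuroscope's actual implementation.
class Registry
  def initialize
    @priorities = {} # future.__id__ => priority
    @futures    = {} # future.__id__ => future object
  end

  def add(future, priority)
    id = future.__id__
    @priorities[id] = priority
    @futures[id] = future
  end

  def priority_of(future)
    @priorities[future.__id__]
  end
end
```

`__id__` is stable for the object's lifetime and never touches the delegated target, so lookups stay deadlock-free.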