joshua-g opened this issue 10 years ago.
Ideally, region.mutate wouldn't block at all. Instead, it would provide a way to be notified when the mutation completes, so that a callback could then run on the fiber.
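A minimal sketch of that non-blocking shape, assuming a hypothetical `mutateAsync` that returns a `CompletableFuture` (this is not the real `region.mutate` signature). A single-threaded `ExecutorService` stands in for the fiber, and `thenAcceptAsync(action, executor)` hops the completion callback back onto it, so the callback never runs on the thread that waited for `sync()`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class AsyncMutateSketch {
    /** Hypothetical non-blocking variant of region.mutate (not the real API). */
    interface AsyncRegion {
        CompletableFuture<Boolean> mutateAsync(String row);
    }

    /** Returns true if the completion callback ran on the fiber thread. */
    static boolean runDemo() throws Exception {
        // Single-threaded executor standing in for a fiber.
        ExecutorService fiber = Executors.newSingleThreadExecutor();
        // Separate thread standing in for the replicator's sync() machinery.
        ExecutorService io = Executors.newSingleThreadExecutor();
        try {
            Thread fiberThread = fiber.submit(() -> Thread.currentThread()).get();

            AsyncRegion region = row -> CompletableFuture.supplyAsync(() -> {
                sleepQuietly(50); // simulate waiting on sync()
                return true;
            }, io);

            AtomicReference<Thread> callbackThread = new AtomicReference<>();
            CountDownLatch done = new CountDownLatch(1);

            fiber.execute(() ->
                region.mutateAsync("row-1")
                    // Hop back onto the fiber before running the callback.
                    .thenAcceptAsync(ok -> {
                        callbackThread.set(Thread.currentThread());
                        done.countDown();
                    }, fiber));

            if (!done.await(5, TimeUnit.SECONDS)) {
                return false;
            }
            return callbackThread.get() == fiberThread;
        } finally {
            fiber.shutdown();
            io.shutdown();
        }
    }

    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("callback ran on fiber: " + runDemo());
    }
}
```

The fiber task returns as soon as `mutateAsync` is called, so the fiber is free while the mutation is in flight.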
Failing that, a simple-ish fix using components we already have is for ReplicatedTablet to submit its blocking IO tasks to a separate ExecutorService. Perhaps TabletService could have one of its own, or share one with its creator?
(What I mean by "share" is that C5DB has a pool created with Executors.newFixedThreadPool that it uses to run fiber tasks. I don't think it would be a correctness problem to run IO tasks on that same pool, as long as none of the fiber tasks block, which we already require. On the other hand, I think it makes sense for the fiber thread pool to be a "fast lane" and to keep slower tasks on their own pool.)
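A minimal sketch of the separate-pool option, assuming a hypothetical `blockingMutate` as a stand-in for the blocking `region.mutate` call. The fiber (again modeled as a single-threaded executor) hands the blocking call to a dedicated IO pool, and the IO task posts its result back to the fiber, so fiber-owned state is still only touched on the fiber. The pool size of 4 is an arbitrary illustrative choice.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class IoPoolSketch {
    // Hypothetical stand-in for the blocking region.mutate call.
    static boolean blockingMutate(String row) {
        try {
            Thread.sleep(100); // simulate waiting on sync()
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return true;
    }

    /** Returns the events observed on the fiber. */
    static List<String> runDemo() throws InterruptedException {
        // "Fast lane": fiber tasks must never block.
        ExecutorService fiber = Executors.newSingleThreadExecutor();
        // Slow lane: blocking IO gets its own pool.
        ExecutorService ioPool = Executors.newFixedThreadPool(4);
        List<String> events = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(1);
        try {
            fiber.execute(() ->
                // Hand the blocking call off instead of running it on the fiber.
                ioPool.execute(() -> {
                    boolean ok = blockingMutate("row-1");
                    // Post the result back so the callback runs on the fiber.
                    fiber.execute(() -> {
                        events.add("mutate-complete:" + ok);
                        done.countDown();
                    });
                }));
            // The fiber stays free for other work while the IO is in flight.
            fiber.execute(() -> events.add("other-fiber-task"));
            done.await(5, TimeUnit.SECONDS);
            return events;
        } finally {
            fiber.shutdown();
            ioPool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runDemo());
    }
}
```

In practice `other-fiber-task` usually runs before the mutation completes, which is the point: the fiber is never parked on `sync()`. Sharing C5DB's existing fiber pool instead would also work under the no-blocking rule, but then slow IO could occupy threads that fiber tasks are waiting for.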
My original intention was to use a thread pool for these kinds of IO tasks and fibers for the rest of the code.
In ReplicatedTablet.java, tabletStateCallback runs on a fiber. It then calls region.mutate, which blocks: for instance, it ultimately needs to call sync() against the underlying replication algorithm.
(Blocking on a fiber is often more of a performance problem than a correctness problem. But there's a caveat: if one fiber blocks waiting for another fiber to execute some task, and that second fiber runs on the same underlying thread (or is the same fiber), then the task on the second fiber never gets to run, so the first fiber blocks forever.)
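A minimal sketch of that self-deadlock, using a single-threaded `ExecutorService` as a stand-in for a fiber (real C5DB fibers are not used here). The outer task queues an inner task on the same executor and then blocks on it. A timeout replaces "forever" so the demo terminates; without it, `inner.get()` would never return.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class SelfDeadlockSketch {
    /** Returns true if the inner task never ran while the outer task was blocked. */
    static boolean innerTaskStarved() throws InterruptedException, ExecutionException {
        // One thread, like a fiber: tasks run strictly one at a time.
        ExecutorService fiber = Executors.newSingleThreadExecutor();
        try {
            Future<Boolean> outer = fiber.submit(() -> {
                // Queue work on the same fiber...
                Future<String> inner = fiber.submit(() -> "synced");
                try {
                    // ...then block waiting for it. The only thread that could run
                    // `inner` is this one, and it is busy waiting.
                    inner.get(500, TimeUnit.MILLISECONDS);
                    return false;
                } catch (TimeoutException e) {
                    return true; // without the timeout this would wait forever
                }
            });
            return outer.get();
        } finally {
            // Lets the queued inner task run, then stops the thread.
            fiber.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("inner task starved: " + innerTaskStarved());
    }
}
```

The inner task only runs after the outer task gives up, which is exactly the ordering that makes blocking on a fiber unsafe.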