A couple of the Fault Tolerance 1.1 tests very occasionally fail due to a known race condition, which occurs when we submit async tasks to a bulkhead faster than Liberty begins executing them. With a parallelism of 10 and a queue size of 10, we expect to be able to submit 20 tasks. However, it takes a non-zero amount of time for Liberty to remove submitted tasks from the queue and start executing them. In rare cases, submitting the last task fails because the queue is still full.
The TCK certainly expects to be able to submit 20 tasks, although the spec wording isn't totally clear either way.
Gordon also pointed out that, particularly as we start exposing metrics for Bulkheads, it would be good if the waiting task queue didn't fluctuate while the bulkhead isn't full, so that the queue size could be a useful measure of how contended the resource behind the bulkhead is.
Unfortunately, this is at odds with the current implementation of PolicyExecutorService, so we'd have to do a bit of work on our side to work around that by allocating a bigger queue and keeping track of how many things are executing.
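For illustration only, the workaround described above could look roughly like the sketch below. It is not the Liberty code: `CountingBulkhead`, `trySubmit` and the accessor names are hypothetical, and the delegate `Executor` is assumed to have enough spare queue capacity that it never rejects a task itself. Admission is decided by a counter of running plus waiting tasks rather than by the underlying queue being momentarily full.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a bulkhead that counts tasks itself. */
final class CountingBulkhead {
    private final Executor delegate;   // assumption: has spare queue capacity
    private final int capacity;        // parallelism + queueSize
    private final Semaphore permits;   // one permit per running-or-waiting task
    private final AtomicInteger running = new AtomicInteger();

    CountingBulkhead(Executor delegate, int parallelism, int queueSize) {
        this.delegate = delegate;
        this.capacity = parallelism + queueSize;
        this.permits = new Semaphore(capacity);
    }

    /** Returns false only when capacity tasks are already running or waiting. */
    boolean trySubmit(Runnable task) {
        if (!permits.tryAcquire()) {
            return false; // bulkhead is genuinely full
        }
        try {
            delegate.execute(() -> {
                running.incrementAndGet();
                try {
                    task.run();
                } finally {
                    running.decrementAndGet();
                    permits.release(); // free the slot once the task ends
                }
            });
            return true;
        } catch (RejectedExecutionException e) {
            permits.release(); // the delegate refused, so hand the slot back
            throw e;
        }
    }

    /** Tasks currently executing. */
    int running() {
        return running.get();
    }

    /** Tasks accepted but not yet started; approximate under concurrency. */
    int waiting() {
        return (capacity - permits.availablePermits()) - running.get();
    }
}
```

With a parallelism of 10 and a queue size of 10, the first 20 calls to `trySubmit` succeed regardless of how quickly the delegate starts executing tasks, and `waiting()` no longer depends on that dispatch delay.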
We addressed this in 2.0 by using a different approach for bulkhead. I don't think we should go back and rewrite Fault Tolerance 1.x to change the behaviour there.