There is a gap between the eviction bytes and when the problem appears.
Here's the eviction throughput in bytes reported by Caffeine (weight == bytes):
Here's the sys time:
Here's the number of requests per minute going into the cache:
You can see that the eviction weight reporting stops well before the number of processed requests does, and the processed requests stop at the same time the sys CPU kicks into high gear.
Okay, this is very frustrating then.
Your usage is extremely write heavy (given 500k evictions per hour). The write rate exceeds the buffer's capacity, so writers must wait for the maintenance task to catch up. That buffer holds 128 * NCPUs entries, e.g. on an 8-core machine it's 1,024. The maintenance task should be scheduled after a write so that the write buffer stays near empty. When the buffer is full, the writers spin by failing to append, trying to schedule, and eventually yielding before retrying. The yield is to avoid starvation, on the assumption that no CPU time is being given to the maintenance thread which performs the work. In 2.2.6 this write buffer was unbounded, so the concern was starvation causing it to never catch up. The current behavior is similar to a file system's async I/O.
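A rough sketch of that back-pressure pattern, using a bounded queue and illustrative names rather than Caffeine's actual internals:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Executor;

// Simplified sketch only; sizes and names are illustrative, not Caffeine's code.
final class WriteBufferSketch {
  // e.g. 128 * NCPUs => 1,024 entries on an 8-core machine
  private final ArrayBlockingQueue<Runnable> writeBuffer =
      new ArrayBlockingQueue<>(128 * Runtime.getRuntime().availableProcessors());
  private final Executor executor;

  WriteBufferSketch(Executor executor) {
    this.executor = executor;
  }

  void afterWrite(Runnable task) {
    // When full, spin: fail to append, try to schedule the drain, then yield
    // so the writer doesn't starve the maintenance thread of CPU time.
    while (!writeBuffer.offer(task)) {
      scheduleDrain();
      Thread.yield();
    }
    // Schedule after a successful write so the buffer stays near empty.
    scheduleDrain();
  }

  private void scheduleDrain() {
    executor.execute(() -> {
      // Drain the pending writes (replay them against the eviction policy).
      for (Runnable pending; (pending = writeBuffer.poll()) != null; ) {
        pending.run();
      }
    });
  }
}
```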
What I'm confused by is whether this means there is a scheduling problem. In the case of 500k evictions per hour, that is 138 per second. That's a very high rate, but there's no reason why a thread shouldn't be able to dequeue that if it is running. On a stress test I see 2.55M/s sustained.
During peak, cache hit rate is about 20%, so 80% of the queries end up writing new data into the cache that will probably just get evicted within an hour.
Okay, then this must be that scheduling is halted. That's the only explanation that would cause no progress to be made whatsoever. Otherwise we'd see a steady stream of evictions while the writers spin.
Can you run a background thread that calls Cache#cleanUp periodically? That will force a drain and ignores the state machine, which must be left in a state claiming that scheduling is in progress when somehow it isn't.
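For example, something along these lines (the weigher, maximum weight, and interval are illustrative, not taken from this thread):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

public class PeriodicCleanup {
  public static void main(String[] args) {
    Cache<String, byte[]> cache = Caffeine.newBuilder()
        .maximumWeight(256L * 1024 * 1024)                    // weight == bytes
        .weigher((String key, byte[] value) -> value.length)
        .build();

    // Force a drain on a fixed cadence, independent of the internal scheduling.
    ScheduledExecutorService cleaner = Executors.newSingleThreadScheduledExecutor();
    cleaner.scheduleWithFixedDelay(cache::cleanUp, 1, 1, TimeUnit.SECONDS);
  }
}
```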
This might also be due to ForkJoinPool.
When I configure Caffeine's executor to Executors.newWorkStealingPool(1), scheduling halts. Yet if I use 2 or commonPool, nothing is wrong. It seems like a parallelism of 1 causes the pool to be in a bad state. Perhaps somehow that's happening?
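Roughly, the configurations being compared look like this (the maximum size is arbitrary, just to make the snippet self-contained):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

public class ExecutorRepro {
  public static void main(String[] args) {
    // Parallelism of 1: scheduling halts under load in this test.
    Cache<String, byte[]> halts = Caffeine.newBuilder()
        .executor(Executors.newWorkStealingPool(1))
        .maximumSize(10_000)
        .build();

    // Parallelism of 2, or the default commonPool, behaves fine.
    Cache<String, byte[]> ok = Caffeine.newBuilder()
        .executor(ForkJoinPool.commonPool()) // or Executors.newWorkStealingPool(2)
        .maximumSize(10_000)
        .build();
  }
}
```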
I think FJP might be lossy under high contention. I switched from the state machine to a Semaphore(1) so that the lock is transferred to the maintenance thread, and printed a line whenever the permit is acquired or released. This results in:
...
Scheduled (scheduleDrainBuffers)
Released (PerformCleanupTask)
Scheduled (scheduleDrainBuffers)
-- halted
The task is scheduled into the executor but is never run. No exceptions are thrown (and logged). Since the state machine is disabled, the only explanation is that the executor discarded the task and we're left in a bad state.
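A rough sketch of that instrumentation, with hypothetical names rather than Caffeine's actual code: a Semaphore(1) stands in for the drain state machine, and the permit is only released by the cleanup task.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.Semaphore;

final class DrainPermitSketch {
  private final Semaphore drainPermit = new Semaphore(1);
  private final Executor executor;
  private final Runnable performCleanup;

  DrainPermitSketch(Executor executor, Runnable performCleanup) {
    this.executor = executor;
    this.performCleanup = performCleanup;
  }

  void scheduleDrainBuffers() {
    // The permit is handed from the scheduling thread to the maintenance task,
    // so a "Scheduled" line without a matching "Released" means the executor
    // never ran the task it accepted.
    if (drainPermit.tryAcquire()) {
      System.out.println("Scheduled (scheduleDrainBuffers)");
      executor.execute(() -> {
        try {
          performCleanup.run();
        } finally {
          drainPermit.release();
          System.out.println("Released (PerformCleanupTask)");
        }
      });
    }
  }
}
```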
We have to talk to Doug about this. You could switch to a ThreadPoolExecutor (e.g. Executors.newSingleThreadExecutor), which doesn't show this problem.
You can run this yourself if you'd like to see the weird behavior: check out the fjp_stress branch and run ./gradlew stress
I can't reproduce it yet in an isolated test, so I can't rule out Caffeine's code. Yet if I change executors the problem goes away, and nothing else makes sense so far.
@drcrallen What JDK version are you running?
If I run using oracle64-1.8.0.51 the failure above occurs. The latest is oracle64-1.8.0.92, which does not exhibit this problem.
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
Thanks. I emailed Doug to ask if he's aware of any fixes in FJP that might explain this. The Oracle release notes were not useful, so I need to dig more into his CVS repository and the JDK bug tracker in hopes of confirming this.
Can you upgrade JDKs? The other option is to use a different executor, like Runnable::run, to process it on the calling thread.
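Either workaround can be wired in through the builder, e.g. (maximum size is arbitrary):

```java
import java.util.concurrent.Executors;

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

public class ExecutorWorkarounds {
  public static void main(String[] args) {
    // A dedicated single-threaded ThreadPoolExecutor for the maintenance work.
    Cache<String, byte[]> viaSingleThread = Caffeine.newBuilder()
        .executor(Executors.newSingleThreadExecutor())
        .maximumSize(10_000)
        .build();

    // Or run the maintenance on the calling thread, bypassing FJP entirely.
    Cache<String, byte[]> viaCallingThread = Caffeine.newBuilder()
        .executor(Runnable::run)
        .maximumSize(10_000)
        .build();
  }
}
```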
Likely candidate: Missed submissions in ForkJoinPool
@ben-manes we're a little bit looser with JDK version enforcement than I'd like to admit, so while I'm happy to upgrade the JDK version, I think the solution for the druid-cache plugin for caffeine is going to be setting a single-threaded executor so that the behavior is more consistent among 1.8 releases. For now I've simply disabled local caching through caffeine (so nodes go directly to memcached instead). I'll get the fix in early next week and report on this thread and the other when I have results.
Thanks for looking into this!
Thanks for not getting too upset at me and reporting the issues :-)
I'm in a quagmire myself about how best to handle this. I'd very much prefer not to lose the commonPool() optimization, but this is also a very nasty failure.
Doug confirmed,
Yes (sorry). Thanks for figuring this out without me having to
recheck with your tests!
-Doug
@ben-manes what is the commonPool optimization you are talking about?
I meant that it's effectively free due to,
But performance has to be good even if penalizing the caller (higher response times). CLHM did this very well, but integration into Guava led to performance losses so it is less efficient.
There's nothing to be concerned about if you use a dedicated thread or a direct executor. It's unfortunate that FJP had a bug that makes it error-prone.
@ben-manes thanks for the explanation.
As a general update, this is currently baking. I should have results sometime next week.
This seems to be working fine with a single threaded executor.
Great! Thanks for the load testing and for sticking through it :-)
Puts are getting stuck for us at
Sys time (and user time) spikes very high under this condition, averaging 35% and 60% of total node CPU time respectively.