There is a gap between the eviction bytes and when the problem appears.
Here's the eviction throughput in bytes reported by Caffeine (weight == bytes):
Here's the sys time:
Here's the number of requests per minute going into the cache:
You can see that the eviction weight reporting stops well before the number of processed requests does, and the processed requests stop at the same time the sys CPU kicks into high gear.
Okay, this is very frustrating then.
Your usage is extremely write heavy (given 500k evictions per hour). The write rate exceeds the buffer's capacity, so writers must wait for the maintenance task to catch up. That buffer holds 128 * NCPUs entries, e.g. on an 8-core machine it's 1,024. The maintenance task should be scheduled after a write so that the write buffer stays near empty. When the buffer is full, the writers spin by failing to append, trying to schedule, and eventually yielding before retrying. The yield is to avoid starvation, on the assumption that no CPU time is being given to the maintenance thread which performs the work. In 2.2.6 this write buffer was unbounded, so the concern was starvation causing it to never catch up. The current behavior is similar to a file system's async I/O.
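A rough sketch of that back-pressure pattern, using a bounded queue and illustrative names rather than Caffeine's actual internals:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Executor;

// Simplified sketch only; sizes and names are illustrative, not Caffeine's code.
final class WriteBufferSketch {
  // e.g. 128 * NCPUs => 1,024 entries on an 8-core machine
  private final ArrayBlockingQueue<Runnable> writeBuffer =
      new ArrayBlockingQueue<>(128 * Runtime.getRuntime().availableProcessors());
  private final Executor executor;

  WriteBufferSketch(Executor executor) {
    this.executor = executor;
  }

  void afterWrite(Runnable task) {
    // When full, spin: fail to append, try to schedule the drain, then yield
    // so the writer doesn't starve the maintenance thread of CPU time.
    while (!writeBuffer.offer(task)) {
      scheduleDrain();
      Thread.yield();
    }
    // Schedule after a successful write so the buffer stays near empty.
    scheduleDrain();
  }

  private void scheduleDrain() {
    executor.execute(() -> {
      // Drain the pending writes (replay them against the eviction policy).
      for (Runnable pending; (pending = writeBuffer.poll()) != null; ) {
        pending.run();
      }
    });
  }
}
```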
What I'm confused by is whether this means there is a scheduling problem. In the case of 500k evictions per hour, that is 138 per second. That's a very high rate, but there's no reason why a thread shouldn't be able to dequeue that if it is running. On a stress test I see 2.55M/s sustained.
During peak, cache hit rate is about 20%, so 80% of the queries end up writing new data into the cache that will probably just get evicted within an hour.
Okay, then this must be that scheduling is halted. That's the only explanation that would cause no progress to be made whatsoever. Otherwise we'd see a steady stream of evictions while the writers spin.
Can you run a background thread that calls Cache#cleanUp periodically? That will force a drain and ignores the state machine, which must be left in a state claiming that scheduling is in progress when somehow it isn't.
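For example, something along these lines (the weigher, maximum weight, and interval are illustrative, not taken from this thread):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

public class PeriodicCleanup {
  public static void main(String[] args) {
    Cache<String, byte[]> cache = Caffeine.newBuilder()
        .maximumWeight(256L * 1024 * 1024)                    // weight == bytes
        .weigher((String key, byte[] value) -> value.length)
        .build();

    // Force a drain on a fixed cadence, independent of the internal scheduling.
    ScheduledExecutorService cleaner = Executors.newSingleThreadScheduledExecutor();
    cleaner.scheduleWithFixedDelay(cache::cleanUp, 1, 1, TimeUnit.SECONDS);
  }
}
```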
This might also be due to ForkJoinPool.
When I configure Caffeine's executor to Executors.newWorkStealingPool(1), scheduling halts. Yet if I use 2 or commonPool, nothing is wrong. It seems like a parallelism of 1 causes the pool to be in a bad state. Perhaps somehow that's happening?
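Roughly, the configurations being compared look like this (the maximum size is arbitrary, just to make the snippet self-contained):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ForkJoinPool;

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

public class ExecutorRepro {
  public static void main(String[] args) {
    // Parallelism of 1: scheduling halts under load in this test.
    Cache<String, byte[]> halts = Caffeine.newBuilder()
        .executor(Executors.newWorkStealingPool(1))
        .maximumSize(10_000)
        .build();

    // Parallelism of 2, or the default commonPool, behaves fine.
    Cache<String, byte[]> ok = Caffeine.newBuilder()
        .executor(ForkJoinPool.commonPool()) // or Executors.newWorkStealingPool(2)
        .maximumSize(10_000)
        .build();
  }
}
```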
I think FJP might be lossy under high contention. I switched from the state machine to a Semaphore(1) so that the lock is transferred to the maintenance thread, and printed a line whenever the permit is acquired or released. This results in:
...
Scheduled (scheduleDrainBuffers)
Released (PerformCleanupTask)
Scheduled (scheduleDrainBuffers)
-- halted
The task is scheduled into the executor but is never run. No exceptions are thrown (and logged). Since the state machine is disabled, the only explanation is that the executor discarded the task and we're left in a bad state.
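A rough sketch of that instrumentation, with hypothetical names rather than Caffeine's actual code: a Semaphore(1) stands in for the drain state machine, and the permit is only released by the cleanup task.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.Semaphore;

final class DrainPermitSketch {
  private final Semaphore drainPermit = new Semaphore(1);
  private final Executor executor;
  private final Runnable performCleanup;

  DrainPermitSketch(Executor executor, Runnable performCleanup) {
    this.executor = executor;
    this.performCleanup = performCleanup;
  }

  void scheduleDrainBuffers() {
    // The permit is handed from the scheduling thread to the maintenance task,
    // so a "Scheduled" line without a matching "Released" means the executor
    // never ran the task it accepted.
    if (drainPermit.tryAcquire()) {
      System.out.println("Scheduled (scheduleDrainBuffers)");
      executor.execute(() -> {
        try {
          performCleanup.run();
        } finally {
          drainPermit.release();
          System.out.println("Released (PerformCleanupTask)");
        }
      });
    }
  }
}
```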
We have to talk to Doug about this. You could switch to a ThreadPoolExecutor (e.g. Executors.newSingleThreadExecutor), which doesn't show this problem.
You can run this yourself if you'd like to see the weird behavior: check out the fjp_stress branch and run ./gradlew stress
I can't reproduce it yet in an isolated test, so I can't rule out Caffeine's code. Yet if I change executors the problem goes away, and nothing else makes sense so far.
@drcrallen What JDK version are you running?
If I run using oracle64-1.8.0.51 the failure above occurs. The latest is oracle64-1.8.0.92, which does not exhibit this problem.
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
Thanks. I emailed Doug to ask if he's aware of any fixes in FJP that might explain this. The Oracle release notes were not useful, so I need to dig more into his CVS repository and the JDK bug tracker in hopes of confirming this.
Can you upgrade JDKs? The other option is to use a different executor, like Runnable::run, to process it on the calling thread.
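Either workaround can be wired in through the builder, e.g. (maximum size is arbitrary):

```java
import java.util.concurrent.Executors;

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

public class ExecutorWorkarounds {
  public static void main(String[] args) {
    // A dedicated single-threaded ThreadPoolExecutor for the maintenance work.
    Cache<String, byte[]> viaSingleThread = Caffeine.newBuilder()
        .executor(Executors.newSingleThreadExecutor())
        .maximumSize(10_000)
        .build();

    // Or run the maintenance on the calling thread, bypassing FJP entirely.
    Cache<String, byte[]> viaCallingThread = Caffeine.newBuilder()
        .executor(Runnable::run)
        .maximumSize(10_000)
        .build();
  }
}
```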
Likely candidate: Missed submissions in ForkJoinPool
@ben-manes we're a little bit looser with JDK version enforcement than I'd like to admit, so while I'm happy to upgrade the JDK version, I think the solution for the druid-cache plugin for caffeine is going to be setting a single-threaded executor so that the behavior is more consistent among 1.8 releases. For now I've simply disabled local caching through caffeine (so nodes go directly to memcached instead). I'll get the fix in early next week and report on this thread and the other when I have results.
Thanks for looking into this!
Thanks for not getting too upset at me and reporting the issues :-)
I'm in a quagmire myself about how best to handle this. I'd very much prefer not to lose the commonPool() optimization, but this is also a very nasty failure.
Doug confirmed,
Yes (sorry). Thanks for figuring this out without me having to
recheck with your tests!
-Doug
@ben-manes what is the commonPool optimization you are talking about?
I meant that it's effectively free due to,
But performance has to be good even if penalizing the caller (higher response times). CLHM did this very well, but integration into Guava led to performance losses so it is less efficient.
There's nothing to be concerned about if you use a dedicated thread or a direct executor. It's unfortunate that FJP had a bug that makes it error-prone.
@ben-manes thanks for the explanation.
As a general update, this is currently baking. I should have results sometime next week.
This seems to be working fine with a single threaded executor.
Great! Thanks for the load testing and for sticking through it :-)
Puts are getting stuck for us at
Sys time (and user time) spikes very high under this condition, averaging 35% and 60% of total node CPU time respectively.