ben-manes / caffeine

A high performance caching library for Java
Apache License 2.0

GC overhead #915

Closed: JohannesLichtenberger closed this issue 1 year ago

JohannesLichtenberger commented 1 year ago

Hi Ben,

I'm using Caffeine as a simplified page cache / buffer manager for my data store (SirixDB), and I'm currently seeing a lot of GC pressure. I've enabled ZGC for testing, but my test heap is fixed at min/max 5g, so it may be rather small. GC pauses are ridiculously high, as you can see in the following picture:

[Screenshot from 2023-04-19 17-08-45]

I'm unsure whether an off-heap cache would be better, but G1 pauses were about 100 - 150ms at most, so I'm not sure what's going on. The image is from a traversal of the whole internal representation of a bigger binary JSON file. The cache, in this case, may simply be overhead, as only one trx traverses the database pages. The most recent page is already "cached" in memory, and page references to child pages in the trie are set once the pages are fetched from durable storage (setPage(null) is called once they are removed from the page cache). However, there's a lot of GC pressure, and the GC pauses with ZGC really are somehow strange. Maybe it's also a manifestation of a memory leak?

The setup of the page cache is as follows:

  public PageCache(final int maxSize) {
    // Clear the in-memory reference whenever an entry is evicted,
    // so the Page becomes eligible for GC.
    RemovalListener<PageReference, Page> removalListener =
        (PageReference key, Page value, RemovalCause cause) -> key.setPage(null);

    // pageCache is a field of type Cache<PageReference, Page>;
    // scheduler is a field, e.g. Scheduler.systemScheduler()
    pageCache = Caffeine.newBuilder()
                        .maximumSize(maxSize)
                        .expireAfterWrite(5, TimeUnit.SECONDS)
                        .expireAfterAccess(5, TimeUnit.SECONDS)
                        .scheduler(scheduler)
                        .removalListener(removalListener)
                        .build();
  }

JohannesLichtenberger commented 1 year ago

It seems there is a race between setting the in-memory page references (which keeps the pages reachable) and eviction. The pages may be read faster than they are evicted from the caches. With an empty cache instead of Caffeine, heap usage quickly climbs almost to the max heap size. Even with one-second expiration there's a zigzag pattern, suggesting that I should probably evict even more frequently. That said, in the following picture the maximum size of the cache is set to 1, which is still strange:

[Screenshot from 2023-04-19 17-41-08]

ben-manes commented 1 year ago

Expiration means the cache discards entries, even popular ones, based purely on how recently they were used. That causes more garbage, as the cache evicts and misses more often, leading to loads which may also generate garbage. A few seconds is quite aggressive. Is expiration needed, and what would the profile look like if it were disabled? I see this used in many places with short intervals.

You shouldn't set both expireAfterAccess and expireAfterWrite, since access includes writes. I'd like to disallow that combination, but it was allowed in Guava and is not hard to maintain, even if it was likely a mistake to have allowed it in either library. It won't add much extra overhead, mostly an unnecessary per-entry long field.
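
For illustration, a minimal sketch of the builder from the original snippet with only expireAfterAccess, which already covers writes:

    pageCache = Caffeine.newBuilder()
                        .maximumSize(maxSize)
                        // access includes writes, so a separate expireAfterWrite is redundant
                        .expireAfterAccess(5, TimeUnit.SECONDS)
                        .scheduler(scheduler)
                        .removalListener(removalListener)
                        .build();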

The cache size may matter if set too low. I see in many places it is a decent size, but I did not verify all of them. It is helpful to export metrics so that you can monitor hit rates, so I'd recommend instrumenting your usages (e.g. with micrometer).
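
For example, a sketch using Caffeine's built-in statistics (Micrometer's CaffeineCacheMetrics can bind the same counters to a registry):

    Cache<PageReference, Page> pageCache = Caffeine.newBuilder()
        .maximumSize(maxSize)
        .recordStats()  // enables hit/miss/eviction counters
        .build();

    CacheStats stats = pageCache.stats();
    System.out.printf("hitRate=%.2f evictions=%d%n",
        stats.hitRate(), stats.evictionCount());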

You could use JFR to record a profile; JMC will provide tips based on the profile data. I also like to use JProfiler and YourKit, which are free for open-source projects. The JDK version can also have a big impact, so checking the latest is helpful; e.g. humongous object handling has been redesigned a few times in G1, and the GC may require some tuning for an abnormal workload type. You'll need to capture more data, as right now this is not enough to pinpoint a problem. When tackling these things your initial guess will almost always be wrong, so these tools really help.

ben-manes commented 1 year ago

You might also look at maximumWeight if your objects are large, as you might be setting a low expiration time as a workaround. The maximum size is the number of entries, but the space they consume may vary and be much larger than expected. The short duration then makes those entries eligible for GC right away, but that means more churn. If the age doesn't matter otherwise, then setting the capacity based on the entry size could lead to more stable behavior.
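
A sketch of such a size-bounded configuration, assuming a hypothetical Page::sizeInBytes accessor (maximumWeight replaces maximumSize; the two cannot be combined):

    pageCache = Caffeine.newBuilder()
        // bound by approximate memory footprint rather than entry count
        .maximumWeight(maxWeightInBytes)
        .weigher((PageReference key, Page value) -> value.sizeInBytes())
        .removalListener(removalListener)
        .build();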

JohannesLichtenberger commented 1 year ago

I think in this case the cache is mainly overhead, as I'm setting the in-memory page references right after fetching the pages from disk. The cache, in this case, is only used to set the in-memory reference to null at eviction time: some kind of a poor man's pointer "swizzling". As real swizzling isn't possible in Java, I'm setting the in-memory instances via PageReference::setPage and removing the reference during eviction.

The page cache will be more useful in other tests, when more trxs are open, but in this particular case it's more or less overhead, or simply used to set the in-memory reference to null ;-)

I'm using Java 19 (GraalVM Community Edition currently) and updated to the latest Caffeine version now.

I set the heap size (max and min) to 8Gb now, updated the configurations, and lowered the maximum sizes considerably. I've temporarily removed the timer-based eviction, but I have a rather simple problem: I have different caches for each "resource" in a database, so more and more caches are created over time. The cleanest solution would probably be a key that also takes the resource name into account, but I'm not sure if this would slow down getting the values. First, SirixDB tries to get the in-memory page reference via PageReference::getPage, and only if it's null are the global Caffeine caches used. As I'm thinking about it, I guess PageReference::setPage and PageReference::getPage have to operate on a volatile field (currently it's private Page page, but I guess it has to be volatile...).
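
A sketch of what that might look like, assuming PageReference is a plain holder class (the volatile write/read gives safe publication between the transaction thread and the eviction thread):

    public final class PageReference {
      // volatile so an eviction thread's setPage(null) is visible to readers
      private volatile Page page;

      public Page getPage() { return page; }

      public void setPage(final Page page) { this.page = page; }
    }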

BTW: With YourKit I can't take a snapshot for this particular test, as it freezes my laptop, sadly...

JohannesLichtenberger commented 1 year ago

Do you think an off-heap cache would even make sense in this case? Maybe not, as the serialization/deserialization costs per page might be rather high...

ben-manes commented 1 year ago

You can open a JFR recording in YourKit and JProfiler, so it is quite nice to have different views of the same data, as tools are a bit opinionated about what they think matters.

I believe the min/max trick is no longer a recommended GC setting. It made sense with older collectors, and to ensure that the system had enough memory to run the program being watched rather than fail randomly later on. It also delayed how often GCs would run, which could lead to longer but less frequent pauses, as those policies waited until space was not available. Modern GCs will return memory to the OS (making it useful for the OS's buffer page cache) and run GCs more proactively to avoid allocating more resources.
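
For illustration only (exact tuning depends on the JDK): rather than pinning -Xms to -Xmx, one could set only the maximum and, with ZGC, a soft target so uncommitted memory is returned to the OS:

    java -XX:+UseZGC -Xmx8g -XX:SoftMaxHeapSize=6g ...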

If possible you should load through the cache to avoid a stampede.
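
That is, something like the following sketch, where readPageFromDisk is a hypothetical loader:

    // the mapping function runs at most once per key, even under concurrent access
    Page page = pageCache.get(pageReference, ref -> readPageFromDisk(ref));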

I'm not a fan of off-heap in most cases if you have to serialize the data. It is perfect if you are caching raw bytes anyway, so you can serve them without marshalling overheads. For non-infra projects it is better to pair with a remote cache like memcached, so off-heap is good only in cases like yours, for a db. You could try a library like OHC. I was eager to see Apache Mnemonic evolve to avoid the serialization overhead, but I haven't heard much about it for a long time. Then you could have the values off-heap, the cache management on-heap, and avoid the serialization tax. But it is all hard to judge, so one has to experiment and measure.

JohannesLichtenberger commented 1 year ago

Well, I'm not sure what the problem is when using ZGC, even after changing some cache params:

[Screenshot from 2023-04-19 23-18-15]

[Screenshot from 2023-04-19 23-23-50]

The objects allocated are almost all within cached pages. I'm not sure why ZGC performs far fewer minor GCs, yet a couple of its pauses are long (shouldn't they mostly stay below 10ms?).

With the default GC, G1, I'm getting many more garbage collections, but primarily within 50 - 80ms, IIRC. Thus, it's still (much) faster somehow (90 - 100s vs. ~170s), albeit measured with repeated runs rather than a JMH benchmark. I wonder if that might be a problem with ZGC itself instead of my code... the behavior seems odd.

ben-manes commented 1 year ago

ZGC is not yet generational (JEP-439). You might try their EA builds to see if that changes the profile.
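
In those builds, generational mode is opt-in via an extra flag; a sketch, assuming the JDK 21 flag name (the 2023 EA builds may differ):

    java -XX:+UseZGC -XX:+ZGenerational ...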