ben-manes / caffeine

A high performance caching library for Java
Apache License 2.0

cache.put is first to be removed #124

Closed krm1312 closed 8 years ago

krm1312 commented 8 years ago

This may not be a bug, but it is a difference between Guava and Caffeine: the most recent entry put into the cache sometimes appears to be the first one evicted. Our actual use case is a put from thread A and a get from thread B, where thread B cannot build the object itself. If you run the test below, Caffeine fails for a few entries in each run, whereas Guava does not.

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;

import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;

public class CaffeineTest {

    private static final int ITERATIONS = 10000;
    private static final int CACHE_SIZE = 1000;

    public static void main(String[] args) {
        testCaffeine();
        testGuava();
    }

    static void testCaffeine() {
        // The loader always returns null, so a null from get(i) means the
        // explicit put(i, i) was already evicted.
        LoadingCache<Long, Object> cache = Caffeine.newBuilder()
                .maximumSize(CACHE_SIZE)
                .build((k) -> null);
        for (long i = 0; i < ITERATIONS; i++) {
            cache.put(i, i);
            if (cache.get(i) == null) {
                System.err.println("Caffeine: " + i + " was null");
            }
        }
    }

    static void testGuava() {
        // Same experiment with Guava. The loader is never consulted here
        // because Guava's LRU policy retains the just-written entry.
        com.google.common.cache.LoadingCache<Long, Object> cache = CacheBuilder.newBuilder()
                .maximumSize(CACHE_SIZE)
                .build(new CacheLoader<Long, Object>() {
                    @Override
                    public Object load(Long key) throws Exception {
                        return null;
                    }
                });

        for (long i = 0; i < ITERATIONS; i++) {
            cache.put(i, i);
            if (cache.getUnchecked(i) == null) {
                System.err.println("Guava: " + i + " was null");
            }
        }
    }

}
ben-manes commented 8 years ago

Thanks for the report. A few differences to consider.

First, Guava uses the LRU eviction policy, meaning it discards the least recently used entry. Caffeine uses W-TinyLFU, which tries to discard newly admitted entries that have a low probability of reuse. This "frequency filter" keeps the cache from being polluted by entries that are never reused, which is what causes LRU to underperform. Please see the efficiency wiki page for a short overview of why this policy was chosen.
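To make the admission idea concrete, here is a minimal sketch of the TinyLFU admission rule. It is illustrative only: the real policy uses a compact CountMin-style frequency sketch rather than a map, and the class and method names below are made up.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: a candidate is admitted at eviction time only if it is
// estimated to be more popular than the entry it would displace.
class TinyLfuAdmissionSketch<K> {
    private final Map<K, Integer> frequency = new ConcurrentHashMap<>();

    // Record an access so the estimator can learn each key's popularity.
    void recordAccess(K key) {
        frequency.merge(key, 1, Integer::sum);
    }

    // On a capacity conflict, keep whichever of candidate/victim is hotter.
    boolean admit(K candidate, K victim) {
        return frequency.getOrDefault(candidate, 0) > frequency.getOrDefault(victim, 0);
    }
}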

Secondly, a test should use Caffeine.executor(Runnable::run) to avoid any asynchronous behavior. When that is used the test passes, as the entry is still in the admission window. Without it, the eviction catching up asynchronously seems to cause a few null observations. I agree it's not intuitive, and I'm a little surprised as well, so I'll dig in a little to see why the concurrency changed the results.
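Concretely, the Caffeine half of the test above becomes deterministic when built like this:

LoadingCache<Long, Object> cache = Caffeine.newBuilder()
        .executor(Runnable::run) // run maintenance, including eviction, on the calling thread
        .maximumSize(CACHE_SIZE)
        .build((k) -> null);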

Third, Guava disallows null values from the cache loader and throws an exception when one is returned. Caffeine allows them, treating null as "not present", and does not retain null as a value. That fits with Map#computeIfAbsent, whereas Guava disallowed nulls via a checked exception, per Josh Bloch's recommendation when he reviewed the API.
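For example, a minimal demonstration of the difference (assuming Guava's current behavior of throwing CacheLoader.InvalidCacheLoadException on a null load):

import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.LoadingCache;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;

public class NullLoaderDemo {
    public static void main(String[] args) {
        // Caffeine treats a null load as "absent": get returns null and
        // nothing is stored, matching Map#computeIfAbsent semantics.
        LoadingCache<Long, Object> caffeine = Caffeine.newBuilder()
                .build(key -> null);
        System.out.println("Caffeine returned: " + caffeine.get(1L));

        // Guava rejects a null load by throwing an unchecked
        // CacheLoader.InvalidCacheLoadException from getUnchecked.
        com.google.common.cache.LoadingCache<Long, Object> guava = CacheBuilder.newBuilder()
                .build(new CacheLoader<Long, Object>() {
                    @Override public Object load(Long key) {
                        return null;
                    }
                });
        try {
            guava.getUnchecked(1L);
        } catch (CacheLoader.InvalidCacheLoadException expected) {
            System.out.println("Guava threw: " + expected);
        }
    }
}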

In the migration page I did try to document these differences.

The admission window is 1% of the capacity, so I agree it seems odd: the most recent entry should be retained even if some asynchronous behavior is involved. So I'm not immediately sure why that makes a difference.

ben-manes commented 8 years ago

The race condition appears to be caused by the write buffer being resized. The write buffer is initialized to 4 elements and grows up to a maximum, after which writes are throttled. When I increase the initial buffer size, e.g. to 10k, your test reports no errors.

I guess resizing results in the order being LIFO instead of FIFO. Technically the buffer has no order guarantee, but I had thought it was FIFO given how it links array chunks.

For your use case you should consider a scratchpad separate from the cache. The cache holds transient data, so you can't rely on an entry being present. Instead, a secondary map could hold in-progress entries so that they are not discarded until demoted to normal cache content.
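A rough sketch of that pattern, with made-up names and an arbitrary size:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;

// In-progress entries live in a plain map, which never evicts; once the
// work is finished they are demoted into the cache, which may discard them.
class Scratchpad<K, V> {
    private final ConcurrentMap<K, V> inProgress = new ConcurrentHashMap<>();
    private final Cache<K, V> cache = Caffeine.newBuilder()
            .maximumSize(1_000)
            .build();

    void beginWork(K key, V value) {
        inProgress.put(key, value); // guaranteed to survive until finished
    }

    void finishWork(K key) {
        V value = inProgress.remove(key);
        if (value != null) {
            cache.put(key, value); // now subject to normal eviction
        }
    }

    V get(K key) {
        V value = inProgress.get(key); // check the scratchpad first
        return (value != null) ? value : cache.getIfPresent(key);
    }
}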

krm1312 commented 8 years ago

Thank you for the explanation. How we are using it is indeed not ideal, but I imagine someone else might run into this, so I thought I'd at least post it.

ben-manes commented 8 years ago

I agree it's an easy confusion, but it is allowed given that the cache has a probabilistic contract. I switched the write buffer from my own implementation to one of JCTools' excellent queues, so I'd have to dig in further to grok the resize. Perhaps @nitsanw can explain why we're observing LIFO, instead of FIFO, consumption of an MpscChunkedArrayQueue after a resize.
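For reference, a standalone check of the drain order across a resize could look like the following, assuming JCTools' (initialCapacity, maxCapacity) constructor:

import org.jctools.queues.MpscChunkedArrayQueue;

public class QueueOrderCheck {
    public static void main(String[] args) {
        // Start tiny so that offering 100 elements forces chunk resizing.
        MpscChunkedArrayQueue<Integer> queue = new MpscChunkedArrayQueue<>(4, 1024);
        for (int i = 0; i < 100; i++) {
            queue.offer(i);
        }
        int previous = -1;
        for (Integer value; (value = queue.poll()) != null; ) {
            if (value < previous) {
                System.err.println("out of order: " + value + " after " + previous);
            }
            previous = value;
        }
    }
}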

ben-manes commented 8 years ago

I played around and realized, not surprisingly, that it's a benign oversight in my usage of the write buffer. While the buffer is FIFO, it may fill up in a write-heavy workload. In that case back-pressure is needed, and the writer may block and help out to make forward progress. Its own task is run prior to draining the pending writes from the buffer, so the older buffered writes replay afterwards and appear "more recent" to the policy, causing the new entry to be evicted prematurely.

The solution is to reorder the writer's task to run after the buffer drain; then your test passes. I probably thought the ordering didn't matter in real-world workloads, so I didn't try to be strict. But since it's observable here, I'll make the fix.
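In sketch form, with hypothetical method names rather than Caffeine's internals:

final class WriteOrderSketch {
    void drainWriteBuffer() { /* replay buffered puts into the policy */ }

    // Buggy order: the caller's own write is replayed before the backlog,
    // so older puts end up looking more recent and the new entry is the
    // first eviction candidate.
    void maintenanceBefore(Runnable writeTask) {
        writeTask.run();
        drainWriteBuffer();
    }

    // Fixed order: drain the backlog first, then replay the caller's write,
    // so the newest entry really is the most recent one the policy sees.
    void maintenanceAfter(Runnable writeTask) {
        drainWriteBuffer();
        writeTask.run();
    }
}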

ben-manes commented 8 years ago

Released 2.3.4 with this fix. Thanks!