ben-manes / caffeine

A high performance caching library for Java
Apache License 2.0

JCache and entry expiration #818

Closed javaduke closed 1 year ago

javaduke commented 1 year ago

I have a custom cache loader and writer (which persist my entries in the database). I also have a custom event listener which listens for EXPIRED events and deletes the entries from the database. I noticed that when an entry expires, the event is not fired until the cache is accessed again; even then it is fired on a separate thread, so the expired object is still returned and only THEN deleted.

I create and configure the cache programmatically using the JCache API. Both read-through and write-through in my configuration are set to true and the event listener is synchronous. Wondering if I'm missing anything or is it a bug?

ben-manes commented 1 year ago

I'm not sure about the expired object being returned. You might need to write a unit test for that. For prompt expiration you can specify a scheduler, e.g. Scheduler.systemScheduler(). This way a thread will invoke Cache.cleanUp() to perform the maintenance work after expiration. Caffeine won't create threads itself, but will use platform capabilities if allowed (systemScheduler relies on a Java 9 enhancement to CompletableFuture that provides a JVM-wide scheduler thread).
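As a stdlib-only sketch (not Caffeine's internals) of the mechanism mentioned above: CompletableFuture.delayedExecutor, added in Java 9, defers a task onto a shared JVM-wide scheduler thread, which is what lets a scheduler trigger maintenance promptly without the library creating threads of its own. The class and method names below are illustrative.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Sketch (stdlib only, not Caffeine's internals) of the mechanism behind
// Scheduler.systemScheduler(): CompletableFuture.delayedExecutor (Java 9+)
// defers a task onto a shared, JVM-wide scheduler thread, so no dedicated
// thread has to be created to run deferred work such as Cache.cleanUp().
public final class PromptCleanupSketch {

  // Schedules the given cleanup task after a delay and waits for it to fire.
  public static boolean scheduleCleanup(Runnable cleanUp, long delayMillis) {
    CountDownLatch done = new CountDownLatch(1);
    CompletableFuture.delayedExecutor(delayMillis, TimeUnit.MILLISECONDS)
        .execute(() -> {
          cleanUp.run(); // stands in for cache.cleanUp() here
          done.countDown();
        });
    try {
      return done.await(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("ran: " + scheduleCleanup(() -> {}, 50));
  }
}
```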

All else being equal, I do not recommend JCache, as it has a poor API, a poor quality TCK, was rejected multiple times by Java EE, and has no future revisions planned. It's provided for easier compatibility with the frameworks that did adopt it (Hibernate), but otherwise it did not get much traction. We'll fix bugs as they come up, but I think it is a dead end from a user's point of view.

javaduke commented 1 year ago

Yes, unfortunately I'm stuck with JCache because Camel route policy requires JCache implementation. Here's what I observe in the logs when I request an expired entry:

11:12:45.357 [ForkJoinPool.commonPool-worker-29] DEBUG i.p.c.c.j.JdbcCacheEntryExpiredListener - : Entry with the key 1234 is expired, removing...
11:12:45.359 [executor-thread-0] DEBUG i.p.c.c.j.AbstractJdbcCacheOperations - 973F4324BE35479-0000000000000001: Loaded object 'HelloWorld' for key '1234'
11:12:45.361 [ForkJoinPool.commonPool-worker-29] DEBUG i.p.c.c.j.AbstractJdbcCacheOperations - : Deleted for key '1234'

So it looks like ForkJoinPool is calling the listener but the cache loader is accessed in the main executor thread.

ben-manes commented 1 year ago

By default the cache piggybacks on other operations to trigger a maintenance cycle, so if the cache is inactive then an expired entry will remain for a while. The scheduler resolves this to make removal more prompt. If the expired entry itself is accessed then the caller should not receive the expired value; a new one is loaded to replace it. The listeners are invoked asynchronously on the configured executor, by default the commonPool. You could change that, e.g. to Runnable::run for caller-runs behavior, if that is what you want.
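A stdlib-only sketch of what swapping the configured executor for Runnable::run changes (the names are illustrative, not Caffeine's API): a direct executor runs the submitted task on the calling thread, so the listener is guaranteed to finish before the caller's next statement.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executor;

// Sketch (stdlib only; names are illustrative, not Caffeine's API) of what
// a direct executor changes: the submitted task runs on the calling thread,
// so the listener completes before the caller proceeds.
public final class DirectExecutorSketch {

  public static List<String> fireThenProceed(Executor listenerExecutor) {
    List<String> events = new CopyOnWriteArrayList<>();
    listenerExecutor.execute(() -> events.add("listener")); // e.g. an EXPIRED event
    events.add("caller continues");                         // e.g. the subsequent load
    return events;
  }

  public static void main(String[] args) {
    // Caller-runs executor: the ordering is deterministic
    System.out.println(fireThenProceed(Runnable::run)); // → [listener, caller continues]
    // With ForkJoinPool.commonPool() the two entries could appear in either
    // order, since nothing orders the pool's thread against the caller.
  }
}
```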

JCache has a concept of synchronous listeners, see CacheEntryListenerConfiguration.isSynchronous(). That ensures that the caller blocks waiting for the event to be processed by the listeners. It will still fire and load asynchronously, just not return until the listener has completed its work.

javaduke commented 1 year ago

Yes, that's the point: I configured the listener as synchronous, but it is apparently ignored by Caffeine. If I use a different JCache implementation, e.g. Ehcache, everything works fine. Again, the issue is that the expired value is still returned.

ben-manes commented 1 year ago

If you can write a unit test about what you are expecting that would help. There are existing tests to start from if you prefer.

Synchronous only means that the caller is blocked until the listener completes, not that the caller runs the listener operation. JCache assumed that the operation might be for a distributed cache where network accesses occur, that there might be multiple listeners, etc. So I am unsure whether your issue is that the commonPool is used or that the caller is receiving a stale value. The latter would be a bug, and a unit test asserting it would help, since the tests I wrote and the TCK both pass; a fresh attempt is the best way to catch a bug.

javaduke commented 1 year ago

OK, thanks, Ben, I'll try to come up with a unit test. It may be a bit tricky, but I'll see what I can do.

ben-manes commented 1 year ago

Reading over this again and letting it sink in, I better understand what you are saying. Yes, there is a race condition in your logic because the loader and listener run on different threads, so the loader can reload the entry from the database before the listener deletes it. If you configure CaffeineConfiguration.setExecutorFactory to a direct executor then the listener will run before the loader, and you will get the desired operation order.

My understanding is that the spec's isSynchronous does not mean that the listener should run on the caller to provide that ordering, only that it forces the caller to wait. The specification document doesn't say much more than the javadoc, which is vague. It seemed to be more about in-order event processing from the listener's perspective than about whether the listener is invoked prior to the loader.
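The race above can be sketched with the stdlib alone (the map stands in for the database; all names are hypothetical, not Caffeine's API): the expiration listener deletes the persisted row while the cache loader re-reads it, and with a direct executor the delete completes before the load.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;

// Sketch of the race in this thread (stdlib only; the map stands in for the
// database and all names are hypothetical, not Caffeine's API). The
// expiration listener deletes the persisted row while the cache loader
// re-reads it; with a direct executor the delete completes first.
public final class ListenerLoaderRace {

  public static String expireThenLoad(Executor listenerExecutor) {
    ConcurrentHashMap<String, String> database = new ConcurrentHashMap<>();
    database.put("1234", "HelloWorld");

    // EXPIRED event: the listener removes the persisted entry
    listenerExecutor.execute(() -> database.remove("1234"));

    // Cache loader: runs on the caller thread immediately afterwards
    return database.get("1234");
  }

  public static void main(String[] args) {
    // Direct executor: the delete happens-before the load, so nothing stale
    // can be reloaded into the cache
    System.out.println(expireThenLoad(Runnable::run)); // → null
    // On ForkJoinPool.commonPool() the loader may still observe "HelloWorld"
    // before the listener's delete runs, which is the race observed above.
  }
}
```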

javaduke commented 1 year ago

I see, so in order to do this I will have to use the Caffeine API, which is less than ideal (because I was hoping my code could be more generic and use the standard JCache API). But I'll give it a try.

ben-manes commented 1 year ago

Yes, or alternatively if you use the configuration file format, specify caffeine.jcache.executor.

https://github.com/ben-manes/caffeine/blob/9536f65ba829fdaa74ce87a202b73d4aa5960854/jcache/src/main/resources/reference.conf#L26-L28
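Assuming an external override file is in use, and taking the key name from the comment above (the executor class is hypothetical and the exact path should be verified against the linked reference.conf), the setting might look like:

```
# application.conf: hypothetical override; verify the key against reference.conf
caffeine.jcache.executor = "com.example.DirectExecutor"
```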

javaduke commented 1 year ago

hmm, doesn't seem to work:

cache = cacheManager.createCache(cacheName, configuration);

Configuration conf = cache.getConfiguration(Configuration.class);
if ("com.github.benmanes.caffeine.jcache.configuration.CaffeineConfiguration".equalsIgnoreCase(conf.getClass().getName())) {
    LOG.debug("Caffeine cache detected, setting direct executor");
    ((CaffeineConfiguration) conf).setExecutorFactory(FactoryBuilder.factoryOf(DirectExecutor.class));
}

But I still see ForkJoinPool.commonPool calling the listener...

I also tried to configure it in my application.yml but for some reason it is completely ignored too.

ben-manes commented 1 year ago

It looks like you changed the configuration after it was created? I don't believe that is valid for JCache and generally won't be honored. You would need to create the cache with the intended configuration.

ben-manes commented 1 year ago

Oh, I even fixed that in master. The getConfiguration method states "The returned value must be immutable," so on master it will throw an exception on any mutators.

javaduke commented 1 year ago

Yes, with direct executor it works now, thank you very much for all your help!!!

ben-manes commented 1 year ago

I dug into the mailing list, issue tracker, etc for any context to isSynchronous. Below is mostly for my own reference if this comes up again.

Historically this seems to be a jumble of different ideas, opinions, and to allow existing distributed caches to strengthen this concept to match their desired semantics. In many discussions it is stated as strictly an observer pattern for ensuring that all outstanding operations completed when the caller returns, but the listener is decoupled from the atomic scope of the cache operation (e.g. due to network). It seems that transactions (not implemented) were intended to opt synchronous listeners into the atomic scope (though it was under debate if events were rolled back), but without that they are assumed to be run sometime after the operation completed but before the caller returns. An alternative idea that was rejected was called "interceptors" which ran synchronously within the atomic scope, so they were not strictly external observers.

Coherence and Ehcache drove the spec; the former documents that their MapListeners are very different from JCache's in their synchronous guarantees. In that case it behaves how you want, but with a warning about the potential performance issues involved. That works for them because a single node owns the data for a given entry and every other node is a read replica, so their synchronous listener runs on the owning node. In other products' topologies there is not a single owner, so they did not specify that behavior. Instead they only cared that a client receives an acknowledgement that the events were processed, for a happens-before ordering between cache operations (and not between loaders/writers/listeners). Any L1 / L2 concept was considered a separate vendor feature that would not be built on top of JCache listeners, so their listeners only act as observers with no impact on the atomic scope. Transactions would have then filled this gap, but that feature was considered too large for a 1.0.

Unfortunately the reference implementation (RI) is not production quality so we cannot infer much from it. It does not dispatch events within an atomic scope and lacks support for synchronous listeners (simply ignores that property). The TCK lacks any tests for synchronous listeners, which allowed multiple certified vendors to not bother implementing them (RI, Apache JCS, Apache Ignite, Blazing Cache, etc).

In summary, it seems that your use-case isn't covered by this feature and it relies on implementation behavior to work. So if you move to a different vendor it may not function correctly and there is no vendor agnostic workaround.

If Caffeine were to adapt to being part of the atomic scope, the change would be fairly trivial. Once in the atomic scope of a compute (e.g. for a cache load), it would query the EventDispatcher for any in-flight futures associated with that key for a synchronous listener and block before proceeding with the cache operation. This would still pass the spec, as a listener is not expected to be able to mutate the cache and risks deadlock if it does so.

I am not inclined to make any changes given the vague and limited wording, utter confusion by the spec authors, api decisions made over private discussions without explanations, and a wide variety of certified implementation behaviors. I agree with what you wanted to do, but the JSR is too muddy to rely upon for this in an agnostic way. The executor configuration makes it clearly a vendor workaround that solved your problem, so while ugly it seems safer to not change this and break someone else's assumptions.

I am drawn back to the original conclusions: avoid this JSR whenever possible; it is trash.