ben-manes / caffeine

A high performance caching library for Java
Apache License 2.0

Request: Version of LoadingCache with Synchronous Behavior for Invalidate #1739

Closed EarthCitizen closed 1 month ago

EarthCitizen commented 1 month ago

Currently, for LoadingCache, the behavior of the invalidate methods is:

undefined for an entry that is being loaded (or reloaded) and is otherwise not present

It would be incredibly useful if there could be a version of LoadingCache that guarantees no undesirable interference between load and invalidate operations. That is to say, load operations have to wait on invalidate operations and invalidate operations have to wait on load operations.

ben-manes commented 1 month ago

It is undefined because promising linearizability across all scenarios is not viable. In the typical cases it is linearizable and you can rely on that, but the JavaDoc has to be weaker to be correct. You can typically assume the same behavior as ConcurrentHashMap, but it is slightly weaker due to the broader feature set.

In the synchronous cache the getAll loads are performed outside of the hash table's locks. That is because we cannot safely lock all of the entries the way computeIfAbsent can lock a single one. Making the bulk load linearizable with other operations, like put and clear, would have to be directly supported by ConcurrentHashMap.
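For example, something along these lines (a rough sketch with hypothetical loader bodies): cache.get with a mapping function is atomic per key like computeIfAbsent, while the bulk mapping function passed to getAll runs outside the map's locks.

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BulkVsSingleLoad {
  public static void main(String[] args) {
    Cache<String, String> cache = Caffeine.newBuilder().build();

    // Per-key load: performed atomically for that key, like
    // ConcurrentHashMap.computeIfAbsent, so other writes to the same key wait.
    String one = cache.get("k", key -> "value-" + key);

    // Bulk load: the mapping function runs outside the hash table's locks
    // because all of the requested keys cannot be locked at once, so
    // concurrent writes to those keys are not serialized against it.
    Map<String, String> many = cache.getAll(List.of("a", "b"), keys -> {
      Map<String, String> result = new HashMap<>();
      for (String key : keys) {
        result.put(key, "value-" + key);
      }
      return result;
    });

    System.out.println(one + " " + many);
  }
}
```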

In the asynchronous cache, the mappings are to CompletableFuture values and the loading happens outside of the hash map operation. This means the synchronous view may remove the mapping and join on the future, but while it waits a subsequent put is not blocked from establishing a new mapping for that key. The synchronous view is for convenience but has different linearizability properties.
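A rough sketch of that shape (hypothetical loader; the comments describe what another thread could do concurrently, not a guaranteed interleaving):

```java
import com.github.benmanes.caffeine.cache.AsyncLoadingCache;
import com.github.benmanes.caffeine.cache.Caffeine;
import java.util.concurrent.CompletableFuture;

class AsyncViewSketch {
  static String slowLoad(String key) { return "loaded-" + key; }  // hypothetical loader

  public static void main(String[] args) {
    AsyncLoadingCache<String, String> asyncCache =
        Caffeine.newBuilder().buildAsync(AsyncViewSketch::slowLoad);

    // The map stores CompletableFuture values; the load runs outside the map operation.
    CompletableFuture<String> pending = asyncCache.get("k");

    // The synchronous view removes the mapping and joins on the in-flight future...
    asyncCache.synchronous().invalidate("k");

    // ...but while that join is in progress, a put from another thread is free
    // to install a new mapping for the same key; the two are not serialized.
    asyncCache.put("k", CompletableFuture.completedFuture("new"));
  }
}
```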

The refresh operations are always outside of the map operations since, by definition, they are asynchronous. They use a secondary map that is invalidated when the cache's mapping changes, so a stale value will not overwrite a newer one. This provides the linearizability behavior one would want for refresh, but it does not mean all cache operations are linearizable.
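Conceptually it is something like the following sketch (hypothetical names, not Caffeine's actual internals): the in-flight refresh only writes back if it is still the current one, and any change to the mapping discards it.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executor;
import java.util.function.Function;

// Hypothetical sketch of the secondary-map idea, not Caffeine's implementation.
final class RefreshSketch<K, V> {
  private final ConcurrentMap<K, V> data = new ConcurrentHashMap<>();
  private final ConcurrentMap<K, CompletableFuture<V>> refreshes = new ConcurrentHashMap<>();

  void refresh(K key, Function<K, V> loader, Executor executor) {
    V oldValue = data.get(key);
    if (oldValue == null) {
      return;  // nothing to refresh if the key is not currently cached
    }
    var future = CompletableFuture.supplyAsync(() -> loader.apply(key), executor);
    refreshes.put(key, future);
    future.thenAccept(newValue -> {
      // Write back only if this refresh is still the current one and the
      // mapping still holds the value that was being refreshed; otherwise
      // the stale result is silently discarded.
      if (refreshes.remove(key, future)) {
        data.replace(key, oldValue, newValue);
      }
    });
  }

  // Any write that changes the cache's mapping also invalidates the in-flight
  // refresh so its result cannot overwrite the newer state.
  void invalidate(K key) {
    data.remove(key);
    refreshes.remove(key);
  }
}
```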

While ConcurrentHashMap's clear is linearizable because it traverses the internal table and waits for in-flight computations, the cache's invalidateAll() is not, because it must use the Map's iterator, which only exposes established mappings. The cache needs to know which entry was removed in order to update the policy metadata, but since ConcurrentHashMap allows no callback or internal iterator, this is not possible. It did not make sense to fork the hash table just to inspect its internals.
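Roughly, invalidateAll() has to do the equivalent of the following simplified sketch (not the actual code):

```java
import java.util.concurrent.ConcurrentMap;
import java.util.function.BiConsumer;

final class InvalidateAllSketch {
  // Simplified sketch: walk the external iterator and remove each established
  // mapping so the removed value can be reported for policy and listener
  // updates. An in-flight computeIfAbsent is not yet an established mapping,
  // so the iterator never observes it, unlike ConcurrentHashMap.clear().
  static <K, V> void invalidateAll(ConcurrentMap<K, V> data, BiConsumer<K, V> onRemoval) {
    for (K key : data.keySet()) {
      V removed = data.remove(key);
      if (removed != null) {
        onRemoval.accept(key, removed);  // update policy metadata, notify listener
      }
    }
  }
}
```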

The library needs to be correct in all cases, but any given usage exercises only a subset of the features and scenarios. It's perfectly fine to layer stronger semantics on top when the tradeoffs are acceptable. The goal is to be flexible enough for developers to adapt the cache to their needs rather than provide everything out-of-the-box with configuration knobs to fine-tune the desired semantics. The reality is that not everything under the sun can be achieved without some sacrifice, so we defer that to you and don't promise internal implementation details in the behavioral documentation.
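For example, if load and invalidate must strictly wait on each other, one possible layering (a sketch with a hypothetical wrapper and loader, not something the library provides) is to serialize both operations per key outside the cache:

```java
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

// Hypothetical wrapper: load and invalidate for the same key take the same
// striped lock, so each waits for the other regardless of cache internals.
final class SerializedCache<K, V> {
  private final Cache<K, V> cache = Caffeine.newBuilder().maximumSize(10_000).build();
  private final ReentrantLock[] locks = new ReentrantLock[64];
  private final Function<K, V> loader;

  SerializedCache(Function<K, V> loader) {
    this.loader = loader;
    for (int i = 0; i < locks.length; i++) {
      locks[i] = new ReentrantLock();
    }
  }

  private ReentrantLock lockFor(K key) {
    return locks[Math.floorMod(key.hashCode(), locks.length)];
  }

  V get(K key) {
    ReentrantLock lock = lockFor(key);
    lock.lock();
    try {
      return cache.get(key, loader);   // waits if an invalidate holds the lock
    } finally {
      lock.unlock();
    }
  }

  void invalidate(K key) {
    ReentrantLock lock = lockFor(key);
    lock.lock();
    try {
      cache.invalidate(key);           // waits if a load holds the lock
    } finally {
      lock.unlock();
    }
  }
}
```

Whether the coarser blocking of a lock stripe is an acceptable tradeoff depends on your workload.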

ben-manes commented 1 month ago

Also, fwiw, you can look at the Lincheck test for what we promise. This is useful because you can try the cases of interest and the model checker will report if and how they fail.

EarthCitizen commented 1 month ago

Thank you for the detailed explanation.