pschmeltzer opened this issue 7 years ago (status: Open)
Exactly the same problem for me. Can you give us an estimated date for a fix? It is very difficult for us to optimize database loading by avoiding multiple SQL requests.
I was able to easily reproduce it with the following code. We will now look for a fix.
// Imports needed to make this snippet compile (JUnit 4 and Ehcache 3 assumed):
import static org.ehcache.config.builders.CacheManagerBuilder.newCacheManagerBuilder;

import java.util.AbstractMap;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.StreamSupport;

import org.ehcache.Cache;
import org.ehcache.CacheManager;
import org.ehcache.config.builders.CacheConfigurationBuilder;
import org.ehcache.config.builders.CacheManagerBuilder;
import org.ehcache.config.builders.ResourcePoolsBuilder;
import org.ehcache.spi.loaderwriter.CacheLoaderWriter;
import org.junit.Test;

@Test
public void test() {
  // Loader that logs every loadAll() invocation and the keys it receives
  CacheLoaderWriter<Integer, String> loaderWriter = new CacheLoaderWriter<Integer, String>() {
    @Override
    public String load(Integer integer) throws Exception {
      return integer.toString();
    }

    @Override
    public Map<Integer, String> loadAll(Iterable<? extends Integer> iterable) {
      System.out.println("loadAll called");
      return StreamSupport.stream(iterable.spliterator(), false)
        .peek(i -> System.out.println("Id: " + i))
        .map(e -> new AbstractMap.SimpleImmutableEntry<>(e, e.toString()))
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    @Override
    public void write(Integer integer, String s) {
    }

    @Override
    public void writeAll(Iterable<? extends Map.Entry<? extends Integer, ? extends String>> iterable) {
    }

    @Override
    public void delete(Integer integer) throws Exception {
    }

    @Override
    public void deleteAll(Iterable<? extends Integer> iterable) {
    }
  };

  CacheManagerBuilder<CacheManager> cacheManagerBuilder = newCacheManagerBuilder();
  try (CacheManager cacheManager = cacheManagerBuilder.build(true)) {
    Cache<Integer, String> cache = cacheManager.createCache("test",
      CacheConfigurationBuilder.newCacheConfigurationBuilder(Integer.class, String.class,
        ResourcePoolsBuilder.heap(10)).withLoaderWriter(loaderWriter).build());
    // Bulk read of 9 cold keys: ideally this would trigger ONE loadAll() call
    cache.getAll(IntStream.range(1, 10).boxed().collect(Collectors.toSet()));
  }
}
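For reference, given the behavior reported in this issue, the test above prints something like the following: one "loadAll called" line per key, each with a single id, instead of a single loadAll invocation listing all nine keys.

loadAll called
Id: 1
loadAll called
Id: 2
(... and so on, one loadAll call per key, up to Id: 9)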
Hi Henry,
Do you have any clue about how to solve this challenge?
I've started to look at it. However, it is on hold right now.
The issue is a bit complicated due to the way Stores work. I need to reimplement getAll entirely.
Do you need a workaround?
@henri-tremblay Any update on this? Would be interested in a workaround too, if available.
Still complicated, and it was put on ice for now.
The best workaround I can see is to do the loadAll externally: load everything you need, putAll it into the cache, and then getAll whenever you need the data. I'm assuming here that you want to pre-load some data.
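A minimal sketch of that workaround, assuming a hypothetical fetchFromDatabase(keys) helper that performs one bulk SQL query against the SoR:

Set<Integer> keys = IntStream.range(1, 10).boxed().collect(Collectors.toSet());

// One round trip to the database, done by the application itself
Map<Integer, String> loaded = fetchFromDatabase(keys);

// Install everything in bulk (note: with a loader-writer configured,
// putAll also writes through to writeAll)
cache.putAll(loaded);

// Subsequent getAll calls are now served from the cache
Map<Integer, String> values = cache.getAll(keys);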
Actually this isn't a bug, but rather a feature… by design. Possibly an undesired feature though... In the meantime, as I share some responsibility in this design, let me try to explain why the behavior is the way it is.
Besides the consistency problems that a cache-through pattern addresses, the loader API enables readers to avoid synchronizing around populating the cache: the loader makes sure only a single thread populates the cache for a given key, while other threads accessing the same key simply block until the value is installed.
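A hypothetical illustration of that guarantee, reusing the cache and loader from the test above (a fragment; it assumes java.util.concurrent imports and checked exceptions handled by the caller):

ExecutorService pool = Executors.newFixedThreadPool(2);
// Both tasks ask for the same cold key; only one of them runs load(42),
// the other blocks until the value is installed.
Future<String> f1 = pool.submit(() -> cache.get(42));
Future<String> f2 = pool.submit(() -> cache.get(42));
System.out.println(f1.get() + " " + f2.get()); // both observe "42"
pool.shutdown();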
The CacheLoaderWriter.loadAll() design takes the side of optimizing the cache, not the underlying SoR. This means that the set of keys passed to the loadAll method will match a given locked segment of the underlying Cache's Store. In the case of the on-heap store, there are no such "segments": keys are locked on a per-key basis, not per segment.
That's why, in the example above, you only ever get an Iterable with a single entry. I think this was discussed at length in one of the early dev meetings (not quite sure which one though).
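To make that concrete, here is a simplified picture of the dispatch just described. This is not actual Ehcache code; segmentFor, lockSegment, and unlockSegment are purely illustrative:

// Keys are grouped by the store segment that owns them, and loadAll()
// is invoked once per group while that segment is locked. With the
// on-heap store every key is effectively its own segment, so each
// group (and thus each Iterable passed to loadAll) has size 1.
Map<Integer, List<Integer>> bySegment = keys.stream()
    .collect(Collectors.groupingBy(key -> segmentFor(key)));
for (List<Integer> segmentKeys : bySegment.values()) {
  lockSegment(segmentKeys);
  try {
    loaderWriter.loadAll(segmentKeys);
  } finally {
    unlockSegment(segmentKeys);
  }
}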
The current design enables entries being loaded to be made available as quickly as possible. It could be that something else currently makes this impossible (I haven't read the code in a long while now), but this enables two threads to each populate one of two keys being loaded:
t1: cache.getAll(a, b);
t2: cache.get(b);
The first thread (t1) populates a, but b is already loaded by the time t1 accesses it, as t2 already populated the key b.
If t1 had locked both a and b prior to loading them, t2 couldn't have parallelized the load of b while a was being loaded.
Things can get somewhat more tricky to deal with as well; imagine the case below:
t1: cache.getAll(a, b);
t2: cache.getAll(b, a);
These two threads can now deadlock each other. Obviously, ordering the lock acquisitions would be the way to solve this, but there is no constraint that the type of a and b be ? extends Comparable either.
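For illustration only, the lock-ordering idea would look something like the sketch below. It presumes the keys are Comparable and a hypothetical lockFor(key) helper, which is precisely the constraint the cache's key type does not have to satisfy:

// Every thread sorts its keys before locking, so t1 and t2 acquire the
// per-key locks in the same global order and can no longer deadlock.
<K extends Comparable<K>> void lockAllInOrder(Collection<K> keys) {
  List<K> ordered = new ArrayList<>(keys);
  Collections.sort(ordered);
  for (K key : ordered) {
    lockFor(key).lock(); // hypothetical per-key lock lookup
  }
}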
Now, this can all be mitigated with other locking strategies, like installing some lock object in the mapping, so that t1 could see that a is already being loaded. The cache could then either have each thread load a single key, or transfer the responsibility of loading all the keys to a single thread (and hence a single CacheLoaderWriter.loadAll() invocation), so as to provide more options for the loader to optimize data access for the underlying system of record.
All this to say that the route chosen was to optimize for the latency of the cache. I hope the examples above show how that approach makes access to the cache somewhat fairer to all... While other options exist, they come at a higher runtime price because of the additional coordination requirements.
Now maybe there are better ways of going about this... I haven't given this any further thought for years, to be honest. Nonetheless, that's a vague attempt at explaining why things are the way they are; what challenges need to be accounted for when trying to "address this issue"; and, most importantly, what tradeoffs that may bring...
Highly interesting explanation. I'll take that into consideration. The problem is really the loader-writer: doing tons of queries to the database instead of one to load a cache isn't efficient at all. So we will have to find a way.
Also, note that loadAll was never intended to warm the cache up... If you wanted to do that (effectively), there'd be other means. We used to have a utility for that (including one that'd snapshot the current hot keyset at a given interval/shutdown and use that information to reload the cache on restart). That could use the same CacheLoaderWriter API.
Anyways, it's a matter of tradeoffs... I can understand that, in case your SoR enables "batch loads", you may want to leverage that. One idea, thinking out loud here, is to support both approaches based on config: a loader/writer optimized for either the cache or the SoR.
In all cases, best of luck :)
Any news on this issue?
Can't we create a custom wrapper for this, supporting getAll with multiple keys and calling the loadAll method only for the missing keys? Do we see any issue with that?
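A rough sketch of that wrapper idea, under two stated assumptions: the cache is configured without a loader-writer (so a plain get() does not trigger a per-key load), and the wrapper holds a reference to the same CacheLoaderWriter to bulk-load the misses itself. Note that this gives up the per-key loading guarantee described earlier, since two threads may both load the same missing key:

static <K, V> Map<K, V> getAllThrough(Cache<K, V> cache,
                                      CacheLoaderWriter<K, V> loaderWriter,
                                      Set<K> keys) throws Exception {
  Map<K, V> result = new HashMap<>();
  Set<K> missing = new HashSet<>();
  for (K key : keys) {
    V value = cache.get(key);     // plain cache access, no load on a miss
    if (value != null) {
      result.put(key, value);
    } else {
      missing.add(key);
    }
  }
  if (!missing.isEmpty()) {
    Map<K, V> loaded = loaderWriter.loadAll(missing); // ONE bulk SoR query
    cache.putAll(loaded);
    result.putAll(loaded);
  }
  return result;
}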
When using Ehcache 3 (3.1.1) with an on-heap store, calls to getAll with a key set > 1 on a cold cache result in loadAll being called for each key separately. The iterable passed to loadAll in the CacheLoaderWriter always appears to have a size of 1, regardless of the number of keys passed to getAll.
Here is my cache configuration:
cache = cacheManager.createCache(name,
    CacheConfigurationBuilder.newCacheConfigurationBuilder(key, value, ResourcePoolsBuilder.heap(entries))
        .withLoaderWriter(loader)
        .build());
See the comments on Stack Overflow.