pschmeltzer opened this issue 7 years ago (status: Open)
Exactly the same problem for me. Can you give us an estimated date for a fix? It is very difficult for us to optimize database loading by avoiding multiple SQL requests.
I was able to easily reproduce it with the following code. We will now look for a fix.
// Imports needed to make this snippet compile (JUnit 4 and Ehcache 3 assumed):
import static org.ehcache.config.builders.CacheManagerBuilder.newCacheManagerBuilder;

import java.util.AbstractMap;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.StreamSupport;

import org.ehcache.Cache;
import org.ehcache.CacheManager;
import org.ehcache.config.builders.CacheConfigurationBuilder;
import org.ehcache.config.builders.CacheManagerBuilder;
import org.ehcache.config.builders.ResourcePoolsBuilder;
import org.ehcache.spi.loaderwriter.CacheLoaderWriter;
import org.junit.Test;

@Test
public void test() {
  // Loader that logs every loadAll() invocation and the keys it receives
  CacheLoaderWriter<Integer, String> loaderWriter = new CacheLoaderWriter<Integer, String>() {
    @Override
    public String load(Integer integer) throws Exception {
      return integer.toString();
    }

    @Override
    public Map<Integer, String> loadAll(Iterable<? extends Integer> iterable) {
      System.out.println("loadAll called");
      return StreamSupport.stream(iterable.spliterator(), false)
        .peek(i -> System.out.println("Id: " + i))
        .map(e -> new AbstractMap.SimpleImmutableEntry<>(e, e.toString()))
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    @Override
    public void write(Integer integer, String s) {
    }

    @Override
    public void writeAll(Iterable<? extends Map.Entry<? extends Integer, ? extends String>> iterable) {
    }

    @Override
    public void delete(Integer integer) throws Exception {
    }

    @Override
    public void deleteAll(Iterable<? extends Integer> iterable) {
    }
  };

  CacheManagerBuilder<CacheManager> cacheManagerBuilder = newCacheManagerBuilder();
  try (CacheManager cacheManager = cacheManagerBuilder.build(true)) {
    Cache<Integer, String> cache = cacheManager.createCache("test",
      CacheConfigurationBuilder.newCacheConfigurationBuilder(Integer.class, String.class,
        ResourcePoolsBuilder.heap(10)).withLoaderWriter(loaderWriter).build());
    // Bulk read of 9 cold keys: ideally this would trigger ONE loadAll() call
    cache.getAll(IntStream.range(1, 10).boxed().collect(Collectors.toSet()));
  }
}
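For reference, given the behavior reported in this issue, the test above prints something like the following: one "loadAll called" line per key, each with a single id, instead of a single loadAll invocation listing all nine keys.

loadAll called
Id: 1
loadAll called
Id: 2
(... and so on, one loadAll call per key, up to Id: 9)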
Hi Henry,
Do you have any clue about how to solve this challenge?
I've started to look at it. However, it is on hold right now.
The issue is a bit complicated due to the way Stores work. I need to reimplement getAll entirely.
Do you need a workaround?
@henri-tremblay Any update on this? Would be interested in a workaround too, if available.
Still complicated, and it was put on ice for now.
The best workaround I can see is to do the loadAll externally: load everything you need, putAll it into the cache, and then getAll whenever you need the data. I'm assuming here that you want to pre-load some data.
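A minimal sketch of that workaround, assuming a hypothetical fetchFromDatabase(keys) helper that performs one bulk SQL query against the SoR:

Set<Integer> keys = IntStream.range(1, 10).boxed().collect(Collectors.toSet());

// One round trip to the database, done by the application itself
Map<Integer, String> loaded = fetchFromDatabase(keys);

// Install everything in bulk (note: with a loader-writer configured,
// putAll also writes through to writeAll)
cache.putAll(loaded);

// Subsequent getAll calls are now served from the cache
Map<Integer, String> values = cache.getAll(keys);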
Actually this isn't a bug, but rather a feature… by design. Possibly an undesired feature though... In the meantime, as I share some responsibility in this design, let me try to explain why the behavior is the way it is.
Besides the consistency problems that a cache-through pattern addresses, the loader API enables readers to avoid synchronizing around populating the cache: the loader makes sure only a single thread populates the cache for a given key, while other threads accessing the same key simply block until the value is installed.
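A hypothetical illustration of that guarantee, reusing the cache and loader from the test above (a fragment; it assumes java.util.concurrent imports and checked exceptions handled by the caller):

ExecutorService pool = Executors.newFixedThreadPool(2);
// Both tasks ask for the same cold key; only one of them runs load(42),
// the other blocks until the value is installed.
Future<String> f1 = pool.submit(() -> cache.get(42));
Future<String> f2 = pool.submit(() -> cache.get(42));
System.out.println(f1.get() + " " + f2.get()); // both observe "42"
pool.shutdown();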
The CacheLoaderWriter.loadAll() design takes the side of optimizing the cache, not the underlying SoR. This means that the set of keys passed to the loadAll method will match a given locked segment of the underlying Cache's Store. In the case of the on-heap store, there are no such "segments": keys are locked on a per-key basis, not per segment.
That's why, in the example above, you only ever get an Iterable with a single entry. I think this was discussed at length in one of the early dev meetings (not quite sure which one though).
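To make that concrete, here is a simplified picture of the dispatch just described. This is not actual Ehcache code; segmentFor, lockSegment, and unlockSegment are purely illustrative:

// Keys are grouped by the store segment that owns them, and loadAll()
// is invoked once per group while that segment is locked. With the
// on-heap store every key is effectively its own segment, so each
// group (and thus each Iterable passed to loadAll) has size 1.
Map<Integer, List<Integer>> bySegment = keys.stream()
    .collect(Collectors.groupingBy(key -> segmentFor(key)));
for (List<Integer> segmentKeys : bySegment.values()) {
  lockSegment(segmentKeys);
  try {
    loaderWriter.loadAll(segmentKeys);
  } finally {
    unlockSegment(segmentKeys);
  }
}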
The current design enables entries being loaded to be made available as quickly as possible. It could be that something else currently makes this impossible (I haven't read the code in a long while now), but this enables two threads to each populate one of two keys being loaded:
t1: cache.getAll(a, b);
t2: cache.get(b);
The first thread (t1) populates a, but b is already loaded by the time t1 accesses it, as t2 already populated the key b.
If t1 had locked both a and b prior to loading them, t2 couldn't have parallelized the load of b while a was being loaded.
Things can get somewhat more tricky to deal with as well; imagine the case below:
t1: cache.getAll(a, b);
t2: cache.getAll(b, a);
These two threads can now deadlock each other. Obviously, ordering the lock acquisitions would be the way to solve this, but there is no constraint that the type of a and b be ? extends Comparable either.
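For illustration only, the lock-ordering idea would look something like the sketch below. It presumes the keys are Comparable and a hypothetical lockFor(key) helper, which is precisely the constraint the cache's key type does not have to satisfy:

// Every thread sorts its keys before locking, so t1 and t2 acquire the
// per-key locks in the same global order and can no longer deadlock.
<K extends Comparable<K>> void lockAllInOrder(Collection<K> keys) {
  List<K> ordered = new ArrayList<>(keys);
  Collections.sort(ordered);
  for (K key : ordered) {
    lockFor(key).lock(); // hypothetical per-key lock lookup
  }
}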
Now, this can all be mitigated with other locking strategies, like installing some lock object in the mapping, so that t1 could see that a is already being loaded. The cache could then either have each thread load a single key, or transfer the responsibility of loading all the keys to a single thread (and hence a single CacheLoaderWriter.loadAll() invocation), so as to provide more options for the loader to optimize data access for the underlying system of record.
All this to say that the route chosen was to optimize for the latency of the cache. I hope the examples above show how that approach makes access to the cache somewhat fairer to all... While other options exist, they come at a higher runtime price because of the additional coordination requirements.
Now maybe there are better ways of going about this... I haven't given this any further thought for years, to be honest. Nonetheless, that's a vague attempt at explaining why things are the way they are; what challenges need to be accounted for when trying to "address this issue"; and, most importantly, what tradeoffs that may bring...
Highly interesting explanation. I'll take that into consideration. The problem is really the loader-writer: doing tons of queries to the database instead of one to load a cache isn't efficient at all. So we will have to find a way.
Also, note that loadAll was never intended to warm the cache up... If you wanted to do that (effectively), there'd be other means. We used to have a utility for that (including one that'd snapshot the current hot keyset at a given interval/shutdown and use that information to reload the cache on restart). That could use the same CacheLoaderWriter API.
Anyways, it's a matter of tradeoffs... I can understand that, in case your SoR enables "batch loads", you may want to leverage that. One idea, thinking out loud here, is to support both approaches based on config: a loader/writer optimized for either the cache or the SoR.
In all cases, best of luck :)
Any news on this issue?
Can't we create a custom wrapper for this, supporting getAll with multiple keys and calling the loadAll method only for the missing keys? Do we see any issue with that?
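A rough sketch of that wrapper idea, under two stated assumptions: the cache is configured without a loader-writer (so a plain get() does not trigger a per-key load), and the wrapper holds a reference to the same CacheLoaderWriter to bulk-load the misses itself. Note that this gives up the per-key loading guarantee described earlier, since two threads may both load the same missing key:

static <K, V> Map<K, V> getAllThrough(Cache<K, V> cache,
                                      CacheLoaderWriter<K, V> loaderWriter,
                                      Set<K> keys) throws Exception {
  Map<K, V> result = new HashMap<>();
  Set<K> missing = new HashSet<>();
  for (K key : keys) {
    V value = cache.get(key);     // plain cache access, no load on a miss
    if (value != null) {
      result.put(key, value);
    } else {
      missing.add(key);
    }
  }
  if (!missing.isEmpty()) {
    Map<K, V> loaded = loaderWriter.loadAll(missing); // ONE bulk SoR query
    cache.putAll(loaded);
    result.putAll(loaded);
  }
  return result;
}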
When using Ehcache 3 (3.1.1) with an on-heap store, calls to getAll with a key set > 1 on a cold cache result in loadAll being called for each key separately. The iterable passed to loadAll in the CacheLoaderWriter always appears to have a size of 1, regardless of the number of keys passed to getAll.
Here is my cache configuration:
cache = cacheManager.createCache(name,
    CacheConfigurationBuilder.newCacheConfigurationBuilder(key, value, ResourcePoolsBuilder.heap(entries))
        .withLoaderWriter(loader)
        .build());
See the comments on Stack Overflow.