cb372 / scalacache

Simple caching in Scala
https://cb372.github.io/scalacache/

Memoize should not allow concurrent computations #85

Open backuitist opened 8 years ago

backuitist commented 8 years ago

Currently, given a function f: Future[T], if I call memoize(duration)(f) n times before f finishes, I'll get n executions of f. Wouldn't it be better to call f only once?
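
To make the problem concrete, here is a self-contained sketch of the behaviour being reported (naiveMemoize is a stand-in for illustration, not ScalaCache's actual code): only completed results get cached, so every caller that arrives before f finishes starts its own execution.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object DuplicatedWork extends App {
  implicit val ec: ExecutionContext = ExecutionContext.global

  val executions = new AtomicInteger(0)
  @volatile var cached: Option[Int] = None // stands in for the cache entry

  // Naive memoize: check the cache, and if empty, run f and cache the
  // result once it completes. Nothing coordinates concurrent callers.
  def naiveMemoize(f: => Future[Int]): Future[Int] =
    cached match {
      case Some(v) => Future.successful(v)
      case None =>
        val fut = f
        fut.foreach(v => cached = Some(v))
        fut
    }

  def f: Future[Int] = Future {
    executions.incrementAndGet()
    Thread.sleep(100) // simulate slow work
    42
  }

  // All 5 calls happen before the first f completes, so the cache is
  // still empty for each of them.
  val results = Future.sequence(List.fill(5)(naiveMemoize(f)))
  Await.result(results, 5.seconds)
  println(s"f ran ${executions.get} times") // typically prints 5, not 1
}
```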

cb372 commented 8 years ago

Sorry about the slow reply. I was planning to try an implementation before I got back to you, but I haven't got around to it yet.

I think this is a potentially useful feature, but it could also be confusing and/or undesirable behaviour for some users, so it should be configurable.

We also need to be very careful with the implementation. The way I imagine it being implemented, it seems quite easy to run into race conditions, resource leaks and other nasty things.

Here's a rough sketch of a suggested implementation:

if (found in cache)
  return it
else {
  look for a Future in a thread-safe (cache key -> Future) map
  if (Future found)
    return it
  else {
    create a promise and add its Future to the map
    execute the block, which returns a Future
    add an onComplete to the Future that:
      - completes the promise
      - caches the result
      - removes the entry from the map
    return the block's Future
  }
}

Ideally the map lookup and update would be atomic, e.g. using Java 8 ConcurrentHashMap's computeIfAbsent.

If the block returns a failed Future, I guess anybody waiting on the promise should get a failed Future as well.
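
Putting those pieces together, here is a minimal Scala sketch of the approach, including the computeIfAbsent atomicity and the failure propagation just mentioned. The SimpleAsyncCache trait and its get/put signatures are hypothetical stand-ins to keep the sketch self-contained, not ScalaCache's actual API.

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.concurrent.{ExecutionContext, Future, Promise}
import scala.concurrent.duration.Duration

// Hypothetical minimal backend interface, for illustration only.
trait SimpleAsyncCache[V] {
  def get(key: String): Future[Option[V]]
  def put(key: String, value: V, ttl: Duration): Future[Unit]
}

class DedupingMemoize[V](underlying: SimpleAsyncCache[V])(implicit ec: ExecutionContext) {

  // In-flight computations, keyed by cache key.
  private val inFlight = new ConcurrentHashMap[String, Future[V]]()

  def memoize(key: String, ttl: Duration)(block: => Future[V]): Future[V] =
    underlying.get(key).flatMap {
      case Some(v) => Future.successful(v) // cache hit: nothing to compute
      case None =>
        // computeIfAbsent makes the lookup-or-register step atomic, so at
        // most one caller per key actually runs `block`.
        inFlight.computeIfAbsent(key, _ => {
          val promise = Promise[V]()
          block.onComplete { result =>
            result.foreach(v => underlying.put(key, v, ttl)) // cache successes only
            inFlight.remove(key)     // small window where a new caller may recompute
            promise.complete(result) // waiters see a failed Future on failure, too
          }
          promise.future
        })
    }
}
```

Note this differs slightly from the pseudocode above in that every caller, including the first, gets the promise's Future, and there is still a window between removing the in-flight entry and the asynchronous cache write completing in which a new caller could recompute.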

As you can see, this is pretty complex. I'm still in two minds about whether logic like this should be part of ScalaCache, but I'll have a go at implementing it.

If you can think of a simpler implementation, please let me know.

mdedetrich commented 8 years ago

I don't think it should be disallowed; we use the cache concurrently all of the time. If you want to guarantee these things, you need to bring in atomicity, and doing so usually comes at quite a big performance cost. It will also matter depending on which underlying cache you are using.

Tvaroh commented 8 years ago

+1

anzecesar commented 7 years ago

This would make a lot of sense for in-memory local backends (Guava, EhCache, Caffeine).

If f: Future[T] is an expensive operation, it makes sense to limit the number of times it gets invoked, especially if it has side effects.

For local backends you can simply cache the Future itself, returning it on subsequent calls even if it hasn't completed yet.

An example is here: https://github.com/spray/spray/blob/master/spray-caching/src/main/scala/spray/caching/LruCache.scala
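
Concretely, a minimal sketch of that idea might look like this (hypothetical names; no TTL or size bound, unlike spray's LruCache):

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.concurrent.{ExecutionContext, Future}

// Sketch only: the cache stores Future[V] values, so callers arriving while
// a computation is still running simply share the same incomplete Future.
class FutureCache[K, V](implicit ec: ExecutionContext) {
  private val cache = new ConcurrentHashMap[K, Future[V]]()

  def apply(key: K)(block: => Future[V]): Future[V] = {
    val f = cache.computeIfAbsent(key, _ => block)
    // Evict failed Futures so a later call can retry.
    f.failed.foreach(_ => cache.remove(key, f))
    f
  }
}
```

This only works for in-memory backends, of course: a remote cache like Redis can't store an unfinished Future, which is why the distinction between local and remote backends keeps coming up in this thread.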

talbenshabtay commented 7 years ago

We came across this issue now as well. Having a configuration option for this kind of behaviour seems like the correct approach here, because I don't see a real benefit in providing a new abstraction layer that distinguishes between in-memory and remote caches.

Also, the name of the method memoizeSync is a bit confusing, because it gives the impression of atomicity when in fact there is none.