[BUG] Unbounded growth of the cache when writing in a tight loop (weight-based cache)

blemale / scaffeine

Thin Scala wrapper for Caffeine (https://github.com/ben-manes/caffeine)

Apache License 2.0

265 stars 24 forks source link

I was using the weight-based Caffeine cache and I noticed the bug where if we hit the cache very hard in a tight loop, the cache will grow unbounded since the eviction cannot keep up.

Please see the simple test code lists below

package concurrency

import com.github.benmanes.caffeine.cache.RemovalCause
import com.github.blemale.scaffeine.{Cache, Scaffeine}

import java.util.concurrent.atomic.AtomicInteger

sealed class MyCache {
  /** The weigh function for Caffeine cache. */
  private def weigh(key: Int, value: String) = {
    value.length
  }

  private val numEvictedEntries = new AtomicInteger()

  /** Record the number of entries being evicted. */
  private def evictionListener: (Int, String, RemovalCause) => Unit =
    (key: Int, value: String, cause: RemovalCause) => {
      numEvictedEntries.addAndGet(1)
    }

  /** Caffeine cache */
  private val cache: Cache[Int, String] =
    Scaffeine()
      .recordStats()
      .weigher(weigh)
      .maximumWeight(100)
      .evictionListener(evictionListener)
      .build()

  /** Get the number of evicted entries. */
  def getNumEvicted: Int = numEvictedEntries.intValue()

  def read(key: Int): Option[String] = cache.getIfPresent(key)

  def write(key: Int, value: String): Unit = cache.put(key, value)
}

object HitCache extends App {

  println("START")

  val myCache: MyCache = new MyCache

  for (index <- 0 until 10000) {
    val key = index
    val value = "A * 10" + index.toString

    myCache.write(key, value)

    // Test if the write succeeded
    val optionString = myCache.read(key)
    assert(optionString.isDefined) // BUG -- assertion failed
    assert(optionString.get == value)
  }

  println("OK")

}

ThisBuild / scalaVersion := "2.12.10"
...
libraryDependencies += "com.github.blemale" % "scaffeine_2.12" % "5.2.1"

The bug only affects the weight-based cache, it does not affect the size-based cache.

Caffeine uses a write buffer so that concurrent writes do not block each other to maintain the eviction policy. This allows the buffer to absorb bursts of activity and batch the operations, rather than serializing all writes against an eviction lock. This buffers write operations which is independent of the weight, so the cache can temporarily exceed the maximum by up to 128 * NCPUs pending ops. Once this buffer is full then back pressure is applied so that writes assist in eviction and avoids runaway growth. The maintenance work is scheduled immediately once a write occurs, which due to concurrent writes may capture a batch of work.

You observe this in your test because, by default, eviction is scheduled on the ForkJoinPool.commonPool(). This is to avoid penalizing the caller if the eviction listener is expensive. Since it evicts asynchronously your test thread races ahead. If you set a same-thread executor, Caffeine.executor(Runnable::run), then the caller would perform the work and since there is no concurrency it would stay within bounds.

This slack is implicit in any GC'd language as dead objects are not instantaneously removed from memory. The buffering lets improve read and write performance compared to a synchronized cache, often segmented with only modest gain. You won't see unbounded growth under a stress test, but it will exceed up to the number of pending writes afforded by the system.

blemale / scaffeine

[BUG] Unbounded growth of the cache when writing in a tight loop (weight-based cache) #333