Closed fooltuboshu closed 6 months ago
For the compute_samekey benchmark case, it seems ConcurrentHashMap has much lower ops/s compared to the Caffeine cache. I ran a similar case and found that ConcurrentHashMap and Caffeine have similar performance.
The Java 8 implementation of ConcurrentHashMap was pessimistic, always locking the hash bin before doing the computation. This caused a lot of contention for popular items that were already present, which the same-key benchmark highlights, and explains why spreading the requests improved performance by reducing lock contention. These results were discussed with Doug Lea in 2014, but unfortunately those public archives are lost, so my email archive is below.
This was improved in Java 9 by adding a partial pre-screen (1cif) that optimistically avoids locking if the first entry in the hash bin is for that key, else falls back to the locking behavior. This means the results can vary quite broadly, as the GitHub Actions benchmark run shows a similar bottleneck on modern JDKs. Caffeine always performs a full pre-screen (get + cif), which ensures reliably good performance. Since it is built on top of ConcurrentHashMap, we generally assume it has equal or slightly worse performance, except in rare edge cases like this one.
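A minimal sketch of the "get + computeIfAbsent" pre-screen described above (the class and method names are mine, not from the benchmark code): the lock-free get() serves hot, already-present keys without ever touching the bin lock, and only misses fall through to the locking computeIfAbsent.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Hypothetical illustration of the full pre-screen workaround.
class PrescreenExample {
    static <K, V> V getOrLoad(ConcurrentMap<K, V> map, K key, Function<K, V> loader) {
        V value = map.get(key); // optimistic, lock-free read
        return (value != null) ? value : map.computeIfAbsent(key, loader);
    }

    public static void main(String[] args) {
        ConcurrentMap<String, Integer> map = new ConcurrentHashMap<>();
        System.out.println(getOrLoad(map, "a", k -> 1)); // loads: prints 1
        System.out.println(getOrLoad(map, "a", k -> 2)); // present: prints 1, loader not invoked
    }
}
```

On a hit this costs only a volatile read, which is why it sidesteps the bin-lock contention for popular keys.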
I also checked your benchmark class code and wonder why you did not set a maximum size when setting up the ConcurrentHashMap and Caffeine caches.
ConcurrentHashMap has no maximum-size bound, only an initial table capacity, which assists in preallocating locks. Any bounding policy would incur additional work that penalizes the cache and makes the two cases diverge, as Caffeine would also need to track read access patterns. That was irrelevant for this benchmark, which served to highlight the lock contention problem of a computing get and its workaround, which many users later ran into and were guided by. The maximum-size benchmarks were shown next, so none of the overhead of a size bound was hidden.
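To make the distinction concrete, a small sketch (my own, not from the benchmark harness) showing that the ConcurrentHashMap constructor argument is a sizing hint rather than a bound:

```java
import java.util.concurrent.ConcurrentHashMap;

// The constructor argument is an initial table capacity, not a maximum size;
// the map grows past it freely. (Java 7's implementation also used a
// concurrencyLevel hint to preallocate segment locks; Java 8 locks per bin.)
class CapacityHintExample {
    public static void main(String[] args) {
        ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<>(16);
        for (int i = 0; i < 1_000; i++) {
            map.put(i, i); // grows well past the initial capacity of 16
        }
        System.out.println(map.size()); // prints 1000 -- nothing was evicted
    }
}
```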
For the read and write benchmarks, I wonder why you didn't include ConcurrentHashMap as a cacheType.
That was because those benchmarks were meant to show the overhead of a size bound and the performance challenge it entails. An unbounded ConcurrentHashMap demonstrates the best case, which is shown in the server benchmarks section below. The consumer laptop section was an at-a-glance view, and I didn't want to imply that users should use unbounded caches to achieve good performance. The server benchmarks are more detailed to provide a better perspective when running on production-like hardware. And of course this snapshot of performance results could be retested with other cache types, like you did, so it was meant only to start the discussion rather than end it. These were some of the first cache concurrency benchmarks that were correct, as most across industry and research were very wrong and misleading, so it gave a valid starting point to discuss from.
Hi @ben-manes, thank you very much for your detailed and patient response.
Their LinkedHashMap-based LRUCache is guarded by a lock, so it is already covered by the LinkedHashMap_Lru benchmark. Their FIFO / LFU caches (their StampedCache) appear broken by not locking on the read, which can cause an infinite loop when the hash table resizes. Therefore I do not think adding this particular library is beneficial. I'll close for now but happy to keep chatting here or elsewhere.
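For contrast, a minimal sketch of a correctly guarded LinkedHashMap LRU (my own illustration, not that library's code): an access-ordered get() mutates the map's internal linked list, so even reads must hold the write lock; an unguarded read can observe a table mid-resize and spin forever.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.locks.StampedLock;

// Hypothetical lock-guarded LRU cache over an access-ordered LinkedHashMap.
class GuardedLruCache<K, V> {
    private final StampedLock lock = new StampedLock();
    private final Map<K, V> map;

    GuardedLruCache(int maximumSize) {
        this.map = new LinkedHashMap<K, V>(16, 0.75f, /* accessOrder */ true) {
            @Override protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maximumSize; // evict least recently used
            }
        };
    }

    V get(K key) {
        long stamp = lock.writeLock(); // get() reorders entries, so a read lock is not enough
        try {
            return map.get(key);
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    void put(K key, V value) {
        long stamp = lock.writeLock();
        try {
            map.put(key, value);
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    public static void main(String[] args) {
        GuardedLruCache<String, Integer> cache = new GuardedLruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.put("c", 3); // evicts "a", the least recently used entry
        System.out.println(cache.get("a")); // prints null
        System.out.println(cache.get("c")); // prints 3
    }
}
```

This is what LinkedHashMap_Lru in the benchmark suite models: correct, but serialized behind a single lock, which is the baseline the concurrent caches are compared against.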
Hi, I am reading the benchmark results at https://github.com/ben-manes/caffeine/wiki/Benchmarks and have some questions about them.
cc: @HowellWang