Open cruftex opened 5 years ago
Caffeine has been released with its latest adaptive algorithm. Do you know how it compares to cache2k now?
@hc-codersatlas:
You can run the benchmarks with traces as explained in https://cruftex.net/2016/05/09/Java-Caching-Benchmarks-2016-Part-2.html
Be aware that this is rather academic and any result may or may not apply to your actual workload.
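For readers who have not done a trace-driven run before, the idea can be sketched in a few lines: replay a sequence of keys against a cache and count the hits. The following is only a minimal illustration, not the cache2k-benchmark code. `TraceReplaySketch`, `LruCache`, `hitRatePercent` and `loopTrace` are hypothetical names, the cache is a plain LRU built on `LinkedHashMap`, and the loop trace is synthetic rather than the LIRS loop trace file used in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TraceReplaySketch {

    /** Minimal LRU cache: an access-ordered LinkedHashMap that drops its eldest entry when full. */
    static final class LruCache extends LinkedHashMap<Integer, Boolean> {
        private final int capacity;

        LruCache(int capacity) {
            super(16, 0.75f, true); // true = access order, which gives LRU behavior
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<Integer, Boolean> eldest) {
            return size() > capacity;
        }
    }

    /** Replays the trace against an LRU cache and returns the hit rate in percent. */
    static double hitRatePercent(int[] trace, int capacity) {
        LruCache cache = new LruCache(capacity);
        long hits = 0;
        for (int key : trace) {
            if (cache.get(key) != null) {
                hits++;                       // key was resident: count a hit
            } else {
                cache.put(key, Boolean.TRUE); // miss: load the key, possibly evicting the eldest
            }
        }
        return trace.length == 0 ? 0.0 : 100.0 * hits / trace.length;
    }

    /** Synthetic loop trace (an assumption for illustration): keys 0..loopLength-1, repeated. */
    static int[] loopTrace(int loopLength, int rounds) {
        int[] trace = new int[loopLength * rounds];
        for (int i = 0; i < trace.length; i++) {
            trace[i] = i % loopLength;
        }
        return trace;
    }

    public static void main(String[] args) {
        int[] trace = loopTrace(1000, 10);
        System.out.println(hitRatePercent(trace, 512));  // prints 0.0: loop is larger than the cache
        System.out.println(hitRatePercent(trace, 1000)); // prints 90.0: loop fits, only round one misses
    }
}
```

A loop that is larger than the cache is the classic worst case for plain LRU, since every key is evicted just before it is requested again. That is why loop traces are a useful stress test for eviction algorithms.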
I will take a deeper look at this later. The goal of cache2k is to have good enough eviction results with very low overhead in the critical access path. That said, the goal is not to have better eviction results than any other cache. In that sense, I see cache2k as well balanced.
Over the last few days I've been working quite a bit on the eviction. To test Caffeine's adaptation algorithm, Ben used a trace that leads to quite devastating hit rates in cache2k. I improved the testbed for eviction algorithms in cache2k-benchmark so it's possible to run an isolated version of the eviction algorithms, and I reproduced the results (hit rates in percent):
Trace | Size | cache2k v1.2 | Caffeine 2.8 |
---|---|---|---|
corda-loop-corda | 512 | 0.176 | 42.875 |
corda-loop-corda | 1024 | 2.275 | 75.521 |
corda-small | 512 | 0.682 | 30.431 |
corda-small | 1024 | 6.808 | 28.807 |
corda-small-10x | 5120 | 0.604 | 30.456 |
corda-small-10x | 10240 | 5.659 | 28.751 |
loop | 256 | 0.000 | 23.928 |
loop | 512 | 0.000 | 49.698 |
The trace corda-small-10x repeats each access 10 times in a different number space.
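One plausible reading of "10 times in a different number space" is that every access is emitted ten times, each copy shifted into its own disjoint key range, so the trace has ten times as many distinct keys and the cache size is scaled by ten as well. The sketch below follows that reading. It is an assumption about how such a trace could be built, not the actual generator, and `TraceExpansionSketch`, `expand` and the `offset` parameter are hypothetical.

```java
public class TraceExpansionSketch {

    /**
     * Emits every access of the input trace `copies` times, shifting the key by a
     * multiple of `offset` so that each copy lives in its own, disjoint key range.
     * The caller must pick an offset larger than the key range of the input trace.
     */
    static int[] expand(int[] trace, int copies, int offset) {
        int[] out = new int[trace.length * copies];
        int pos = 0;
        for (int key : trace) {
            for (int c = 0; c < copies; c++) {
                out[pos++] = key + c * offset;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] small = {1, 2, 1, 3};
        int[] big = expand(small, 10, 1_000_000);
        System.out.println(big.length); // prints 40
    }
}
```

Under this construction each key range sees the same access pattern as the original trace, which would explain why the hit rates at 5120 and 10240 entries are so close to those of corda-small at 512 and 1024.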
One reason for the outcome is that the current eviction algorithm keeps hot entries that are no longer referenced for too long. Removing the demotion from hot to cold space, which is part of the original Clock-Pro algorithm, shows good results. It seems that in combination with the hit counter the demotion is not necessary, so in the end more frequently used items are preferred over items whose reuse distance makes them hop between hot and cold. The overall eviction efficiency is improved and the number of list operations in the eviction algorithm is drastically reduced. It's still not adaptive, so it's not a solution for best performance on the corda trace, which is LRU friendly.
I plan to release what I have now, which looks like a good incremental improvement, in the upcoming version 1.4. Here is the comparison:
Trace | Size | cache2k v1.2 | cache2k v1.4 |
---|---|---|---|
corda-loop-corda | 512 | 0.176 | 1.547 |
corda-loop-corda | 1024 | 2.275 | 11.499 |
corda-small | 512 | 0.682 | 4.505 |
corda-small | 1024 | 6.808 | 32.921 |
corda-small-10x | 5120 | 0.604 | 3.692 |
corda-small-10x | 10240 | 5.659 | 32.863 |
loop | 256 | 0.000 | 24.481 |
loop | 512 | 0.000 | 48.962 |
On bigger and realistic workloads it looks like this:
Trace | Size | cache2k v1.2 | cache2k v1.4 | Caffeine 2.8 |
---|---|---|---|---|
financial1 | 50000 | 47.093 | 47.390 | 47.253 |
financial1 | 100000 | 48.553 | 48.778 | 49.811 |
oltp | 5000 | 53.255 | 53.341 | 55.320 |
oltp | 10000 | 60.993 | 61.070 | 59.492 |
web12 | 3000 | 78.242 | 78.166 | 75.648 |
web12 | 1200 | 70.497 | 70.641 | 68.876 |
wikipedia1 | 1024 | 59.834 | 59.770 | 57.272 |
wikipedia1 | 2048 | 62.017 | 61.940 | 60.009 |
Except for corda at the small cache size of 512 entries, the to-be-released cache2k v1.4 is within a range of -3.5 to +2.5 percentage points of hit rate compared to Caffeine 2.8. The static tuning that cache2k currently ships with favors the more skewed distributions found in typical web traffic, but it is less favorable for transaction-oriented workloads.
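The differences for the bigger traces can be recomputed directly from the table above. This is a small arithmetic sketch, with the rows copied from that table and `HitRateDiffSketch` and `minMaxDiff` as hypothetical names. It covers only the financial1, oltp, web12 and wikipedia1 rows, not the corda or loop traces.

```java
import java.util.Locale;

public class HitRateDiffSketch {

    // Rows copied from the table above: {cache2k v1.4, Caffeine 2.8}, hit rates in percent.
    static final double[][] RATES = {
        {47.390, 47.253}, // financial1, 50000
        {48.778, 49.811}, // financial1, 100000
        {53.341, 55.320}, // oltp, 5000
        {61.070, 59.492}, // oltp, 10000
        {78.166, 75.648}, // web12, 3000
        {70.641, 68.876}, // web12, 1200
        {59.770, 57.272}, // wikipedia1, 1024
        {61.940, 60.009}, // wikipedia1, 2048
    };

    /** Returns {min, max} of (cache2k v1.4 - Caffeine 2.8) in percentage points. */
    static double[] minMaxDiff(double[][] rates) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double[] row : rates) {
            double diff = row[0] - row[1];
            min = Math.min(min, diff);
            max = Math.max(max, diff);
        }
        return new double[] {min, max};
    }

    public static void main(String[] args) {
        double[] range = minMaxDiff(RATES);
        System.out.println(String.format(Locale.ROOT, "%.3f .. %.3f", range[0], range[1]));
    }
}
```

For these eight rows the spread is roughly -2.0 (oltp at 5000 entries) to +2.5 (web12 at 3000 entries) percentage points.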
fyi - I cannot reproduce the loop improvements in 1.3.1.Alpha at 256/512, but I do see the improvement to corda-small. Can you verify that we're using the same Lirs loop trace file and that my configuration is valid? If I print out the hit rate from `cache.getStatistics().getHitRate()` it matches my calculation when comparing with multi1. The `ClockProPlusEviction` seems to be updated, verified jars, etc. Maybe the improvement was lost during the refactoring?
I double checked. I missed carrying over a tiny change from the testbed version to production. I will do a new release tomorrow. My bad. Release early, release often... Thanks @ben-manes!
V2.6 gets a tiny change because I discovered a suboptimal behavior. The current eviction algorithm (V1.x - V2.4) inserts entries evicted from hot into the ghost history. However, that makes no sense, because the entry will hardly have a better reuse distance next time. The original Clock-Pro algorithm does not do this either. Here is the result when compared to V2.4.1.Final:
Trace Name | Cache Size | Reference | Hitrate | Best | Hitrate | Diff |
---|---|---|---|---|---|---|
financial1-1M | 12500 | cache2k* | 39.76 | c2k2x(V26) | 40.66 | 0.90 |
financial1-1M | 25000 | cache2k* | 52.64 | c2k2x(V26) | 52.74 | 0.10 |
financial1-1M | 50000 | cache2k* | 54.17 | c2k2x(V26) | 54.41 | 0.24 |
financial1-1M | 100000 | cache2k* | 54.55 | c2k2x(V26) | 54.69 | 0.13 |
financial1-1M | 200000 | cache2k* | 54.79 | c2k2x(V26) | 55.47 | 0.68 |
scarab-recs | 25000 | cache2k* | 68.65 | c2k2x(V26) | 69.18 | 0.53 |
scarab-recs | 50000 | cache2k* | 74.07 | c2k2x(V26) | 74.90 | 0.83 |
scarab-recs | 75000 | cache2k* | 77.13 | c2k2x(V26) | 78.20 | 1.08 |
scarab-recs | 100000 | cache2k* | 79.25 | c2k2x(V26) | 80.40 | 1.15 |
loop | 256 | cache2k* | 24.48 | c2k2x(V26) | 23.69 | -0.79 |
loop | 512 | cache2k* | 48.96 | c2k2x(V26) | 47.48 | -1.48 |
zipf10K-1M | 500 | cache2k* | 67.51 | c2k2x(V26) | 67.31 | -0.20 |
zipf10K-1M | 1000 | cache2k* | 74.46 | c2k2x(V26) | 74.20 | -0.26 |
zipf10K-1M | 2000 | cache2k* | 81.53 | c2k2x(V26) | 81.27 | -0.27 |
web12 | 200 | cache2k* | 45.52 | c2k2x(V26) | 46.63 | 1.10 |
web12 | 300 | cache2k* | 52.35 | c2k2x(V26) | 53.25 | 0.90 |
web12 | 400 | cache2k* | 57.00 | c2k2x(V26) | 57.65 | 0.65 |
web12 | 800 | cache2k* | 66.63 | c2k2x(V26) | 66.78 | 0.15 |
web12 | 1200 | cache2k* | 70.65 | c2k2x(V26) | 70.80 | 0.15 |
web12 | 3000 | cache2k* | 78.15 | c2k2x(V26) | 78.46 | 0.31 |
oltp | 2500 | cache2k* | 46.70 | c2k2x(V26) | 47.59 | 0.89 |
oltp | 5000 | cache2k* | 53.32 | c2k2x(V26) | 53.87 | 0.55 |
oltp | 10000 | cache2k* | 60.60 | c2k2x(V26) | 61.18 | 0.58 |
All real-world traces benefit, while the artificial traces (Zipfian and loop) lose a bit. So this modification will be included in V2.6. Update: In the above results the V26 variant also has the hot max setting changed from 97 to 94, which is responsible for the big differences. My bad.
Corrected statistics with the V2.6 change without changing hot max:
Trace Name | Cache Size | Reference | Hitrate | Best | Hitrate | Diff |
---|---|---|---|---|---|---|
financial1-1M | 12500 | c2k2x(V24) | 39.76 | c2k2x(V26) | 40.04 | 0.28 |
financial1-1M | 25000 | c2k2x(V24) | 52.64 | c2k2x(V26) | 52.64 | -0.00 |
financial1-1M | 50000 | c2k2x(V24) | 54.17 | c2k2x(V26) | 54.35 | 0.18 |
financial1-1M | 100000 | c2k2x(V24) | 54.55 | c2k2x(V26) | 54.59 | 0.04 |
financial1-1M | 200000 | c2k2x(V24) | 54.79 | c2k2x(V26) | 54.79 | 0.00 |
scarab-recs | 25000 | c2k2x(V24) | 68.65 | c2k2x(V26) | 68.78 | 0.13 |
scarab-recs | 50000 | c2k2x(V24) | 74.07 | c2k2x(V26) | 74.23 | 0.16 |
scarab-recs | 75000 | c2k2x(V24) | 77.13 | c2k2x(V26) | 77.31 | 0.18 |
scarab-recs | 100000 | c2k2x(V24) | 79.25 | c2k2x(V26) | 79.42 | 0.17 |
loop | 256 | c2k2x(V24) | 24.48 | c2k2x(V26) | 24.48 | 0.00 |
loop | 512 | c2k2x(V24) | 48.96 | c2k2x(V26) | 48.96 | 0.00 |
zipf10K-1M | 500 | c2k2x(V24) | 67.51 | c2k2x(V26) | 67.49 | -0.02 |
zipf10K-1M | 1000 | c2k2x(V24) | 74.46 | c2k2x(V26) | 74.39 | -0.07 |
zipf10K-1M | 2000 | c2k2x(V24) | 81.53 | c2k2x(V26) | 81.43 | -0.11 |
web12 | 200 | c2k2x(V24) | 45.52 | c2k2x(V26) | 45.37 | -0.16 |
web12 | 300 | c2k2x(V24) | 52.35 | c2k2x(V26) | 52.24 | -0.11 |
web12 | 400 | c2k2x(V24) | 57.00 | c2k2x(V26) | 56.85 | -0.15 |
web12 | 800 | c2k2x(V24) | 66.63 | c2k2x(V26) | 66.51 | -0.12 |
web12 | 1200 | c2k2x(V24) | 70.65 | c2k2x(V26) | 70.55 | -0.09 |
web12 | 3000 | c2k2x(V24) | 78.15 | c2k2x(V26) | 78.07 | -0.08 |
oltp | 2500 | c2k2x(V24) | 46.70 | c2k2x(V26) | 46.71 | 0.01 |
oltp | 5000 | c2k2x(V24) | 53.32 | c2k2x(V26) | 53.41 | 0.09 |
oltp | 10000 | c2k2x(V24) | 60.60 | c2k2x(V26) | 60.64 | 0.03 |
The effect is actually minimal; however, it yields relevant savings in processing time, so I keep it. For the next release, 2.8, I plan to make more eviction improvements and look into different hot max settings.
The cache2k Clock-Pro algorithm runs with fixed sizes of its data structures, which yielded the best results across all benchmark traces I could get hold of, see: https://cruftex.net/2016/05/09/Java-Caching-Benchmarks-2016-Part-2.html
Nevertheless there are some patterns that result in worse hit rates than LRU.
Ben is experimenting with some adaptations that look interesting. See: https://github.com/ben-manes/caffeine/issues/106
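To make the V2.6 change concrete, here is a toy model of the bookkeeping difference. It is deliberately simplified and is not the cache2k implementation: there are no clock hands and no hit counters, both spaces are plain FIFO queues, and `GhostHistorySketch` with all its fields and methods is hypothetical. The only thing it is meant to show is the flag that decides whether a key evicted from hot is remembered in the ghost history.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;

public class GhostHistorySketch {
    private final int capacity;
    private final int hotMax;
    private final int ghostMax;
    private final boolean recordHotEvictions; // true = older behavior, false = V2.6 behavior

    private final Deque<Integer> cold = new ArrayDeque<>();
    private final Deque<Integer> hot = new ArrayDeque<>();
    private final Set<Integer> resident = new HashSet<>();
    private final LinkedHashSet<Integer> ghost = new LinkedHashSet<>(); // keys only, oldest first

    GhostHistorySketch(int capacity, int hotMax, int ghostMax, boolean recordHotEvictions) {
        if (hotMax >= capacity) {
            throw new IllegalArgumentException("hotMax must be smaller than capacity");
        }
        this.capacity = capacity;
        this.hotMax = hotMax;
        this.ghostMax = ghostMax;
        this.recordHotEvictions = recordHotEvictions;
    }

    /** Accesses a key and returns true on a hit. */
    boolean access(int key) {
        if (resident.contains(key)) {
            return true;
        }
        // A key still found in the ghost history was evicted recently, so it enters hot directly.
        if (ghost.remove(key)) {
            hot.addLast(key);
        } else {
            cold.addLast(key);
        }
        resident.add(key);
        if (resident.size() > capacity) {
            evictOne();
        }
        return false;
    }

    private void evictOne() {
        int victim;
        if (hot.size() > hotMax || cold.isEmpty()) {
            victim = hot.removeFirst();
            if (recordHotEvictions) {
                remember(victim); // the step that V2.6 drops
            }
        } else {
            victim = cold.removeFirst();
            remember(victim); // cold evictions are always remembered
        }
        resident.remove(victim);
    }

    private void remember(int key) {
        ghost.add(key);
        if (ghost.size() > ghostMax) {
            Iterator<Integer> it = ghost.iterator();
            it.next();
            it.remove(); // drop the oldest ghost key
        }
    }

    boolean inGhost(int key) {
        return ghost.contains(key);
    }

    /** Runs a fixed five-access scenario and reports whether key 1 ends up in the ghost history. */
    static boolean runScenario(boolean recordHotEvictions) {
        GhostHistorySketch cache = new GhostHistorySketch(2, 1, 2, recordHotEvictions);
        for (int key : new int[] {1, 2, 3, 1, 2}) {
            cache.access(key);
        }
        return cache.inGhost(1);
    }

    public static void main(String[] args) {
        System.out.println(runScenario(true));  // prints true: hot eviction of key 1 was recorded
        System.out.println(runScenario(false)); // prints false: hot eviction left no ghost entry
    }
}
```

In the scenario, key 1 is promoted to hot through a ghost hit and later pushed out of hot by key 2. With the flag off, that eviction costs no ghost insert, which is the kind of processing saving mentioned above.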