cache2k / cache2k

Lightweight, high performance Java caching
http://cache2k.org
Apache License 2.0

Adaptive cache eviction #101

Open cruftex opened 5 years ago

cruftex commented 5 years ago

The cache2k Clock-Pro algorithm runs with fixed data structure sizes that yielded the best results over all benchmark traces I could get hold of, see: https://cruftex.net/2016/05/09/Java-Caching-Benchmarks-2016-Part-2.html

Nevertheless, there are some access patterns that result in worse hit rates than LRU.

Ben is experimenting with some adaptations that look interesting. See: https://github.com/ben-manes/caffeine/issues/106

re-thc commented 5 years ago

Caffeine has been released with the latest adaptive algorithm. Do you know how it compares to cache2k now?

cruftex commented 5 years ago

@hc-codersatlas:

You can run the benchmarks with traces as explained in https://cruftex.net/2016/05/09/Java-Caching-Benchmarks-2016-Part-2.html

Be aware that this is rather academic and any result may or may not apply to your actual workload.
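
For orientation, here is a minimal, self-contained sketch of such a trace-driven hit rate measurement. This is not the cache2k-benchmark harness; the trace file format (one integer key per line) and the command line arguments are assumptions for illustration.

```java
// Sketch only: measure the hit rate of a cache2k cache over an access trace.
// Assumes a plain text trace with one integer key per line; args are hypothetical.
import org.cache2k.Cache;
import org.cache2k.Cache2kBuilder;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class TraceHitRate {

  public static void main(String[] args) throws IOException {
    String traceFile = args[0];                  // path to the trace file (assumption)
    int cacheSize = Integer.parseInt(args[1]);   // entry capacity to simulate
    Cache<Integer, Integer> cache = Cache2kBuilder.of(Integer.class, Integer.class)
      .entryCapacity(cacheSize)
      .build();
    long hits = 0, requests = 0;
    try (BufferedReader in = new BufferedReader(new FileReader(traceFile))) {
      String line;
      while ((line = in.readLine()) != null) {
        int key = Integer.parseInt(line.trim());
        requests++;
        if (cache.peek(key) != null) {           // peek: read without triggering a loader
          hits++;
        } else {
          cache.put(key, key);                   // simulate loading the missed entry
        }
      }
    }
    cache.close();
    System.out.printf("hit rate: %.3f%%%n", 100.0 * hits / requests);
  }
}
```

A real comparison would feed the same trace into each cache implementation at the same entry capacity and compare the resulting rates.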

I will take a deeper look at this later. The goal of cache2k is to have good enough eviction results with very low overhead in the critical access path. That said, the goal is not to have better eviction results than any other cache. In that sense, I see cache2k as well balanced.

cruftex commented 5 years ago

Over the last few days I've been working quite a bit on the eviction. For the tests of Caffeine's adaptation algorithm, Ben was using a trace that leads to quite devastating hit rates in cache2k. I improved the testbed for eviction algorithms in cache2k-benchmark, so it is possible to run an isolated version of the eviction algorithms, and reproduced the results:

| Trace | Cache size | cache2k v1.2 hit rate % | Caffeine 2.8 hit rate % |
|------------------|-------:|------:|-------:|
| corda-loop-corda | 512 | 0.176 | 42.875 |
| corda-loop-corda | 1024 | 2.275 | 75.521 |
| corda-small | 512 | 0.682 | 30.431 |
| corda-small | 1024 | 6.808 | 28.807 |
| corda-small-10x | 5120 | 0.604 | 30.456 |
| corda-small-10x | 10240 | 5.659 | 28.751 |
| loop | 256 | 0.000 | 23.928 |
| loop | 512 | 0.000 | 49.698 |

The trace corda-small-10x repeats each access 10 times in a different number space.
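
A hypothetical sketch of one way such a trace variant could be derived from the base trace, assuming each access is replayed ten times back to back with every repetition shifted into its own disjoint key range (the offset value and method name are made up for illustration):

```java
// Hypothetical reading of the construction described above, for illustration only:
// each access of the base trace is replayed ten times, every repetition shifted
// into its own key range so the copies live in disjoint number spaces.
class TraceExpansion {

  static int[] expandTenfold(int[] baseTrace) {
    final int repetitions = 10;
    final int offsetPerRepetition = 1_000_000;   // assumed larger than any key in the base trace
    int[] expanded = new int[baseTrace.length * repetitions];
    int idx = 0;
    for (int key : baseTrace) {
      for (int rep = 0; rep < repetitions; rep++) {
        expanded[idx++] = key + rep * offsetPerRepetition;   // shift into the rep-th number space
      }
    }
    return expanded;
  }
}
```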

One reason for the outcome is that the current eviction algorithm keeps hot entries that are no longer referenced for too long. Removing the demotion from the hot to the cold space, which the original Clock-Pro algorithm performs, shows good results. It seems that in combination with the hit counter this demotion is not necessary, so in the end more frequently used items are preferred over items whose reuse distance makes them hop between hot and cold. The overall eviction efficiency is improved and the list operations in the eviction algorithm are drastically reduced. It's still not adaptive, so it is not a solution for best performance on the corda trace, which is LRU friendly.
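
As a conceptual illustration of the idea (and explicitly not cache2k's actual implementation), a Clock-style scan can let an entry's hit counter earn it another round on the clock instead of demoting it from hot to cold:

```java
// Conceptual illustration only, not cache2k's actual code: a Clock-style scan in which
// a positive hit counter earns an entry another round on the clock, so frequently used
// entries are kept without ever being demoted from a hot to a cold space.
class ClockScanSketch {

  static final class Entry {
    final int key;
    int hitCounter;   // incremented on every cache hit, cleared when the hand passes
    Entry next;       // entries are linked into a circular list
    Entry(int key) { this.key = key; }
  }

  /** Advances the clock hand and returns the entry selected for eviction. */
  static Entry runHand(Entry hand) {
    // Entries hit since the last pass get another chance; clearing the counter makes the
    // scan terminate and lets repeatedly unused entries fall out eventually. A multi-bit
    // counter could be decayed instead of cleared to favor frequently used entries more.
    while (hand.hitCounter > 0) {
      hand.hitCounter = 0;
      hand = hand.next;
    }
    return hand;     // first entry without recorded hits is the eviction victim
  }
}
```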

I plan to release what I have now, which looks like a good incremental improvement, in the upcoming version 1.4. Here is the comparison:

| Trace | Cache size | cache2k v1.2 hit rate % | cache2k v1.4 hit rate % |
|------------------|-------:|------:|-------:|
| corda-loop-corda | 512 | 0.176 | 1.547 |
| corda-loop-corda | 1024 | 2.275 | 11.499 |
| corda-small | 512 | 0.682 | 4.505 |
| corda-small | 1024 | 6.808 | 32.921 |
| corda-small-10x | 5120 | 0.604 | 3.692 |
| corda-small-10x | 10240 | 5.659 | 32.863 |
| loop | 256 | 0.000 | 24.481 |
| loop | 512 | 0.000 | 48.962 |

On bigger and more realistic workloads it looks like this:

| Trace | Cache size | cache2k v1.2 hit rate % | cache2k v1.4 hit rate % | Caffeine 2.8 hit rate % |
|------------|--------:|-------:|-------:|-------:|
| financial1 | 50000 | 47.093 | 47.390 | 47.253 |
| financial1 | 100000 | 48.553 | 48.778 | 49.811 |
| oltp | 5000 | 53.255 | 53.341 | 55.320 |
| oltp | 10000 | 60.993 | 61.070 | 59.492 |
| web12 | 3000 | 78.242 | 78.166 | 75.648 |
| web12 | 1200 | 70.497 | 70.641 | 68.876 |
| wikipedia1 | 1024 | 59.834 | 59.770 | 57.272 |
| wikipedia1 | 2048 | 62.017 | 61.940 | 60.009 |

Except for corda at the small cache size of 512 entries, the to-be-released cache2k v1.4 is within a range of -3.5 to +2.5 hit rate percentage points compared to Caffeine 2.8. The static tuning that cache2k currently ships with favors the more skewed distributions found in typical web traffic, but it is less favorable for transaction-oriented workloads.

ben-manes commented 5 years ago

FYI - I cannot reproduce the loop improvements in 1.3.1.Alpha at 256/512, but I do see the improvement to corda-small. Can you verify that we're using the same Lirs loop trace file and that my configuration is valid? If I print out the hit rate from cache.getStatistics().getHitRate(), it matches my calculation when comparing with multi1. ClockProPlusEviction seems to be updated, jars verified, etc. Maybe the improvement was lost during the refactoring?

cruftex commented 5 years ago

I double checked. I missed carrying over a tiny change from the testbed version to production. I will do a new release tomorrow. My bad. Release early, release often... Thanks @ben-manes!

cruftex commented 2 years ago

V2.6 gets a tiny change because I discovered a suboptimal behavior. The current eviction algorithm (V1.x - V2.4) inserts entries evicted from the hot space into the ghost history. However, that makes no sense, because the entry will hardly have a better reuse distance next time. The original Clock-Pro algorithm does not do this either (a small sketch of the changed eviction path follows below). Here is the result when compared to V2.4.1.Final:

| Trace | Cache size | Reference | Reference hit rate % | Best | Best hit rate % | Diff |
|---------------|--------:|----------|------:|------------|------:|------:|
| financial1-1M | 12500 | cache2k* | 39.76 | c2k2x(V26) | 40.66 | 0.90 |
| financial1-1M | 25000 | cache2k* | 52.64 | c2k2x(V26) | 52.74 | 0.10 |
| financial1-1M | 50000 | cache2k* | 54.17 | c2k2x(V26) | 54.41 | 0.24 |
| financial1-1M | 100000 | cache2k* | 54.55 | c2k2x(V26) | 54.69 | 0.13 |
| financial1-1M | 200000 | cache2k* | 54.79 | c2k2x(V26) | 55.47 | 0.68 |
| scarab-recs | 25000 | cache2k* | 68.65 | c2k2x(V26) | 69.18 | 0.53 |
| scarab-recs | 50000 | cache2k* | 74.07 | c2k2x(V26) | 74.90 | 0.83 |
| scarab-recs | 75000 | cache2k* | 77.13 | c2k2x(V26) | 78.20 | 1.08 |
| scarab-recs | 100000 | cache2k* | 79.25 | c2k2x(V26) | 80.40 | 1.15 |
| loop | 256 | cache2k* | 24.48 | c2k2x(V26) | 23.69 | -0.79 |
| loop | 512 | cache2k* | 48.96 | c2k2x(V26) | 47.48 | -1.48 |
| zipf10K-1M | 500 | cache2k* | 67.51 | c2k2x(V26) | 67.31 | -0.20 |
| zipf10K-1M | 1000 | cache2k* | 74.46 | c2k2x(V26) | 74.20 | -0.26 |
| zipf10K-1M | 2000 | cache2k* | 81.53 | c2k2x(V26) | 81.27 | -0.27 |
| web12 | 200 | cache2k* | 45.52 | c2k2x(V26) | 46.63 | 1.10 |
| web12 | 300 | cache2k* | 52.35 | c2k2x(V26) | 53.25 | 0.90 |
| web12 | 400 | cache2k* | 57.00 | c2k2x(V26) | 57.65 | 0.65 |
| web12 | 800 | cache2k* | 66.63 | c2k2x(V26) | 66.78 | 0.15 |
| web12 | 1200 | cache2k* | 70.65 | c2k2x(V26) | 70.80 | 0.15 |
| web12 | 3000 | cache2k* | 78.15 | c2k2x(V26) | 78.46 | 0.31 |
| oltp | 2500 | cache2k* | 46.70 | c2k2x(V26) | 47.59 | 0.89 |
| oltp | 5000 | cache2k* | 53.32 | c2k2x(V26) | 53.87 | 0.55 |
| oltp | 10000 | cache2k* | 60.60 | c2k2x(V26) | 61.18 | 0.58 |

All real-world traces benefit, while the artificial traces (Zipfian and loop) do lose a bit. So this modification will be included in V2.6. Update: In the above results the V26 variant has a hot max setting changed from 97 to 94, which is responsible for the big differences. My bad.
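
To make the described change concrete, here is an illustrative sketch, not cache2k's actual code, of an eviction path where only entries evicted from the cold space leave a ghost behind (the class and method names as well as the ghost capacity are made up):

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;

// Illustrative sketch of the V2.6 change, not cache2k's actual code: only entries
// evicted from the cold space are remembered in the ghost history.
class GhostRecordingEviction {

  enum Space { HOT, COLD }

  private final Map<Integer, Integer> cache = new HashMap<>();
  private final LinkedHashSet<Integer> ghostHistory = new LinkedHashSet<>();
  private final int ghostCapacity = 1024;   // assumed bound for the history size

  void evict(int key, Space space) {
    cache.remove(key);
    if (space == Space.COLD) {
      // a cold entry that is re-referenced soon can be recognized and promoted next time
      recordGhost(key);
    }
    // hot evictions are dropped silently: an entry that was evicted despite hot status
    // is unlikely to show a better reuse distance on its next appearance
  }

  private void recordGhost(int key) {
    ghostHistory.remove(key);                // refresh the position if already present
    ghostHistory.add(key);
    if (ghostHistory.size() > ghostCapacity) {
      // drop the oldest ghost; LinkedHashSet iterates in insertion order
      ghostHistory.remove(ghostHistory.iterator().next());
    }
  }

  boolean wasRecentlyEvicted(int key) {
    return ghostHistory.contains(key);
  }
}
```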

cruftex commented 2 years ago

Corrected statistics with the V2.6 change without changing hot max:

| Trace | Cache size | Reference | Reference hit rate % | Best | Best hit rate % | Diff |
|---------------|--------:|------------|------:|------------|------:|------:|
| financial1-1M | 12500 | c2k2x(V24) | 39.76 | c2k2x(V26) | 40.04 | 0.28 |
| financial1-1M | 25000 | c2k2x(V24) | 52.64 | c2k2x(V26) | 52.64 | -0.00 |
| financial1-1M | 50000 | c2k2x(V24) | 54.17 | c2k2x(V26) | 54.35 | 0.18 |
| financial1-1M | 100000 | c2k2x(V24) | 54.55 | c2k2x(V26) | 54.59 | 0.04 |
| financial1-1M | 200000 | c2k2x(V24) | 54.79 | c2k2x(V26) | 54.79 | 0.00 |
| scarab-recs | 25000 | c2k2x(V24) | 68.65 | c2k2x(V26) | 68.78 | 0.13 |
| scarab-recs | 50000 | c2k2x(V24) | 74.07 | c2k2x(V26) | 74.23 | 0.16 |
| scarab-recs | 75000 | c2k2x(V24) | 77.13 | c2k2x(V26) | 77.31 | 0.18 |
| scarab-recs | 100000 | c2k2x(V24) | 79.25 | c2k2x(V26) | 79.42 | 0.17 |
| loop | 256 | c2k2x(V24) | 24.48 | c2k2x(V26) | 24.48 | 0.00 |
| loop | 512 | c2k2x(V24) | 48.96 | c2k2x(V26) | 48.96 | 0.00 |
| zipf10K-1M | 500 | c2k2x(V24) | 67.51 | c2k2x(V26) | 67.49 | -0.02 |
| zipf10K-1M | 1000 | c2k2x(V24) | 74.46 | c2k2x(V26) | 74.39 | -0.07 |
| zipf10K-1M | 2000 | c2k2x(V24) | 81.53 | c2k2x(V26) | 81.43 | -0.11 |
| web12 | 200 | c2k2x(V24) | 45.52 | c2k2x(V26) | 45.37 | -0.16 |
| web12 | 300 | c2k2x(V24) | 52.35 | c2k2x(V26) | 52.24 | -0.11 |
| web12 | 400 | c2k2x(V24) | 57.00 | c2k2x(V26) | 56.85 | -0.15 |
| web12 | 800 | c2k2x(V24) | 66.63 | c2k2x(V26) | 66.51 | -0.12 |
| web12 | 1200 | c2k2x(V24) | 70.65 | c2k2x(V26) | 70.55 | -0.09 |
| web12 | 3000 | c2k2x(V24) | 78.15 | c2k2x(V26) | 78.07 | -0.08 |
| oltp | 2500 | c2k2x(V24) | 46.70 | c2k2x(V26) | 46.71 | 0.01 |
| oltp | 5000 | c2k2x(V24) | 53.32 | c2k2x(V26) | 53.41 | 0.09 |
| oltp | 10000 | c2k2x(V24) | 60.60 | c2k2x(V26) | 60.64 | 0.03 |

The actual effect is minimal; however, the change brings relevant savings in processing time, so I keep it. For the next release, 2.8, I plan to do more eviction improvements and look into different hot max settings.
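
As a hedged illustration of what such a hot max setting could translate to, assuming the values 97 and 94 mentioned above are percentages of the entry capacity reserved for the hot space (the method name is hypothetical):

```java
// Hypothetical helper showing how a hot max percentage could bound the hot space,
// assuming the values 97 and 94 from the earlier comment are percentages of the
// configured entry capacity.
class HotMaxSketch {

  static long hotSpaceLimit(long entryCapacity, int hotMaxPercent) {
    return entryCapacity * hotMaxPercent / 100;
  }

  public static void main(String[] args) {
    // for 10000 entries, changing hot max from 97 to 94 shrinks the hot space limit
    // from 9700 to 9400 entries and leaves more room for the cold space
    System.out.println(hotSpaceLimit(10_000, 97));   // 9700
    System.out.println(hotSpaceLimit(10_000, 94));   // 9400
  }
}
```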