cache2k / cache2k

Lightweight, high performance Java caching
http://cache2k.org
Apache License 2.0

Adaptive cache eviction #101

Open cruftex opened 5 years ago

cruftex commented 5 years ago

The cache2k Clock-Pro algorithm runs with fixed data structure sizes that yielded the best results over all benchmark traces I could get hold of, see: https://cruftex.net/2016/05/09/Java-Caching-Benchmarks-2016-Part-2.html

Nevertheless, there are some access patterns that result in worse hit rates than LRU.

Ben is experimenting with some adaptations that look interesting. See: https://github.com/ben-manes/caffeine/issues/106

re-thc commented 5 years ago

Caffeine has been released with the latest adaptive algorithm. Do you know how it compares to cache2k now?

cruftex commented 5 years ago

@hc-codersatlas:

You can run the benchmarks with traces as explained in https://cruftex.net/2016/05/09/Java-Caching-Benchmarks-2016-Part-2.html

Be aware that this is rather academic and any result may or may not apply to your actual workload.
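
For orientation, here is a minimal, self-contained sketch of such a trace-driven hit rate measurement. This is not the cache2k-benchmark harness; the trace file format (one integer key per line) and the command line arguments are assumptions for illustration.

```java
// Sketch only: measure the hit rate of a cache2k cache over an access trace.
// Assumes a plain text trace with one integer key per line; args are hypothetical.
import org.cache2k.Cache;
import org.cache2k.Cache2kBuilder;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class TraceHitRate {

  public static void main(String[] args) throws IOException {
    String traceFile = args[0];                  // path to the trace file (assumption)
    int cacheSize = Integer.parseInt(args[1]);   // entry capacity to simulate
    Cache<Integer, Integer> cache = Cache2kBuilder.of(Integer.class, Integer.class)
      .entryCapacity(cacheSize)
      .build();
    long hits = 0, requests = 0;
    try (BufferedReader in = new BufferedReader(new FileReader(traceFile))) {
      String line;
      while ((line = in.readLine()) != null) {
        int key = Integer.parseInt(line.trim());
        requests++;
        if (cache.peek(key) != null) {           // peek: read without triggering a loader
          hits++;
        } else {
          cache.put(key, key);                   // simulate loading the missed entry
        }
      }
    }
    cache.close();
    System.out.printf("hit rate: %.3f%%%n", 100.0 * hits / requests);
  }
}
```

A real comparison would feed the same trace into each cache implementation at the same entry capacity and compare the resulting rates.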

I will take a deeper look at this later. The goal of cache2k is to have good enough eviction results with very low overhead in the critical access path. That said, the goal is not to have better eviction results than any other cache. In that sense, I see cache2k as well balanced.

cruftex commented 5 years ago

Over the last few days I've been working quite a bit on the eviction. For the tests of Caffeine's adaptation algorithm, Ben was using a trace that leads to quite devastating hit rates in cache2k. I improved the testbed for eviction algorithms in cache2k-benchmark, so it is possible to run an isolated version of the eviction algorithms, and reproduced the results:

| Trace | Cache size | cache2k v1.2 hit rate % | Caffeine 2.8 hit rate % |
|------------------|-------:|------:|-------:|
| corda-loop-corda | 512 | 0.176 | 42.875 |
| corda-loop-corda | 1024 | 2.275 | 75.521 |
| corda-small | 512 | 0.682 | 30.431 |
| corda-small | 1024 | 6.808 | 28.807 |
| corda-small-10x | 5120 | 0.604 | 30.456 |
| corda-small-10x | 10240 | 5.659 | 28.751 |
| loop | 256 | 0.000 | 23.928 |
| loop | 512 | 0.000 | 49.698 |

The trace corda-small-10x repeats each access 10 times in a different number space.
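
A hypothetical sketch of one way such a trace variant could be derived from the base trace, assuming each access is replayed ten times back to back with every repetition shifted into its own disjoint key range (the offset value and method name are made up for illustration):

```java
// Hypothetical reading of the construction described above, for illustration only:
// each access of the base trace is replayed ten times, every repetition shifted
// into its own key range so the copies live in disjoint number spaces.
class TraceExpansion {

  static int[] expandTenfold(int[] baseTrace) {
    final int repetitions = 10;
    final int offsetPerRepetition = 1_000_000;   // assumed larger than any key in the base trace
    int[] expanded = new int[baseTrace.length * repetitions];
    int idx = 0;
    for (int key : baseTrace) {
      for (int rep = 0; rep < repetitions; rep++) {
        expanded[idx++] = key + rep * offsetPerRepetition;   // shift into the rep-th number space
      }
    }
    return expanded;
  }
}
```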

One reason for the outcome is that the current eviction algorithm keeps hot entries that are no longer referenced for too long. Removing the demotion from the hot to the cold space, which the original Clock-Pro algorithm performs, shows good results. It seems that in combination with the hit counter this demotion is not necessary, so in the end more frequently used items are preferred over items whose reuse distance makes them hop between hot and cold. The overall eviction efficiency is improved and the list operations in the eviction algorithm are drastically reduced. It's still not adaptive, so it is not a solution for best performance on the corda trace, which is LRU friendly.
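
As a conceptual illustration of the idea (and explicitly not cache2k's actual implementation), a Clock-style scan can let an entry's hit counter earn it another round on the clock instead of demoting it from hot to cold:

```java
// Conceptual illustration only, not cache2k's actual code: a Clock-style scan in which
// a positive hit counter earns an entry another round on the clock, so frequently used
// entries are kept without ever being demoted from a hot to a cold space.
class ClockScanSketch {

  static final class Entry {
    final int key;
    int hitCounter;   // incremented on every cache hit, cleared when the hand passes
    Entry next;       // entries are linked into a circular list
    Entry(int key) { this.key = key; }
  }

  /** Advances the clock hand and returns the entry selected for eviction. */
  static Entry runHand(Entry hand) {
    // Entries hit since the last pass get another chance; clearing the counter makes the
    // scan terminate and lets repeatedly unused entries fall out eventually. A multi-bit
    // counter could be decayed instead of cleared to favor frequently used entries more.
    while (hand.hitCounter > 0) {
      hand.hitCounter = 0;
      hand = hand.next;
    }
    return hand;     // first entry without recorded hits is the eviction victim
  }
}
```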

I plan to release what I have now, which looks like a good incremental improvement, in the upcoming version 1.4. Here is the comparison:

| Trace | Cache size | cache2k v1.2 hit rate % | cache2k v1.4 hit rate % |
|------------------|-------:|------:|-------:|
| corda-loop-corda | 512 | 0.176 | 1.547 |
| corda-loop-corda | 1024 | 2.275 | 11.499 |
| corda-small | 512 | 0.682 | 4.505 |
| corda-small | 1024 | 6.808 | 32.921 |
| corda-small-10x | 5120 | 0.604 | 3.692 |
| corda-small-10x | 10240 | 5.659 | 32.863 |
| loop | 256 | 0.000 | 24.481 |
| loop | 512 | 0.000 | 48.962 |

On bigger and more realistic workloads it looks like this:

| Trace | Cache size | cache2k v1.2 hit rate % | cache2k v1.4 hit rate % | Caffeine 2.8 hit rate % |
|------------|--------:|-------:|-------:|-------:|
| financial1 | 50000 | 47.093 | 47.390 | 47.253 |
| financial1 | 100000 | 48.553 | 48.778 | 49.811 |
| oltp | 5000 | 53.255 | 53.341 | 55.320 |
| oltp | 10000 | 60.993 | 61.070 | 59.492 |
| web12 | 3000 | 78.242 | 78.166 | 75.648 |
| web12 | 1200 | 70.497 | 70.641 | 68.876 |
| wikipedia1 | 1024 | 59.834 | 59.770 | 57.272 |
| wikipedia1 | 2048 | 62.017 | 61.940 | 60.009 |

Except for corda at the small cache size of 512 entries, the to-be-released cache2k v1.4 is within a range of -3.5 to +2.5 hit rate percentage points compared to Caffeine 2.8. The static tuning that cache2k currently ships with favors the more skewed distributions found in typical web traffic, but it is less favorable for transaction-oriented workloads.

ben-manes commented 5 years ago

FYI - I cannot reproduce the loop improvements in 1.3.1.Alpha at 256/512, but I do see the improvement to corda-small. Can you verify that we're using the same Lirs loop trace file and that my configuration is valid? If I print out the hit rate from cache.getStatistics().getHitRate(), it matches my calculation when comparing with multi1. ClockProPlusEviction seems to be updated, jars verified, etc. Maybe the improvement was lost during the refactoring?

cruftex commented 5 years ago

I double checked. I missed carrying over a tiny change from the testbed version to production. I will do a new release tomorrow. My bad. Release early, release often... Thanks @ben-manes!

cruftex commented 2 years ago

V2.6 gets a tiny change because I discovered a suboptimal behavior. The current eviction algorithm (V1.x - V2.4) inserts entries evicted from the hot space into the ghost history. However, that makes no sense, because the entry will hardly have a better reuse distance next time. The original Clock-Pro algorithm does not do this either (a small sketch of the changed eviction path follows below). Here is the result when compared to V2.4.1.Final:

| Trace | Cache size | Reference | Reference hit rate % | Best | Best hit rate % | Diff |
|---------------|--------:|----------|------:|------------|------:|------:|
| financial1-1M | 12500 | cache2k* | 39.76 | c2k2x(V26) | 40.66 | 0.90 |
| financial1-1M | 25000 | cache2k* | 52.64 | c2k2x(V26) | 52.74 | 0.10 |
| financial1-1M | 50000 | cache2k* | 54.17 | c2k2x(V26) | 54.41 | 0.24 |
| financial1-1M | 100000 | cache2k* | 54.55 | c2k2x(V26) | 54.69 | 0.13 |
| financial1-1M | 200000 | cache2k* | 54.79 | c2k2x(V26) | 55.47 | 0.68 |
| scarab-recs | 25000 | cache2k* | 68.65 | c2k2x(V26) | 69.18 | 0.53 |
| scarab-recs | 50000 | cache2k* | 74.07 | c2k2x(V26) | 74.90 | 0.83 |
| scarab-recs | 75000 | cache2k* | 77.13 | c2k2x(V26) | 78.20 | 1.08 |
| scarab-recs | 100000 | cache2k* | 79.25 | c2k2x(V26) | 80.40 | 1.15 |
| loop | 256 | cache2k* | 24.48 | c2k2x(V26) | 23.69 | -0.79 |
| loop | 512 | cache2k* | 48.96 | c2k2x(V26) | 47.48 | -1.48 |
| zipf10K-1M | 500 | cache2k* | 67.51 | c2k2x(V26) | 67.31 | -0.20 |
| zipf10K-1M | 1000 | cache2k* | 74.46 | c2k2x(V26) | 74.20 | -0.26 |
| zipf10K-1M | 2000 | cache2k* | 81.53 | c2k2x(V26) | 81.27 | -0.27 |
| web12 | 200 | cache2k* | 45.52 | c2k2x(V26) | 46.63 | 1.10 |
| web12 | 300 | cache2k* | 52.35 | c2k2x(V26) | 53.25 | 0.90 |
| web12 | 400 | cache2k* | 57.00 | c2k2x(V26) | 57.65 | 0.65 |
| web12 | 800 | cache2k* | 66.63 | c2k2x(V26) | 66.78 | 0.15 |
| web12 | 1200 | cache2k* | 70.65 | c2k2x(V26) | 70.80 | 0.15 |
| web12 | 3000 | cache2k* | 78.15 | c2k2x(V26) | 78.46 | 0.31 |
| oltp | 2500 | cache2k* | 46.70 | c2k2x(V26) | 47.59 | 0.89 |
| oltp | 5000 | cache2k* | 53.32 | c2k2x(V26) | 53.87 | 0.55 |
| oltp | 10000 | cache2k* | 60.60 | c2k2x(V26) | 61.18 | 0.58 |

All real-world traces benefit, while the artificial traces (Zipfian and loop) do lose a bit. So this modification will be included in V2.6. Update: In the above results the V26 variant has a hot max setting changed from 97 to 94, which is responsible for the big differences. My bad.
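
To make the described change concrete, here is an illustrative sketch, not cache2k's actual code, of an eviction path where only entries evicted from the cold space leave a ghost behind (the class and method names as well as the ghost capacity are made up):

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;

// Illustrative sketch of the V2.6 change, not cache2k's actual code: only entries
// evicted from the cold space are remembered in the ghost history.
class GhostRecordingEviction {

  enum Space { HOT, COLD }

  private final Map<Integer, Integer> cache = new HashMap<>();
  private final LinkedHashSet<Integer> ghostHistory = new LinkedHashSet<>();
  private final int ghostCapacity = 1024;   // assumed bound for the history size

  void evict(int key, Space space) {
    cache.remove(key);
    if (space == Space.COLD) {
      // a cold entry that is re-referenced soon can be recognized and promoted next time
      recordGhost(key);
    }
    // hot evictions are dropped silently: an entry that was evicted despite hot status
    // is unlikely to show a better reuse distance on its next appearance
  }

  private void recordGhost(int key) {
    ghostHistory.remove(key);                // refresh the position if already present
    ghostHistory.add(key);
    if (ghostHistory.size() > ghostCapacity) {
      // drop the oldest ghost; LinkedHashSet iterates in insertion order
      ghostHistory.remove(ghostHistory.iterator().next());
    }
  }

  boolean wasRecentlyEvicted(int key) {
    return ghostHistory.contains(key);
  }
}
```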

cruftex commented 2 years ago

Corrected statistics with the V2.6 change without changing hot max:

| Trace | Cache size | Reference | Reference hit rate % | Best | Best hit rate % | Diff |
|---------------|--------:|------------|------:|------------|------:|------:|
| financial1-1M | 12500 | c2k2x(V24) | 39.76 | c2k2x(V26) | 40.04 | 0.28 |
| financial1-1M | 25000 | c2k2x(V24) | 52.64 | c2k2x(V26) | 52.64 | -0.00 |
| financial1-1M | 50000 | c2k2x(V24) | 54.17 | c2k2x(V26) | 54.35 | 0.18 |
| financial1-1M | 100000 | c2k2x(V24) | 54.55 | c2k2x(V26) | 54.59 | 0.04 |
| financial1-1M | 200000 | c2k2x(V24) | 54.79 | c2k2x(V26) | 54.79 | 0.00 |
| scarab-recs | 25000 | c2k2x(V24) | 68.65 | c2k2x(V26) | 68.78 | 0.13 |
| scarab-recs | 50000 | c2k2x(V24) | 74.07 | c2k2x(V26) | 74.23 | 0.16 |
| scarab-recs | 75000 | c2k2x(V24) | 77.13 | c2k2x(V26) | 77.31 | 0.18 |
| scarab-recs | 100000 | c2k2x(V24) | 79.25 | c2k2x(V26) | 79.42 | 0.17 |
| loop | 256 | c2k2x(V24) | 24.48 | c2k2x(V26) | 24.48 | 0.00 |
| loop | 512 | c2k2x(V24) | 48.96 | c2k2x(V26) | 48.96 | 0.00 |
| zipf10K-1M | 500 | c2k2x(V24) | 67.51 | c2k2x(V26) | 67.49 | -0.02 |
| zipf10K-1M | 1000 | c2k2x(V24) | 74.46 | c2k2x(V26) | 74.39 | -0.07 |
| zipf10K-1M | 2000 | c2k2x(V24) | 81.53 | c2k2x(V26) | 81.43 | -0.11 |
| web12 | 200 | c2k2x(V24) | 45.52 | c2k2x(V26) | 45.37 | -0.16 |
| web12 | 300 | c2k2x(V24) | 52.35 | c2k2x(V26) | 52.24 | -0.11 |
| web12 | 400 | c2k2x(V24) | 57.00 | c2k2x(V26) | 56.85 | -0.15 |
| web12 | 800 | c2k2x(V24) | 66.63 | c2k2x(V26) | 66.51 | -0.12 |
| web12 | 1200 | c2k2x(V24) | 70.65 | c2k2x(V26) | 70.55 | -0.09 |
| web12 | 3000 | c2k2x(V24) | 78.15 | c2k2x(V26) | 78.07 | -0.08 |
| oltp | 2500 | c2k2x(V24) | 46.70 | c2k2x(V26) | 46.71 | 0.01 |
| oltp | 5000 | c2k2x(V24) | 53.32 | c2k2x(V26) | 53.41 | 0.09 |
| oltp | 10000 | c2k2x(V24) | 60.60 | c2k2x(V26) | 60.64 | 0.03 |

The actual effect is minimal; however, the change brings relevant savings in processing time, so I keep it. For the next release, 2.8, I plan to do more eviction improvements and look into different hot max settings.
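
As a hedged illustration of what such a hot max setting could translate to, assuming the values 97 and 94 mentioned above are percentages of the entry capacity reserved for the hot space (the method name is hypothetical):

```java
// Hypothetical helper showing how a hot max percentage could bound the hot space,
// assuming the values 97 and 94 from the earlier comment are percentages of the
// configured entry capacity.
class HotMaxSketch {

  static long hotSpaceLimit(long entryCapacity, int hotMaxPercent) {
    return entryCapacity * hotMaxPercent / 100;
  }

  public static void main(String[] args) {
    // for 10000 entries, changing hot max from 97 to 94 shrinks the hot space limit
    // from 9700 to 9400 entries and leaves more room for the cold space
    System.out.println(hotSpaceLimit(10_000, 97));   // 9700
    System.out.println(hotSpaceLimit(10_000, 94));   // 9400
  }
}
```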