dominictarr / bench-lru

MIT License

Bench adjustment discussion + adding mnemonist to bench #17

Closed — this issue was closed by Yomguithereal 5 years ago

Yomguithereal commented 6 years ago

Hello everyone,

So, I drafted an LRU cache implementation for the mnemonist library based on an idea I had, which we can discuss further if you find it worthwhile. (I added it to the benchmark so you can test it on your end.)

However, I ran into one issue with the current benchmark:

set operations are only "raw" the first time, since the same cache instance is reused for every run. This produces an outlier in the results (printing the array of times is a good way to spot those outliers, and increasing the number of evicts seems to flatten the differences).

So, since I understand that benchmarking evictions is an important part of this (and indeed, "raw" sets are only marginally interesting), I changed a few things and tested the results: I instantiated a fresh cache for each run and doubled the number of evictions to avoid odd optimizations.
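The per-run instantiation described above can be sketched roughly as follows. This is a minimal illustration, not the actual bench-lru harness: `TinyCache` is a hypothetical stand-in for the benchmarked libraries, and `benchRawSets` only shows the shape of the change (build a fresh cache inside each run, and keep the whole array of timings so outliers stay visible).

```javascript
// Hypothetical sketch: "true raw" sets by creating a fresh cache per run.
// TinyCache is a stand-in cache, not one of the libraries in the benchmark.
class TinyCache {
  constructor (max) {
    this.max = max
    this.map = new Map()
  }

  set (key, value) {
    if (this.map.has(key)) {
      this.map.delete(key)
    } else if (this.map.size >= this.max) {
      // Evict the least-recently-used entry (first key in insertion order).
      this.map.delete(this.map.keys().next().value)
    }
    this.map.set(key, value)
  }

  get (key) {
    if (!this.map.has(key)) return undefined
    const value = this.map.get(key)
    this.map.delete(key)
    this.map.set(key, value) // move to most-recently-used position
    return value
  }
}

// Time `n` sets against a *fresh* cache on every run, so no warm state
// carries over between runs; returns ops/ms per run instead of one average.
function benchRawSets (makeCache, n, runs) {
  const times = []
  for (let r = 0; r < runs; r++) {
    const cache = makeCache() // fresh instance each run
    const start = process.hrtime.bigint()
    for (let i = 0; i < n; i++) cache.set(i, i)
    const ms = Number(process.hrtime.bigint() - start) / 1e6
    times.push(Math.round(n / ms))
  }
  return times // printing the whole array makes outliers easy to spot
}

console.log(benchRawSets(() => new TinyCache(1000), 100000, 3))
```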

Those are the results I had (once again results on macbook + node v10):

Original

| name | set | get1 | update | get2 | evict |
| --- | --- | --- | --- | --- | --- |
| tiny-lru | 20683 | 21882 | 26631 | 24938 | 21053 |
| lru_cache | 18198 | 24301 | 7890 | 27473 | 16681 |
| mnemonist | 9606 | 77220 | 31201 | 80645 | 8745 |
| simple-lru-cache | 7413 | 36036 | 23447 | 37453 | 6601 |
| lru-fast | 4553 | 32206 | 26247 | 30912 | 5301 |
| hashlru | 6231 | 8333 | 4237 | 7030 | 4748 |
| quick-lru | 3342 | 3406 | 4314 | 3598 | 4490 |
| js-lru | 1561 | 5986 | 6935 | 8764 | 1836 |
| lru | 2102 | 4700 | 4113 | 4734 | 1742 |
| secondary-cache | 2255 | 8285 | 4900 | 7930 | 1569 |
| hyperlru-object | 1637 | 8885 | 14524 | 15492 | 1453 |
| hyperlru-map | 1005 | 5136 | 5378 | 5680 | 994 |
| mkc | 1004 | 1815 | 1131 | 1769 | 826 |
| modern-lru | 722 | 1482 | 1981 | 2277 | 753 |

True raw sets (meaning re-instantiating the cache each run)

| name | set | get1 | update | get2 | evict |
| --- | --- | --- | --- | --- | --- |
| mnemonist | 13661 | 65789 | 32310 | 80972 | 7025 |
| tiny-lru | 4833 | 27248 | 26385 | 26008 | 6969 |
| hashlru | 3331 | 5661 | 5047 | 7283 | 5713 |
| quick-lru | 3816 | 3067 | 2988 | 3225 | 4900 |
| lru-fast | 8217 | 32468 | 27435 | 29806 | 4040 |
| simple-lru-cache | 4617 | 11325 | 10081 | 19980 | 3824 |
| lru_cache | 3541 | 27739 | 15186 | 27100 | 2950 |
| js-lru | 2588 | 5713 | 5703 | 7339 | 1930 |
| lru | 3041 | 4697 | 4050 | 4684 | 1690 |
| secondary-cache | 2664 | 4091 | 3825 | 8969 | 1593 |
| hyperlru-object | 1470 | 8909 | 11710 | 10400 | 1590 |
| hyperlru-map | 1364 | 4769 | 4878 | 4394 | 970 |
| mkc | 1463 | 1855 | 1086 | 1935 | 805 |
| modern-lru | 1403 | 2286 | 1983 | 2355 | 756 |

True raw sets + doubled evicts

| name | set | get1 | update | get2 | evict |
| --- | --- | --- | --- | --- | --- |
| mnemonist | 12005 | 75758 | 31898 | 78125 | 2288 |
| simple-lru-cache | 3797 | 25510 | 9311 | 20243 | 1691 |
| lru-fast | 5893 | 29240 | 26385 | 30488 | 1565 |
| tiny-lru | 9132 | 26525 | 24876 | 25707 | 1423 |
| quick-lru | 3145 | 3408 | 3219 | 3348 | 1276 |
| hashlru | 5972 | 7596 | 3980 | 6741 | 1246 |
| lru_cache | 3739 | 21930 | 8985 | 22857 | 1156 |
| js-lru | 2922 | 5819 | 7516 | 7421 | 624 |
| lru | 2093 | 4006 | 3926 | 4119 | 588 |
| secondary-cache | 2369 | 5225 | 4700 | 8621 | 553 |
| hyperlru-object | 1698 | 8326 | 9116 | 13652 | 526 |
| hyperlru-map | 1197 | 5576 | 4562 | 4935 | 302 |
| mkc | 1218 | 1571 | 1047 | 1815 | 265 |
| modern-lru | 1455 | 2188 | 1940 | 2189 | 240 |

Sorry if I am still missing something obvious, or if I messed something up while running the benchmark.

Also, if I switch my implementation to Map, it becomes less efficient with number keys but more efficient with string keys.

Yomguithereal commented 6 years ago

Any thoughts?

Kikobeats commented 6 years ago

Looks good to me; what do you think @avoidwork 🙂

avoidwork commented 6 years ago

@Yomguithereal bad news; your test change simply exposes a JIT flaw.

If you change iterations from 2 to 4, the performance of every lib drops to a half, or even a third. No other change is needed for everything to slow down.

Now, if we simply add the new lib, it sits comfortably in the number 3 spot (it crushes on the gets! wow!).

So, with that in mind, I say we close this PR and add the new lib in a new PR; that way we retain accurate results without JIT flaws coming into the picture.

fyi, Maps also tank as you approach 1 million keys, and cease to work properly if you exceed 1 million (today); I'm thinking you uncovered something similar.
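The effect being debated here (changing the iteration count alters every lib's numbers) can be probed with a minimal sketch like the one below. It is not the bench-lru harness; `timeOnce` is a hypothetical helper that runs one identical workload and reports milliseconds, so you can see whether later iterations in the same process drift due to JIT warm-up, deopts, or GC pressure.

```javascript
// Hypothetical probe: run the exact same workload N times in one process.
// If per-iteration times drift, process state (JIT/GC) is leaking between
// runs, and results from different iteration counts are not comparable.
function timeOnce (n) {
  const cache = new Map()
  const start = process.hrtime.bigint()
  for (let i = 0; i < n; i++) cache.set(i % 1000, i)
  return Number(process.hrtime.bigint() - start) / 1e6 // milliseconds
}

function runIterations (iterations, n) {
  const times = []
  for (let i = 0; i < iterations; i++) times.push(timeOnce(n))
  return times
}

// Compare 2 vs 4 iterations of the identical workload.
console.log('2 iterations:', runIterations(2, 1e6))
console.log('4 iterations:', runIterations(4, 1e6))
```

If the 4-iteration run shows later times much worse than the first, the slowdown comes from the process, not from the caches under test.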

avoidwork commented 6 years ago

Adding a picture 'cause the tables are a pain to copy/pasta... tests were run on an Ubuntu 18.04 VM on a TR4 1950X, with way more resources than are needed for this project.

  1. First run = clone of the PR branch
  2. Second run = everything reset, keeping the new lib (master + new lib)
  3. Third run = second run with iterations doubled to 4, and it goes bad... (master + new lib + 4 iterations)

(screenshot of benchmark results, 2018-06-26)

Yomguithereal commented 6 years ago

Thanks for the answer @avoidwork. This JIT flaw is quite spectacular. Is this only due to the amount of memory used?

Do you want me to explain how I get to those numbers in get/update so we can see whether we can somehow merge our approaches to get the best of both?

avoidwork commented 6 years ago

@Yomguithereal I don't know much about this flaw... this is the first time I've seen a large array iteration seemingly impact unrelated ops; it might be a serialization flaw somewhere (I saw something similar with IPC years ago).

I'm going to look at your code, and I'll get back via priv message if I have any questions :)