dominictarr / bench-lru

MIT License

Bench adjustment discussion + adding mnemonist to bench #17

Closed — this issue was closed by Yomguithereal 5 years ago

Yomguithereal commented 6 years ago

Hello everyone,

So, I drafted an LRU cache implementation for the mnemonist library based on an idea I had, which we can discuss further if you find it worthwhile. (I added it to the benchmark so you can test it on your end.)

However, I ran into one issue with the current benchmark:

set operations are only "raw" the first time, since the same cache instance is reused for every run. This produces an outlier in the results (printing the array of times is a good way to spot those outliers, and increasing the number of evicts seems to flatten the differences).

So, since I understand that benchmarking evictions is an important part of this (and indeed, "raw" sets are only marginally interesting), I changed a few things and tested the results: I instantiated a fresh cache for each run and doubled the number of evictions to avoid odd optimizations.
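The per-run instantiation described above can be sketched roughly as follows. This is a minimal illustration, not the actual bench-lru harness: `TinyCache` is a hypothetical stand-in for the benchmarked libraries, and `benchRawSets` only shows the shape of the change (build a fresh cache inside each run, and keep the whole array of timings so outliers stay visible).

```javascript
// Hypothetical sketch: "true raw" sets by creating a fresh cache per run.
// TinyCache is a stand-in cache, not one of the libraries in the benchmark.
class TinyCache {
  constructor (max) {
    this.max = max
    this.map = new Map()
  }

  set (key, value) {
    if (this.map.has(key)) {
      this.map.delete(key)
    } else if (this.map.size >= this.max) {
      // Evict the least-recently-used entry (first key in insertion order).
      this.map.delete(this.map.keys().next().value)
    }
    this.map.set(key, value)
  }

  get (key) {
    if (!this.map.has(key)) return undefined
    const value = this.map.get(key)
    this.map.delete(key)
    this.map.set(key, value) // move to most-recently-used position
    return value
  }
}

// Time `n` sets against a *fresh* cache on every run, so no warm state
// carries over between runs; returns ops/ms per run instead of one average.
function benchRawSets (makeCache, n, runs) {
  const times = []
  for (let r = 0; r < runs; r++) {
    const cache = makeCache() // fresh instance each run
    const start = process.hrtime.bigint()
    for (let i = 0; i < n; i++) cache.set(i, i)
    const ms = Number(process.hrtime.bigint() - start) / 1e6
    times.push(Math.round(n / ms))
  }
  return times // printing the whole array makes outliers easy to spot
}

console.log(benchRawSets(() => new TinyCache(1000), 100000, 3))
```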

Those are the results I had (once again results on macbook + node v10):

Original

| name | set | get1 | update | get2 | evict |
| --- | --- | --- | --- | --- | --- |
| tiny-lru | 20683 | 21882 | 26631 | 24938 | 21053 |
| lru_cache | 18198 | 24301 | 7890 | 27473 | 16681 |
| mnemonist | 9606 | 77220 | 31201 | 80645 | 8745 |
| simple-lru-cache | 7413 | 36036 | 23447 | 37453 | 6601 |
| lru-fast | 4553 | 32206 | 26247 | 30912 | 5301 |
| hashlru | 6231 | 8333 | 4237 | 7030 | 4748 |
| quick-lru | 3342 | 3406 | 4314 | 3598 | 4490 |
| js-lru | 1561 | 5986 | 6935 | 8764 | 1836 |
| lru | 2102 | 4700 | 4113 | 4734 | 1742 |
| secondary-cache | 2255 | 8285 | 4900 | 7930 | 1569 |
| hyperlru-object | 1637 | 8885 | 14524 | 15492 | 1453 |
| hyperlru-map | 1005 | 5136 | 5378 | 5680 | 994 |
| mkc | 1004 | 1815 | 1131 | 1769 | 826 |
| modern-lru | 722 | 1482 | 1981 | 2277 | 753 |

True raw sets (meaning re-instantiating the cache each run)

| name | set | get1 | update | get2 | evict |
| --- | --- | --- | --- | --- | --- |
| mnemonist | 13661 | 65789 | 32310 | 80972 | 7025 |
| tiny-lru | 4833 | 27248 | 26385 | 26008 | 6969 |
| hashlru | 3331 | 5661 | 5047 | 7283 | 5713 |
| quick-lru | 3816 | 3067 | 2988 | 3225 | 4900 |
| lru-fast | 8217 | 32468 | 27435 | 29806 | 4040 |
| simple-lru-cache | 4617 | 11325 | 10081 | 19980 | 3824 |
| lru_cache | 3541 | 27739 | 15186 | 27100 | 2950 |
| js-lru | 2588 | 5713 | 5703 | 7339 | 1930 |
| lru | 3041 | 4697 | 4050 | 4684 | 1690 |
| secondary-cache | 2664 | 4091 | 3825 | 8969 | 1593 |
| hyperlru-object | 1470 | 8909 | 11710 | 10400 | 1590 |
| hyperlru-map | 1364 | 4769 | 4878 | 4394 | 970 |
| mkc | 1463 | 1855 | 1086 | 1935 | 805 |
| modern-lru | 1403 | 2286 | 1983 | 2355 | 756 |

True raw sets + doubled evicts

| name | set | get1 | update | get2 | evict |
| --- | --- | --- | --- | --- | --- |
| mnemonist | 12005 | 75758 | 31898 | 78125 | 2288 |
| simple-lru-cache | 3797 | 25510 | 9311 | 20243 | 1691 |
| lru-fast | 5893 | 29240 | 26385 | 30488 | 1565 |
| tiny-lru | 9132 | 26525 | 24876 | 25707 | 1423 |
| quick-lru | 3145 | 3408 | 3219 | 3348 | 1276 |
| hashlru | 5972 | 7596 | 3980 | 6741 | 1246 |
| lru_cache | 3739 | 21930 | 8985 | 22857 | 1156 |
| js-lru | 2922 | 5819 | 7516 | 7421 | 624 |
| lru | 2093 | 4006 | 3926 | 4119 | 588 |
| secondary-cache | 2369 | 5225 | 4700 | 8621 | 553 |
| hyperlru-object | 1698 | 8326 | 9116 | 13652 | 526 |
| hyperlru-map | 1197 | 5576 | 4562 | 4935 | 302 |
| mkc | 1218 | 1571 | 1047 | 1815 | 265 |
| modern-lru | 1455 | 2188 | 1940 | 2189 | 240 |

Sorry if I am still missing something obvious, or if I messed something up while running the benchmark.

Also, if I switch my implementation to Map, it becomes less efficient with number keys but more efficient with string keys.

Yomguithereal commented 6 years ago

Any thoughts?

Kikobeats commented 6 years ago

Looks good to me; what do you think @avoidwork 🙂

avoidwork commented 6 years ago

@Yomguithereal bad news; your test change simply exposes a JIT flaw.

If you change iterations from 2 to 4, the performance of every lib drops to a half, or even a third. No other change is needed for everything to slow down.

Now, if we simply add the new lib, it sits comfortably in the number 3 spot (it crushes on the gets! wow!).

So, with that in mind, I say we close this PR and add the new lib in a new PR; that way we retain accurate results without JIT flaws coming into the picture.

fyi, Maps also tank as you approach 1 million keys, and cease to work properly if you exceed 1 million (today); I'm thinking you uncovered something similar.
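The effect being debated here (changing the iteration count alters every lib's numbers) can be probed with a minimal sketch like the one below. It is not the bench-lru harness; `timeOnce` is a hypothetical helper that runs one identical workload and reports milliseconds, so you can see whether later iterations in the same process drift due to JIT warm-up, deopts, or GC pressure.

```javascript
// Hypothetical probe: run the exact same workload N times in one process.
// If per-iteration times drift, process state (JIT/GC) is leaking between
// runs, and results from different iteration counts are not comparable.
function timeOnce (n) {
  const cache = new Map()
  const start = process.hrtime.bigint()
  for (let i = 0; i < n; i++) cache.set(i % 1000, i)
  return Number(process.hrtime.bigint() - start) / 1e6 // milliseconds
}

function runIterations (iterations, n) {
  const times = []
  for (let i = 0; i < iterations; i++) times.push(timeOnce(n))
  return times
}

// Compare 2 vs 4 iterations of the identical workload.
console.log('2 iterations:', runIterations(2, 1e6))
console.log('4 iterations:', runIterations(4, 1e6))
```

If the 4-iteration run shows later times much worse than the first, the slowdown comes from the process, not from the caches under test.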

avoidwork commented 6 years ago

Adding a picture 'cause the tables are a pain to copy/pasta... tests were run on an Ubuntu 18.04 VM on a TR4 1950X, with way more resources than are needed for this project.

  1. First run = clone of the PR branch
  2. Second run = everything reset, keeping the new lib (master + new lib)
  3. Third run = second run with iterations doubled to 4, and it goes bad... (master + new lib + 4 iterations)

(screenshot of benchmark results, 2018-06-26)

Yomguithereal commented 6 years ago

Thanks for the answer @avoidwork. This JIT flaw is quite spectacular. Is this only due to the amount of memory used?

Do you want me to explain how I get to those numbers in get/update so we can see whether we can somehow merge our approaches to get the best of both?

avoidwork commented 6 years ago

@Yomguithereal I don't know much about this flaw... this is the first time I've seen a large array iteration seemingly impact unrelated ops; it might be a serialization flaw somewhere (I saw something similar with IPC years ago).

I'm going to look at your code, and I'll get back via priv message if I have any questions :)