dominictarr / bench-lru

MIT License

Update methodology to more closely match real-world workloads #41

Open isaacs opened 2 years ago

Yomguithereal commented 2 years ago

That is very interesting @isaacs, thank you very much. It makes me wonder whether I should drop the array backing I use for storing values in favor of putting entries containing key, value and pointer index directly into my maps, to catch up on get performance (since I currently need an additional lookup into the value-storage array). I would need to keep the entries in a static pool, as usual, to avoid GC churn, but maybe it is worthwhile.

EDIT: probably not a good idea, since it would mean more memory and roughly the same computation time to maintain the pool of entries. I also wonder why my splaying on top is slightly slower than most.
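For context, the two layouts being weighed here can be sketched roughly like this (a hypothetical illustration, not mnemonist's actual code; eviction and the linked-list bookkeeping are omitted to focus on the get() path):

```javascript
// Layout A (array-backed): the Map stores a slot index, and values live in
// a separate preallocated array, so get() pays one extra lookup.
class ArrayBacked {
  constructor(capacity) {
    this.items = new Map();        // key -> slot index
    this.V = new Array(capacity);  // slot index -> value
    this.size = 0;
  }
  set(key, value) {
    const pointer = this.size++;   // eviction handling omitted for brevity
    this.items.set(key, pointer);
    this.V[pointer] = value;
  }
  get(key) {
    const pointer = this.items.get(key);
    return pointer === undefined ? undefined : this.V[pointer]; // extra hop
  }
}

// Layout B (entry-backed): the Map stores a pooled entry object holding
// key, value and pointer directly, so get() is a single Map lookup.
class EntryBacked {
  constructor(capacity) {
    this.items = new Map();        // key -> entry
    // static pool of entries, reused to avoid GC churn
    this.pool = Array.from({ length: capacity },
      () => ({ key: null, value: null, pointer: 0 }));
    this.size = 0;
  }
  set(key, value) {
    const entry = this.pool[this.size]; // eviction handling omitted
    entry.key = key;
    entry.value = value;
    entry.pointer = this.size++;
    this.items.set(key, entry);
  }
  get(key) {
    const entry = this.items.get(key);
    return entry === undefined ? undefined : entry.value;       // single hop
  }
}
```

The entry-backed layout saves one array hop per get() at the cost of one pooled object per slot, which is exactly the memory-vs-computation trade-off weighed above.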

isaacs commented 2 years ago

@Yomguithereal Yeah, I've been messing with lru-cache a bit the last few weeks. I didn't wanna look at mnemonist until I got to this point, because it was a fun challenge and I sorta felt like it would be giving up to look at the answers before trying to solve it myself 😂

$ ls -laF index-*.js
-rw-r--r--  1 isaacs  staff   4873 Jan 25 08:38 index-2022-01-24.js
-rw-r--r--  1 isaacs  staff  10502 Jan 29 14:34 index-2022-01-28.js
-rw-r--r--  1 isaacs  staff   5281 Jan 23 16:15 index-arraybuffer-malloc.js
-rw-r--r--  1 isaacs  staff   2087 Jan 26 15:41 index-bare.js
-rw-r--r--  1 isaacs  staff   3027 Jan 22 23:32 index-batch-shift.js
-rw-r--r--  1 isaacs  staff   3119 Jan 23 00:35 index-batchdelete-partial.js
-rw-r--r--  1 isaacs  staff   2731 Jan 23 08:40 index-best.js
-rw-r--r--  1 isaacs  staff   2798 Jan 23 08:56 index-bork2.js
-rw-r--r--  1 isaacs  staff   2758 Jan 22 22:33 index-borked-somehow-wtf.js
-rw-r--r--  1 isaacs  staff   2285 Jan 22 23:08 index-desymboled.js
-rw-r--r--  1 isaacs  staff   2887 Jan 22 23:44 index-doubleshift.js
-rw-r--r--  1 isaacs  staff  10502 Jan 26 15:40 index-featureful.js
-rw-r--r--  1 isaacs  staff   2974 Jan 23 13:38 index-last-linked-list-approach.js
-rw-r--r--  1 isaacs  staff   3906 Jan 22 22:04 index-ll-0.js
-rw-r--r--  1 isaacs  staff   2964 Jan 22 22:07 index-ll.js
-rw-r--r--  1 isaacs  staff   5765 Jan 23 22:38 index-malloc-ab-nodeletekey.js
-rw-r--r--  1 isaacs  staff   5273 Jan 23 16:59 index-malloc-ab-null.js
-rw-r--r--  1 isaacs  staff   5864 Jan 23 19:58 index-malloc-arraybuffer.js
-rw-r--r--  1 isaacs  staff   6413 Jan 23 19:39 index-malloc-one-arraybuffer.js
-rw-r--r--  1 isaacs  staff   6792 Jan 24 17:02 index-malloc-rotate-keys.js
-rw-r--r--  1 isaacs  staff   4863 Jan 23 15:59 index-malloc.js
-rw-r--r--  1 isaacs  staff   2241 Jan 22 22:07 index-map-tail.js
-rw-r--r--  1 isaacs  staff   4642 Jan 24 23:32 index-map.js
-rw-r--r--  1 isaacs  staff   4643 Jan 29 09:51 index-minimal.js
-rw-r--r--  1 isaacs  staff   4048 Jan 28 17:45 index-no-features.js
-rw-r--r--  1 isaacs  staff   3027 Jan 22 23:45 index-shiftn.js
-rw-r--r--  1 isaacs  staff   7342 Jan 24 22:08 index-smiindex-32bit.js
-rw-r--r--  1 isaacs  staff   7358 Jan 24 22:05 index-smiindex.js
-rw-r--r--  1 isaacs  staff   2860 Jan 22 22:43 index-symbols.js
-rw-r--r--  1 isaacs  staff   9360 Jan 30 08:16 index-v7.js
-rw-r--r--  1 isaacs  staff   2636 Jan 23 08:42 index-weakmap.js
-rw-r--r--  1 isaacs  staff   4954 Jan 29 13:56 index-with-dispose.js
-rw-r--r--  1 isaacs  staff   6094 Jan 29 14:24 index-with-ttl.js

(I of course know about git branches, but sometimes you just wanna throw stuff at the wall and benchmark a bunch of approaches against one another side by side, so I end up with this kind of mess.)

Eventually when I got performance to be comparable, I took a look at how mnemonist-map was working, and it's almost identical, but our heads and tails are swapped, which I found amusing :)

Yomguithereal commented 2 years ago

Is your folder available somewhere? I would love to take a look at all those implementations you tried.

Eventually when I got performance to be comparable, I took a look at how mnemonist-map was working, and it's almost identical, but our heads and tails are swapped, which I found amusing :)

Head or tail in doubly-linked lists is always a matter of opinion, it seems :) Did you perchance try the old LRU cache implementation trick that enables you to keep pointer storage for only one direction (forward or backward) by storing the pointer to the previous or next node as the value of the hashmap? I did not try it because it is a bit convoluted, and I think it requires two hashmap lookups (which is probably slower), but I wonder if it could be efficient in JS for some reason. I think I mentioned this trick somewhere here: https://yomguithereal.github.io/posts/lru-cache
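For readers following along, the predecessor-pointer trick described here can be sketched like so (a hedged, hypothetical illustration based on the description above, not code from the linked post; updating an existing key is omitted for brevity). The hashmap's value for key k is the node *preceding* k's node, so each node only needs a next pointer, but moving a node requires a second map write to fix up its old neighbor's predecessor:

```javascript
// Singly linked LRU where map.get(key) returns the PREDECESSOR of key's node.
class SinglyLinkedLRU {
  constructor(capacity) {
    this.capacity = capacity;
    this.map = new Map();                               // key -> predecessor node
    this.head = { key: null, value: null, next: null }; // sentinel, LRU side
    this.tail = this.head;                              // most recently used node
  }
  set(key, value) {   // assumes key is not already present, for brevity
    if (this.map.size >= this.capacity) this.evict();
    const node = { key, value, next: null };
    this.tail.next = node;
    this.map.set(key, this.tail);  // predecessor of the new node is the old tail
    this.tail = node;
  }
  get(key) {
    const pred = this.map.get(key);
    if (pred === undefined) return undefined;
    const node = pred.next;
    if (node !== this.tail) {                  // move node to the MRU end
      pred.next = node.next;
      this.map.set(node.next.key, pred);       // second map write: the old
      node.next = null;                        // neighbor's predecessor changed
      this.tail.next = node;
      this.map.set(key, this.tail);
      this.tail = node;
    }
    return node.value;
  }
  evict() {
    const node = this.head.next;               // least recently used
    if (node === null) return;
    this.head.next = node.next;
    if (node.next !== null) this.map.set(node.next.key, this.head);
    else this.tail = this.head;
    this.map.delete(node.key);
  }
}
```

Note how both get() and evict() need an extra Map.set to repair a neighbor's predecessor entry, which matches the "two hashmap lookups" concern above.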

isaacs commented 2 years ago

Did you perchance try the old lru cache implementation trick that enables you to only keep pointer storage for forward or backward direction by storing the pointer to previous or next as value of the hashmap?

I did try that, actually. It makes the initial set() performance significantly better, but get() performance drops, so it's only a good fit if you're caching things you never want to get back out, and in that case you're better off not caching at all. I only did a rough cut of it, so it's possible that some work could be done to make it faster, but it seems like any extra Map.get call really hurts.

If you can get away with knowing that your keys will always be integers within some reasonable range, you can get it going crazy fast by using an array as the key map. But that's probably too limiting to be useful. I tried using an object until a key is encountered that wouldn't perform well with an object (ie, a number that's not an integer, or a string over 256 chars, or any non-string/non-number), but the type checking was too costly to be worth it, and it added a lot of complexity. I think we're at a local maximum here.
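The integer-key shortcut mentioned above amounts to swapping the key Map for a plain array indexed by the key itself. A minimal sketch (hypothetical, and only valid when keys are known to be small non-negative integers):

```javascript
// Hypothetical sketch: when keys are guaranteed to be small non-negative
// integers, a plain array can stand in for the key -> slot Map.
class IntKeyIndex {
  constructor() {
    this.slots = [];  // key (small int) -> slot index, undefined = absent
  }
  set(key, slot) { this.slots[key] = slot; }
  get(key) { return this.slots[key]; }         // plain indexed load, no hashing
  delete(key) { this.slots[key] = undefined; } // leaves a hole rather than resizing
}
```

This only pays off within a bounded key range: sparse or very large keys can push the array into dictionary mode, and non-integer keys break it entirely, which is why it is called too limiting to be generally useful above.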

https://gist.github.com/isaacs/c1fb014371902ed937a210135cf6a1c4

isaacs commented 2 years ago

Oh! Another surprising thing: it seems like, at least in relatively recent versions of v8, there's really no difference in performance between a typed Uint##Array and a normal Array() that only contains ints within that range. I was thinking it might be fun to be able to use the same linked list in both compiled and JS code, somehow.

You wouldn't be able to maintain strict LRU semantics, but you could do something like lru-cache's isStale() to just drop deleted items as they are encountered, based on a flag in the typed array. I imagine writing into the key map or value/key lists would be tricky to do in a performant way though, because of the C++/JS boundary tax, so again that really limits the functionality if you're storing anything that can't be expressed as raw bytes.
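The flag-based lazy deletion idea might look something like this (a loose sketch under the stated assumptions; the names here are made up, and this is not lru-cache's actual isStale() code):

```javascript
// Hypothetical store where deletion just sets a flag in a typed array
// (which compiled code could share), and the entry is actually dropped
// only when it is next encountered.
const FLAG_DELETED = 1;

class LazyDeleteStore {
  constructor(capacity) {
    this.flags = new Uint8Array(capacity); // raw bytes, shareable across the boundary
    this.values = new Array(capacity);
  }
  write(slot, value) {
    this.flags[slot] = 0;                  // clear any stale deletion flag
    this.values[slot] = value;
  }
  delete(slot) {
    this.flags[slot] = FLAG_DELETED;       // O(1), no list surgery
  }
  read(slot) {
    if (this.flags[slot] & FLAG_DELETED) {
      this.values[slot] = undefined;       // reclaim lazily on encounter
      return undefined;
    }
    return this.values[slot];
  }
}
```

As noted above, this relaxes strict LRU semantics: a deleted slot keeps occupying space until something reads it.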

Yomguithereal commented 2 years ago

Oh! Another surprising thing, it seems like at least in relatively recent versions of v8, there's really no difference in performance between a typed Uint##Array and a normal Array() that only contains ints within that range.

I had indeed noticed that in earlier versions, the closer you get to a Float64Array, the less performance you gain by relying on a byte array, which makes sense, since the engine is probably able to use the same underlying structure to store the relevant numbers. That said, it's cool to know the engine is now able to infer more about the contained numbers and optimize further. Another thing that should be tested is memory usage: in my experience, CPU time does not suffer too much, but there is still a memory cost to using a plain array vs. a byte array.

isaacs commented 2 years ago

Another thing that should be tested is memory usage. In my experience, CPU time does not suffer too much, but there is still a memory cost to using a plain array vs. a byte array.

Ah, that does make sense, since it has to be prepared to switch from int[] mode to object mode at any time.

I've found that for most cases in JS where you'd want an LRU, the most relevant concern is the time to fetch an object from a slow upstream source, so as long as memory can't grow indefinitely, just "using a lot" is typically fine. (I note this is your reason for not bothering with the singly-linked-list-and-map approach; pointer space isn't really the issue.) If you don't have a ton of memory to spend on the performance and flexibility of an optimizing JIT, JS is not the language to use lol

Yomguithereal commented 2 years ago

If you don't have a ton of memory to spend on the performance and flexibility of an optimizing JIT, JS is not the language to use lol

Clearly yes :) But if you are in the browser, the choice is often made for you (you can try WebAssembly, but if you need to interact with the cache from JS, the communication cost remains too high to improve performance). What's more, if you need to work outside of v8 (Firefox, notably), using byte arrays and avoiding dynamic allocation and GC can yield spectacular perf boosts, since the engine does not need to think very hard to optimize the runtime.

isaacs commented 2 years ago

@Yomguithereal Here's a gist with all the many index-*.js files, if you feel like poking around in there. It's mostly bad though 😅 https://gist.github.com/isaacs/7ce3c419788dc9b7b75aa6ef5c112550

Yomguithereal commented 2 years ago

Thanks @isaacs, I will peruse this with care :)

kibertoad commented 2 years ago

@dominictarr do you need any help with reviewing/merging this PR?

isaacs commented 1 year ago

@Yomguithereal By the way, idk if you've been continuing to test this, but in benchmarking a major refactor of lru-cache, I found that mnemonist's Map-based LRUMapWithDelete appears to be somewhat faster overall than LRUCacheWithDelete (the object-based one), in addition to being safe with non-string keys. Might be worth updating the docs to suggest using the Map version instead.

int: just an integer
| name               | set   | get1  | update | get2  | evict | score  |
|--------------------|-------|-------|--------|-------|-------|--------|
| mnemonist_0.39_map | 16949 | 41667 | 25000  | 31250 | 9174  | 386019 |
| lru-cache_CURRENT  | 12500 | 40000 | 14706  | 28571 | 9091  | 348016 |
| mnemonist_0.39_obj | 37037 | 12821 | 11364  | 12346 | 17857 | 274916 |

strint: stringified integer
| name               | set   | get1  | update | get2  | evict | score  |
|--------------------|-------|-------|--------|-------|-------|--------|
| mnemonist_0.39_map | 17857 | 76923 | 40000  | 50000 | 18868 | 650823 |
| lru-cache_CURRENT  | 18182 | 66667 | 35714  | 47619 | 13889 | 579619 |
| mnemonist_0.39_obj | 30303 | 12821 | 11364  | 11765 | 16129 | 249903 |

str: string that is not a number
| name               | set   | get1  | update | get2  | evict | score  |
|--------------------|-------|-------|--------|-------|-------|--------|
| mnemonist_0.39_map | 11628 | 32258 | 24390  | 27778 | 8850  | 327560 |
| lru-cache_CURRENT  | 11905 | 33333 | 23256  | 21739 | 7576  | 293640 |
| mnemonist_0.39_obj | 13889 | 11236 | 10101  | 10309 | 7246  | 159362 |

The set() time for the object version is faster for short strings, especially short integer strings, but on all other paths it falls short considerably.

I haven't looked too closely, but I suspect this is due to updates to node's v8 version, since I definitely recall the object version being overall faster in these cases.

Yomguithereal commented 1 year ago

@isaacs nice to know v8 is finally optimizing Maps properly. I have two questions though:

  1. Are you observing that the object version "lost" performance, or that the Map ones are just faster now?
  2. Did you also test the non-WithDelete versions, or are you testing only those because your bench includes arbitrary deletion? The WithDelete versions must maintain a stack of free pointers, so they should consume O(n) more memory and have a little performance overhead vs. the regular versions.
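The free-pointer stack mentioned in point 2 can be sketched as follows (a hypothetical illustration of the bookkeeping, not mnemonist's actual code); the stack costs O(n) extra memory but makes slot reuse O(1):

```javascript
// Hypothetical slot allocator for a fixed-capacity cache that supports
// arbitrary deletion: freed slots are pushed on a stack and reused before
// any never-used slot is handed out.
class SlotAllocator {
  constructor(capacity) {
    this.free = new Uint32Array(capacity); // stack of reusable slot indices
    this.freeTop = 0;                      // number of entries on the stack
    this.next = 0;                         // next never-used slot
  }
  alloc() {
    if (this.freeTop > 0) return this.free[--this.freeTop]; // reuse a freed slot
    return this.next++;                    // assumes capacity is never exceeded
  }
  release(slot) {
    this.free[this.freeTop++] = slot;      // O(1) push
  }
}
```

A cache without delete() never frees slots out of order, so it can skip this stack entirely, which is presumably where the small overhead of the WithDelete variants comes from.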

Might be worth updating the docs to suggest using the Map version instead.

I should do so indeed. Can you tell me from which node version the Map version seems to win?

isaacs commented 1 year ago

@Yomguithereal I haven't compared it exhaustively across Node versions. (My goal is to track performance across lru-cache versions within a given Node version.) But from my limited testing, the object version seems to still be about the same as it was (ie, exceptional with short integer strings, good with short non-numeric strings, ok with long strings, utterly abysmal with floats or float numeric strings), while lru-cache and mnemonist/lru-map-with-delete both seem to have gotten faster across the board.

For some reason, recent LRUCache versions get slightly better performance than mnemonist's LRUMapWithDelete with mixed-type keys, but slightly worse everywhere else, with the difference showing up almost entirely in set() and especially evict(). Afaict, the main difference between our implementations is that I do the eviction in a separate method and return this from the set() method, whereas mnemonist returns the evicted value, and lru-cache does a few boolean checks to see if TTL/size tracking/etc are enabled. I can see that the evict() method is being inlined, so it shouldn't be affecting performance, but even when I remove the feature checks, it still does about the same. 🤷 VMs are weird.

I haven't tested the no-delete versions. Since lru-cache includes deletion, I tried to test against implementations with as close to feature parity as possible. But I don't think the free-pointer tracking is a significant performance cost either, and the memory overhead is pretty minimal.

Can you tell me from which node version the Map version seems to win?

This was found testing with Node.js v18.latest.

isaacs commented 1 year ago

@Yomguithereal Lol, the results on GitHub Actions' linux VMs are quite a bit different from the ones on my laptop :joy: https://isaacs.github.io/node-lru-cache/benchmark/

I think the main takeaway is that Map performance is much more consistent and predictable. I have no idea why that would be in these cases, but empirically, there it is.

Yomguithereal commented 1 year ago

This was found testing with Node.js v18.latest.

Ok, I'll indicate this as the tipping point in the docs. (I think I tested v16 and did not observe this shift yet.)

That's the first time I've thought about the mixed return type of a collection method tanking perf, but it makes a lot of sense, unfortunately. Maybe a method not returning the evicted value could help, but is it worth it?

I have no idea why that would be in these cases, but empirically, there it is.

I would expect this from a finely crafted hashmap implementation that does not need to handle all the corner cases a JavaScript POJO has to keep in mind, especially with contents that actually need a hashmap (not struct-like content). V8 just required a little bit of time to make it happen, I guess. That's good news :)

isaacs commented 1 year ago

That's the first time I've thought about the mixed return type of a collection method tanking perf, but it makes a lot of sense, unfortunately. Maybe a method not returning the evicted value could help, but is it worth it?

Ah, yes, I see: the issue is probably that the returned object's shape differs between calls.
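One hypothetical way around the shape problem is to always return the same, stably shaped result object instead of sometimes returning the evicted entry and sometimes undefined. The helper below is illustrative only, not any library's actual code, and for brevity it evicts in FIFO order via Map insertion order rather than true LRU order:

```javascript
// Reused result object: every call returns the exact same shape, so the
// call site stays monomorphic for the optimizer.
const evictedResult = { evicted: false, key: undefined, value: undefined };

function setWithStableReturn(map, capacity, key, value) {
  evictedResult.evicted = false;
  evictedResult.key = undefined;
  evictedResult.value = undefined;
  if (!map.has(key) && map.size >= capacity) {
    // NOTE: FIFO eviction only -- a real LRU would also reorder on get().
    const oldestKey = map.keys().next().value; // Map iterates in insertion order
    evictedResult.evicted = true;
    evictedResult.key = oldestKey;
    evictedResult.value = map.get(oldestKey);
    map.delete(oldestKey);
  }
  map.set(key, value);
  return evictedResult; // same object, same shape, on every call
}
```

The caller checks result.evicted instead of branching on the return type, which keeps the returned shape identical whether or not an eviction happened.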

I think in practice, most caches have consistently structured data with a single key type, so the fact that lru-cache does better there actually reflects some misapplied optimizations on my part.