Yomguithereal / mnemonist

Curated collection of data structures for the JavaScript/TypeScript language.
https://yomguithereal.github.io/mnemonist
MIT License
2.24k stars 92 forks source link

feat: add Sieve algorithm #220

Open kurtextrem opened 5 months ago

kurtextrem commented 5 months ago

As I've prepared the code based on the LRU data structure from mnemonist already for my own purposes, I figured I'd open a draft PR so anyone can take a look or do further improvements/iterations on the code (or tests with it).

As written in https://github.com/Yomguithereal/mnemonist/issues/212, in my benchmark, the Map() implementation comes out on top (with strings as keys).

Yomguithereal commented 5 months ago

Thanks @kurtextrem. I will merge your PR as soon as I start working on the SIEVE cache. It's just that I am swamped currently.

kurtextrem commented 5 months ago

no worries, take your time.

NeoPhi commented 4 months ago

@kurtextrem I'm probably doing something wrong in my benchmark. Running this implementation of SieveMap I'm seeing much worse hit ratio compared to your js-sieve implementation, my own simple Sieve implementation, or mnemonist/lru-map. Output of hit % on a sample K/V trace.

Sieve 94.69188501697717%
js-sieve 94.69199169100739%
SieveMap 86.2063926088917%
LRUMap 94.2229459801187%

Sample code for the above can be found here: https://github.com/NeoPhi/cache-playground

kurtextrem commented 4 months ago

@NeoPhi Cool stuff, thank you for doing this! From looking at your implementation, it looks like you're not using a doubly-linked list. From reading the Sieve paper I had the impression it's a must, but apparently it isn't? This will definitely also be much faster in execution than my js-sieve implementation (at least as long as hash collisions in the map stay unexpensive to resolve).

The differences in hit ratio are surprising to me, but I can't really tell you what's off. I took the example code referenced by the sieve paper and fixed some JS errors (https://github.com/Yomguithereal/mnemonist/issues/212#issuecomment-1877267684).

NeoPhi commented 4 months ago

I'm relying heavily on the iterator semantics of the JS Map. which has edge cases compared to a proper doubly linked list implementation, hence the small discrepancy in hit rates and why your implementation is slightly higher. (There was a bug in my hand implementation, hit count is identical).

I think the proposed code changes linked to from https://github.com/Yomguithereal/mnemonist/issues/212#issuecomment-1881797147 are buggy. I tried a simple use case and it looks like the internal data structures are getting corrupted:

import SieveMap from "../mnemonist/sieve-map.js";
const test = new SieveMap(3);
test.set('A', 'A');
test.set('B', 'B');
test.set('C', 'C');
console.log(Array.from(test.entries()));
// Output
// [ [ 'C', 'C' ], [ undefined, undefined ], [ undefined, undefined ] ]