Proposal to Integrate SIEVE Eviction Algorithm

yazhuo commented 9 months ago

Hi there,

Our team (@1a1a11a) has developed a new cache eviction algorithm, called SIEVE. It’s simple, efficient, and scalable.

Why SIEVE could be a great addition:

Simplicity: Integrating SIEVE is straightforward, usually needing to change less than 20 lines of code on average.
Efficiency: On skewed workloads, which are typical in web caching scenarios, SIEVE is top-notch.
Cache Primitive: SIEVE is not just another algorithm; it's a primitive that could enhance or replace LRU/FIFO queues in advanced systems like LeCaR, TwoQ, ARC, and S3-FIFO.

Welcome to dive into the details on our website sievecache.com and on our SIEVE blog.

We would love to explore the possibility of integrating SIEVE into mnemonist. We believe it could be a beneficial addition to the library and the community.

Looking forward to your feedback!

kurtextrem commented 9 months ago

In my benchmark: https://github.com/kurtextrem/js-sieve/blob/main/bench/index.mts#L78, modifying the LRU cache implementation to use new Map instead of an object, also makes it outperform my implementation. Code here: https://github.com/kurtextrem/js-sieve/blob/main/alt-impl/lru2.mts#L102. I took the reference code from @1a1a11a and fixed a few bugs, plus switched to Map and optimized some statements (typeof ... !== 'undefined' -> ... !== undefined). Feel free to use the code for a PR :)

1a1a11a commented 9 months ago

@kurtextrem Nice work! I am new to js, so I am not going to comment on this.

If @Yomguithereal is willing to take a PR, would you like to submit one?

kurtextrem commented 9 months ago

If @Yomguithereal is willing to take a PR, would you like to submit one?

Thank you! For sure, I'd be happy to submit a PR.

Yomguithereal commented 9 months ago

Hello @yazhuo, I have of course read your paper and was meaning to add both SIEVE and S3-FIFO to the library but I lacked time to do so for the time being.

Still, I have some open questions about how to better implement SIEVE, in JavaScript at least. The first part is related to the fact that you probably still need a doubly-linked list and cannot work with a simple ring buffer, as hinted in this lobste.rs conversation here. The second part is linked to the visited information per node. I am still not sure whether relying on a statically allocated bitset would be beneficial for performance and I need to benchmark it. What's more, using a uint8 byte array could simplify the code, enhance performance and provide an easy way to also implement something like a k-SIEVE as described here if required. But this would also cost 8 times the memory.

Regarding your paper (at least this one: https://junchengyang.com/publication/nsdi24-SIEVE.pdf), this statement:

Aside from mnemonist, which uses arrays, they all use doubly-linked-list-based implementations of LRU.

is incorrect. mnemonist also uses a doubly-linked list, even if it relies on statically allocated arrays to serve as a pointer system, as explained in this blog post.

You also state that it would requires 12 lines of modified code to adapt mnemonist lru cache code to implement SIEVE. Can you provide said lines as a basis for the reflexion to how better implement your algorithm? Is this code the one by @1a1a11a as hinted by @kurtextrem ? Is this here?

In my benchmark: https://github.com/kurtextrem/js-sieve/blob/main/bench/index.mts#L78, modifying the LRU cache implementation to use new Map instead of an object, also makes it outperform my implementation.

We will of course need both a SIEVECache and a SIEVEMap class here, since the results vary wildly based on the kind of keys you are using.

Thank you! For sure, I'd be happy to submit a PR.

@kurtextrem I am willing to accept your PR but I would like first to take the time to re-read the paper to understand whether it is possible or not to use something simpler that the doubly-linked list here (namely a ring buffer or some variation of gap buffer). I am also wondering whether to introduce some kind of pre-processing/code templating to the library because the code duplication between the Map/object version of the cache and the support for deletion starts to be a little bit annoying for me to maintain.

kurtextrem commented 9 months ago

I am still not sure whether relying on a statically allocated bitset would be beneficial for performance and I need to benchmark it

I didn't fully investigate, but from an early look, I do think the difference between "my" Sieve implementation and the modified Sieve implementation based on your code is, that your code does not instantiate nodes via class (e.g. avoids new Node on inserts, which also helps GC) by maintaining a key and value array. Meanwhile I use something like Node(key, value, visited) to store the visited bit. The key and value array implementation also makes the doubly-linked list implementation more efficient than the one I'm using (https://github.com/js-sdsl/js-sdsl/blob/main/src/container/SequentialContainer/LinkList.ts). So I do think maintaining a third bit array is beneficial here too.

For future ref, this is the Sieve-on-mnemoist-map implementation: https://github.com/kurtextrem/js-sieve/blob/main/alt-impl/lru2.mts This one is close to the original one: https://github.com/kurtextrem/js-sieve/blob/main/alt-impl/lru.mts (I needed to make some changes as the code committed to https://github.com/cacheMon/mnemonist did not seem to fully work). And this one is my optimized one based on the original code: https://github.com/kurtextrem/js-sieve/blob/main/alt-impl/lru3.mts

yazhuo commented 9 months ago

Hi @Yomguithereal, thanks for the reply!

The first part is related to the fact that you probably still need a doubly-linked list and cannot work with a simple ring buffer

You're right, SIEVE cannot be implemented by a simple ring buffer without shifting. Compared to CLOCK, SIEVE does lost this benefit. As the updates in Marc's blog mentioned, Toin Baker pointed out an interesting fix: "A simple fix to the SIEVE algorithm to accommodate circular arrays would be to move the current tail entry into the evicted entry’s slot (much like CLOCK copies a new entry into the evicted entry’s slot). " We need to further investigate how this would affect performance.

The second part is linked to the visited information per node.

A great point on implementation details about visited information. Indeed, we didn't benchmark the memory consumption in the paper; however, it is important to consider it in real cases! Besides, SIEVE-k was worse than SIEVE on real web KV workloads, so I think relying on a statically allocated bitset might be a better option.

mnemonist also uses a doubly-linked list, even if it relies on statically allocated arrays to serve as a pointer system, as explained in this blog post.

I've read your blog post - a great post and easy to follow! We will address this misstatement.

@kurtextrem Thanks for your good work on SIEVE implementation!

1a1a11a commented 9 months ago

Hi @Yomguithereal, Thank you for your interest! Add to @yazhuo's reply

The code is here, it was just a hack (not well engineered code) to see how easy it is to implement SIEVE.

IMHO, I don't think bitset would be a better option because of an extra random memory access, so uint8 would be a better option. And it is easier to implement. Implementing SIEVE on a ring buffer is non-trivial, and I have doubts about Toin's suggestion because it may cause aging problems (old objects stuck in the cache). But it is a good idea that should be tried.

If you want to go with a ring buffer, S3-FIFO might be a better option. :)

Yomguithereal commented 8 months ago

As the updates in Marc's blog mentioned, Toin Baker pointed out an interesting fix: "A simple fix to the SIEVE algorithm to accommodate circular arrays would be to move the current tail entry into the evicted entry’s slot (much like CLOCK copies a new entry into the evicted entry’s slot). " We need to further investigate how this would affect performance.

I did not put much thought into it but my intuition tells me this looks like a O(n) operation, and I don't have the feeling it would be easily amortized?

A great point on implementation details about visited information. Indeed, we didn't benchmark the memory consumption in the paper; however, it is important to consider it in real cases! Besides, SIEVE-k was worse than SIEVE on real web KV workloads, so I think relying on a statically allocated bitset might be a better option.

Fair enough. But putting k-SIEVE aside, my main concern here is performance because lookups in the bitset will be just a little slower than indexing a uint8 byte array counterpart, but I will of course benchmark this to make sure. I am not sure dividing the memory cost of a single list (one of at least 4) is worth the performance hit.

IMHO, I don't think bitset would be a better option because of an extra random memory access, so uint8 would be a better option. And it is easier to implement.

I read this after I started writing my answer :). Seems we agree indeed.

@yazhuo @1a1a11a related question, but I understand you are also the authors of S3-FIFO? What's your personal opinion regarding SIEVE vs S3-FIFO?

I will probably implement both at some point anyway.

1a1a11a commented 8 months ago

I did not put much thought into it but my intuition tells me this looks like a O(n) operation, and I don't have the feeling it would be easily amortized?

Yes, the original SIEVE design and the approach in Marc's blog have a worst case O(n) complexity during eviction, the amortized time complexity is O(1). However, there is a simple fix: force an eviction after checking k objects (k can be up to 100 if implemented using a ring buffer, but lower if using a linked list). This will have some impact on the miss ratio, but it will also make the algorithm more robust.

SIEVE vs S3-FIFO

From our experience on over 6000 traces collected from 2007 to 2023 in 12 companies, S3-FIFO is more robust than all other algorithms, including SIEVE (but it is also more complex than SIEVE). The one type of workload SIEVE does not work well is small and old blockI/O workloads. But on web workloads and large multi-tenanted block I/O workloads, SIEVE works very well. We still need better understanding. If you are interested, there are some results here https://observablehq.com/@1a1a11a/sieve-miss-ratio-plots

Yomguithereal / mnemonist

Proposal to Integrate SIEVE Eviction Algorithm #212

SIEVE vs S3-FIFO