Closed boschb closed 7 years ago
I think you're right that a cache wouldn't make for a good queue. CLHM for instance is LRU only, since while FIFO is much easier it has a worse hit rate.
I think you would need to clarify your exact needs. For example if the queue can evict if over capacity, then that presumes that it is non-blocking and always accepts writes. Does expiration need to be handled promptly or can it be delayed until polled by the consumer?
If we take the easiest cases then this could be implemented as a ring buffer. When the buffer it full it wraps around, swaps the existing value, and sends it to a listener. When polling, the call can check if the element has expired and, if so, retry. The logic for a concurrent non-blocking ring buffer is fairly simple, so I imagine adding these enhancements wouldn't be that difficult.
Of course that's the best case. I do think you'll want to spin your own and I'd be happy to advise if you get stuck.
Expiration could be delayed (in my case) until an poll/write is made.
Thanks! i'll look into the concurrent ring buffer. That sounds (almost) like what I have so far. Just need to add time based eviction.
Thanks again!
great! Note that my implementation is more low-level (Unsafe, false sharing) than you probably want. You'll probably want to use atomics, keep it simpler, and not worry about those extreme cases.
Closing, but we can keep discussing if helpful.
A Cache would work @boschb, but an eviction listener would normally not give you perfect FIFO ordering. You can get close though, by implementing your own eviction algorithm. Triava Cache allows custom eviction implementation, and you could use insertion time and tiebreaker to get very close to FIFO.
Your implementation could look about this (tiebreaker is by default automatically applied):
public class FifoEvictor<K,V> extends FreezingEvictor<K,V> {
long getFreezeValue(Integer key, TCacheHolder<Integer> holder) {
return holder.getCreationTime();
}
}
See https://github.com/trivago/triava/blob/master/src/examples/java/com/trivago/examples/CacheJSR107Example.java#L178 for a more complete example.
I am using such a solution with about 1000 writes per second on https://www.trivago.com/ . Please be aware that (as usual) eviction listeners can block adding entries to the queue. If you do not want writes to block, you could discard if it is acceptable in your scenario.
That handles eviction order, but what about the consumer(s) polling? Wouldn't that degrade into a full scan on every offer/poll?
I think we're going to struggle helping Bobby without more insight into his problem, acceptable trade offs, etc. I read the question as a Queue like class, but you might be right if it's more of a cache. My answer doesn't dig into the single or multiple producer / consumer contract. That can have a big impact on a concurrent algorithm (hint see JCTools). We might both be wrong and a much better solution exists because we jumped from the posed answer and didn't ask about the reason it's needed.
Hopefully he's already on the right track and we aren't making it worse with partial answers.
Quick fyi. If the problem is MPMC then the PTLQueue style algorithm might be the easiest to adapt. Then use a Turn
object for an empty slot and catch up (evict) when a producer fails to acquire its slot. This way the writeCount position is an increment and there is less weird races on consumer as you can CAS its counter for acquisition and have an slot instance to manage.
Obviously things get confusing quickly which is why I'm hesitant going to deep into any suggestion.
About the consumer(s) polling. The consumers - as I understand that word - are the eviction listeners. As a listener, they do not do a full scan. The full scan exists, but this is done by the background eviction thread. Neither Threads writing to the Cache nor consumers take any penalty due to the background work (no blocking, and not even amortized CPU costs). In general I understand that boschb is now looking into the concurrent ring buffer you proposed. Without knowing the exact use case I think the ring buffer is the most appropriate way.
Hey guys,
Thanks for the great discussion. I am trying to follow all the terminology but i'm no expert in this field, you guys are pros!
Here is what I ended up doing. I will profile this eventually and maybe make optimizations then but here is where it is now. It's necessary to know the following: 1.) I will have a lot of these ListeningEvictingQueues managed by a parent class. 2.) These queues are probably going to be of size 16-128. 3.) I also have another single Cache that holds the super set of all entries. This Cache has time (and weight) based eviction properties and a listener that calls evict() when removed. For completeness the Consumers also removes from this master Cache. This is going to be how I limit the impact on memory of all elements traversing all queues.
What this is used for is what I call a SourceBalancer, where there are 'sources' and 'targerts'. Imagine many sources producing objects at different rates and sizes. Every target will have a Map of source->ListeningEvictingQueue and will round robin pick from each. This will produce a 'fair' sampling of all sources objects to the target without worry of one source dominating the flow.
I'm sure I can do better, but this was a small amount of code that I don't have to worry too much about getting something wrong. Let me know your thoughts though on this approach, I'd love to hear them.
Thanks again, Bobby
On Fri, Sep 8, 2017 at 5:17 AM, Christian Esken notifications@github.com wrote:
About the consumer(s) polling. The consumers - as I understand that word - are the eviction listeners. As a listener, they do not do a full scan. The full scan exists, but this is done by the background eviction thread. Neither Threads writing to the Cache nor consumers take any penalty due to the background work (not even amortized costs). In general I understand that boschb is now looking into the concurrent ring buffer you proposed. Without knowing the exact use case I think the ring buffer is the most appropriate way.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ben-manes/caffeine/issues/183#issuecomment-328087873, or mute the thread https://github.com/notifications/unsubscribe-auth/ADU1JbbHlAxu8B-UZO_gfxxJwmWtpdexks5sgTBigaJpZM4PKd8g .
@boschb Your evicting queue looks pretty good. Its definitely best to start at the simplest, iterate, and profile. Sometimes I get too deep into the woods and jump ahead to complicated solutions, e.g. the more custom circular queue. It sounds like any minor races a consumer has with the an evicting producer (causing unnecessary eviction) isn't a real problem. Since you'll have a bunch of them that problem and contention are unlikely to occur very often due to the nice balancing you described. You could probably switch to ArrayBlockingQueue
if the linked node churn adds GC pressure.
@christian-esken I think you might find my hierarchical timer wheel interesting, which is kind of like an incremental bucket sort. This is how Caffeine supports custom expiration policies, e.g. FIFO or by an external timestamp, in amortized O(1) time. That way it doesn't suffer from a O(lg n) incremental sort, periodic O(n) expiration scan, or leave pollution by relying on a size eviction setting. I don't think it could adapt to your caching approach, but you might find the technique interesting regardless.
Good point on the Array vs Linked. I had originally had the Array version in the file but I think I moved it so that evict() would be more efficient. I will revisit this once I have some numbers, thanks for the +1 though, makes me feel a little better about the whole thing :)
Hey Ben,
First off thanks for the excellent cache. It's proving wonderful in our use still.
I'm looking to build a Queue that has Eviction properties like max_size and max_age as well as a listener for eviction. These features exist in a Cache, but lack a linked list nature of ordering. I've been racking my head a little on a good way to do this with a Cache but i think i may just have to spin my own unless you have any great ideas?
I noticed: https://github.com/ben-manes/concurrentlinkedhashmap but it doesn't seem to support time based eviction.
Any advice? Thanks! Bobby