allegro / bigcache

Efficient cache for gigabytes of data written in Go.
http://allegro.tech/2016/03/writing-fast-cache-service-in-go.html
Apache License 2.0
7.51k stars 595 forks source link

The index(int) of entries(queue.BytesQueue) overflows the hashmap's uint32 value #354

Closed uchuck closed 1 year ago

uchuck commented 1 year ago

What is the issue you are having? When the entries of a cacheShard is large enough, set a new item into the entries, then get the item could fail with "not found" error.

What is BigCache doing that it shouldn't? The reason is when the entries of a cacheShard is large enough, set a new item into the entries could result in a large index, then this index is converted to uint32 since the hashmap takes uint32 as the value.

type cacheShard struct {
    hashmap     map[uint64]uint32
    entries     queue.BytesQueue
    lock        sync.RWMutex
    entryBuffer []byte
    onRemove    onRemoveCallback

    isVerbose    bool
    statsEnabled bool
    logger       Logger
    clock        clock
    lifeWindow   uint64

    hashmapStats map[uint64]uint32
    stats        Stats
}
if index, err := s.entries.Push(w); err == nil {
            s.hashmap[hashedKey] = uint32(index)
            s.lock.Unlock()
            return nil
        }

Minimal, Complete, and Verifiable Example This issue is found with a stress test when I set an item to the cacheShard, I got an index value "5576813569", but when this index was added to the hashmap, it's converted to uint32, which is "1281846273". Therefore, when I tried to get this item, I got this wrong index(1281846273) from the hashmap, then I got the wrong entry. Since the entry is wrong, the entryKey doesn't match with the actual key passed by get function, then BigCache considered this as a Collision and returned ErrEntryNotFound in the following code:

func (s *cacheShard) get(key string, hashedKey uint64) ([]byte, error) {
    s.lock.RLock()
    wrappedEntry, err := s.getWrappedEntry(hashedKey)
    if err != nil {
        s.lock.RUnlock()
        return nil, err
    }
    if entryKey := readKeyFromEntry(wrappedEntry); key != entryKey {
        s.lock.RUnlock()
        s.collision()
        if s.isVerbose {
            s.logger.Printf("Collision detected. Both %q and %q have the same hash %x", key, entryKey, hashedKey)
        }
        return nil, ErrEntryNotFound
    }
    entry := readEntry(wrappedEntry)
    s.lock.RUnlock()
    s.hit(hashedKey)

    return entry, nil
}

For more information on how to provide an MCVE, please see the Stack Overflow documentation.

Environment:

janisz commented 1 year ago

Looks like we should use uint64 in https://github.com/allegro/bigcache/blob/982ec3b09cacb9936b88dcfdaed329870065cf23/shard.go#L19 I think it will be backward compatible change. Would you like to contribute a patch with a test for this case?

uchuck commented 1 year ago

Looks like we should use uint64 in

https://github.com/allegro/bigcache/blob/982ec3b09cacb9936b88dcfdaed329870065cf23/shard.go#L19

I think it will be backward compatible change. Would you like to contribute a patch with a test for this case?

I was thinking the same change. However, it could potentially increase the total space of the BigCache in the case if people use it to store a huge number of small entries. The size of entries Queue won't change, but the size of map keys will be doubled. Will this be a concern for the original BigCache design?

janisz commented 1 year ago

That's true. The solution could be to change map type when we hit given threshold. I'm wondering if this issue is the root cause of https://github.com/allegro/bigcache/issues/290 Anyway, although change from 32 to 64 bits could double memory overhead I believe this footprint should be small enough that nobody will notice.

uchuck commented 1 year ago

Sure, then I can make a patch and test it.