go-graphite / go-carbon

Golang implementation of Graphite/Carbon server with classic architecture: Agent -> Cache -> Persister
MIT License

Using cuckoo filter for new metric detection instead of cache #590

Closed deniszh closed 5 months ago

deniszh commented 5 months ago

Context: go-carbon supports real-time indexing in the Carbonserver trie index. If the realtime-index config parameter is > 0, go-carbon creates a dedicated channel of that size for new metrics. cache.go pushes a metric into that channel whenever the metric is not found in the cache. Carbonserver then consumes the channel during a file scan and populates the trie index from it, creating index entries for a metric even if its whisper file doesn't exist.
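To make the flow above concrete, here is a minimal sketch of the "cache miss feeds a bounded channel" idea. It is not the actual cache.go code: the `cache` type, `newCache`, `add`, and the `newMetrics` field are hypothetical names, and the non-blocking send (dropping the notification when the channel is full) is an assumption for illustration.

```go
package main

import "fmt"

// cache is a hypothetical stand-in for go-carbon's cache; names are illustrative.
type cache struct {
	data       map[string][]float64
	newMetrics chan string // sized by realtime-index; nil when the feature is off
}

func newCache(realtimeIndex int) *cache {
	c := &cache{data: make(map[string][]float64)}
	if realtimeIndex > 0 {
		c.newMetrics = make(chan string, realtimeIndex)
	}
	return c
}

// add stores a point and reports whether the metric was announced as new.
func (c *cache) add(name string, v float64) bool {
	_, seen := c.data[name]
	c.data[name] = append(c.data[name], v)
	if seen || c.newMetrics == nil {
		return false
	}
	select {
	case c.newMetrics <- name:
		return true
	default: // assumption: when the channel is full, drop instead of blocking
		return false
	}
}

func main() {
	c := newCache(2)
	fmt.Println(c.add("a.b.c", 1), c.add("a.b.c", 2), c.add("x.y.z", 3))
	close(c.newMetrics)
	for name := range c.newMetrics { // the consumer side would index these names
		fmt.Println("index:", name)
	}
}
```

Note that in this sketch "new" only means "not currently in the cache", so a metric that was flushed and evicted is announced again on its next point.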

Problem: The cache appears to be a poor predictor of new metrics. When the cache is empty and there is a lot of incoming traffic, almost every incoming metric misses the cache and is reported as new, so the file scan thread is blocked for a long time and the scan never finishes.

Solution: We can use a simpler structure than the cache (a map) to detect previously seen metrics. Bloom filters are a good fit for this and use bounded space. I use a cuckoo filter instead, which is faster and supports deletion.

I added cuckoo filter support to cache.go, with tests. I also added a bloom-size parameter to the cache config. If it is > 0, the cache uses a bloom filter of the specified size to detect new metrics. I also delete a metric from the filter when it leaves the cache. I'm not sure we need this, but it might help with long uptimes.
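For readers unfamiliar with the data structure, here is a small standard-library-only sketch of a cuckoo filter with insert, lookup, and delete. It is not the code from this PR. All names (`cuckoo`, `newCuckoo`, `Insert`, `Contains`, `Delete`) and the parameters (4 slots per bucket, 8-bit fingerprints, 500 kicks) are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
)

const (
	bucketSize = 4   // fingerprints per bucket
	maxKicks   = 500 // relocation attempts before the filter is treated as full
)

type cuckoo struct {
	buckets [][bucketSize]byte // 0 marks an empty slot
	mask    uint32
}

// newCuckoo expects numBuckets to be a power of two.
func newCuckoo(numBuckets uint32) *cuckoo {
	return &cuckoo{buckets: make([][bucketSize]byte, numBuckets), mask: numBuckets - 1}
}

func hash32(b []byte) uint32 { h := fnv.New32a(); h.Write(b); return h.Sum32() }

// alt maps a bucket index to its partner; applying it twice returns the original.
func (f *cuckoo) alt(i uint32, fp byte) uint32 { return (i ^ hash32([]byte{fp})) & f.mask }

func (f *cuckoo) locate(key string) (fp byte, i1, i2 uint32) {
	h := hash32([]byte(key))
	if fp = byte(h >> 24); fp == 0 {
		fp = 1 // 0 is reserved for "empty"
	}
	i1 = h & f.mask
	return fp, i1, f.alt(i1, fp)
}

func (f *cuckoo) put(i uint32, fp byte) bool {
	for s := range f.buckets[i] {
		if f.buckets[i][s] == 0 {
			f.buckets[i][s] = fp
			return true
		}
	}
	return false
}

// Insert returns false when the filter is too full; in that case one
// previously stored fingerprint may have been evicted and lost.
func (f *cuckoo) Insert(key string) bool {
	fp, i1, i2 := f.locate(key)
	if f.put(i1, fp) || f.put(i2, fp) {
		return true
	}
	i := i1
	for k := 0; k < maxKicks; k++ {
		s := rand.Intn(bucketSize)
		fp, f.buckets[i][s] = f.buckets[i][s], fp // evict a random resident
		i = f.alt(i, fp)
		if f.put(i, fp) {
			return true
		}
	}
	return false
}

func (f *cuckoo) Contains(key string) bool {
	fp, i1, i2 := f.locate(key)
	for _, i := range []uint32{i1, i2} {
		for _, v := range f.buckets[i] {
			if v == fp {
				return true
			}
		}
	}
	return false
}

// Delete removes one copy of the key's fingerprint, if present.
func (f *cuckoo) Delete(key string) bool {
	fp, i1, i2 := f.locate(key)
	for _, i := range []uint32{i1, i2} {
		for s := range f.buckets[i] {
			if f.buckets[i][s] == fp {
				f.buckets[i][s] = 0
				return true
			}
		}
	}
	return false
}

func main() {
	f := newCuckoo(1024)
	fmt.Println(f.Contains("servers.web1.cpu")) // false: the filter is empty
	f.Insert("servers.web1.cpu")
	fmt.Println(f.Contains("servers.web1.cpu")) // true: no false negatives
	f.Delete("servers.web1.cpu")
	fmt.Println(f.Contains("servers.web1.cpu")) // false: the only entry was removed
}
```

Like a bloom filter, this structure can return false positives, which here would mean an occasional genuinely new metric is not announced to the indexer. Deleting a key that was never inserted can also remove another key's matching fingerprint, so deletion is only safe for metrics that were actually added.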