allegro / bigcache

Efficient cache for gigabytes of data written in Go.
http://allegro.tech/2016/03/writing-fast-cache-service-in-go.html
Apache License 2.0

feat: enable mmap to alloc on unix-like system #397

Open tecty opened 2 months ago

tecty commented 2 months ago

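Use mmap to allocate memory, which gives a smaller heap footprint. This removes the Go memory-ballast effect that bigcache otherwise provides: once the cache no longer counts toward the live heap, the GC (which GOGC paces relative to live heap size) runs more often, so GC pauses and CPU usage will increase somewhat. However, we have other tools, such as GC parameters or gctuner, to mitigate this kind of problem.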
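A minimal sketch of what allocating off-heap looks like, assuming a Linux or macOS host. It uses the standard library's syscall.Mmap rather than golang.org/x/sys/unix, which the PR actually depends on, and the helpers mmapAlloc, mmapFree and heapAlloc are hypothetical names, not bigcache code. It maps 64 MiB of anonymous memory and compares runtime.MemStats.HeapAlloc before and after, to show that the region is not counted as Go heap.

```go
//go:build linux || darwin

package main

import (
	"fmt"
	"runtime"
	"syscall"
)

// mmapAlloc maps size bytes of anonymous, private, read/write memory directly
// from the kernel. The Go GC neither scans nor counts this region, so it does
// not contribute to the heap size that drives GC pacing.
func mmapAlloc(size int) ([]byte, error) {
	return syscall.Mmap(-1, 0, size,
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_ANON|syscall.MAP_PRIVATE)
}

// mmapFree returns the region to the kernel; b must not be used afterwards.
func mmapFree(b []byte) error {
	return syscall.Munmap(b)
}

// heapAlloc reports the bytes currently allocated on the Go heap.
func heapAlloc() uint64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc
}

func main() {
	const size = 64 << 20 // 64 MiB

	before := heapAlloc()
	buf, err := mmapAlloc(size)
	if err != nil {
		panic(err)
	}
	defer mmapFree(buf)

	// Touch both ends so the pages are actually backed by memory.
	buf[0], buf[size-1] = 1, 2
	after := heapAlloc()

	fmt.Printf("mapped %d MiB, heap grew by %d KiB\n",
		len(buf)>>20, (int64(after)-int64(before))>>10)
}
```

Because the GC never scans this memory, it must not hold Go pointers; storing only serialized []byte entries, as bigcache's byte queue does, is what makes this kind of off-heap placement possible.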

tecty commented 2 months ago

The minimum Go version for x/sys/unix (at v0.1.0) is go1.17, so I bumped the Go version of this package accordingly.

tecty commented 2 months ago

By using mmap, we can nearly double the cache space under the same memory limit, because this space is not allocated from the heap and does not affect GC behavior. It also eases maintenance at deployment time: different clusters may have different memory quotas, and the cache size affects how the GC parameters must be tuned (to minimize p99 GC pauses). With mmap, we can simply set the cache size according to the memory quota.
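One way to see where "nearly double" can come from, as a simplified model rather than a measurement: under GOGC pacing the heap may grow to roughly live × (1 + GOGC/100) before the next collection, so a cache living on the heap needs that headroom as well, while an mmap'd cache needs only its own size. The model ignores runtime overhead, fragmentation and any soft memory limit; the function names and the 8 GiB quota / 256 MiB "other heap" figures are illustrative assumptions.

```go
package main

import "fmt"

// All sizes are in MiB. gogc is the GOGC percentage (100 is Go's default).

// maxOnHeapCache: the whole live heap (base + cache) is scaled by
// (1 + gogc/100) at the GC target, and that must fit under limit.
func maxOnHeapCache(limit, base, gogc int) int {
	return limit*100/(100+gogc) - base
}

// maxOffHeapCache: only the base heap needs GC headroom; the mmap'd cache
// occupies exactly its own size.
func maxOffHeapCache(limit, base, gogc int) int {
	return limit - base*(100+gogc)/100
}

func main() {
	const limit, base, gogc = 8192, 256, 100 // 8 GiB quota, 256 MiB other live heap

	fmt.Printf("on-heap cache:  %d MiB\n", maxOnHeapCache(limit, base, gogc))  // 3840
	fmt.Printf("off-heap cache: %d MiB\n", maxOffHeapCache(limit, base, gogc)) // 7680
}
```

With GOGC=100 the off-heap budget is twice the on-heap one in this model, and the gap shrinks as GOGC is lowered, which is one reason per-cluster GC tuning depends on the cache size.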

janisz commented 2 months ago

By using mmap, we could nearly double the cache space with same memory limit.

That's great! However, incorporating an external dependency introduces differences in behavior between Unix and non-Unix systems. How about a different approach? Instead of adding a dependency, we could extend the code with an interface to handle allocation. By default, we would use the current behavior, but users could provide their own allocator. This approach would be similar to the hasher we currently use: by default, we use the hash from the stdlib, but users can provide their custom hasher. What do you think? We could then create a separate module with your allocator implementation and add a link in the README. In a separate repository, you could add more benchmarks and explain how it works, like a mini blog post.
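The pluggable-allocator idea could look roughly like the sketch below. Everything here is hypothetical: bigcache has no Allocator type or Config.Allocator field, and countingAllocator merely stands in for an implementation a user might ship in a separate module. It follows the same "default unless the user supplies one" pattern as the hasher.

```go
package main

import "fmt"

// Allocator is a HYPOTHETICAL pluggable allocation hook, not a bigcache API.
type Allocator interface {
	Alloc(size int) ([]byte, error)
	Free(b []byte) error
}

// heapAllocator reproduces today's behavior: memory comes from make and is
// reclaimed by the GC, so Free has nothing to do.
type heapAllocator struct{}

func (heapAllocator) Alloc(size int) ([]byte, error) { return make([]byte, size), nil }
func (heapAllocator) Free([]byte) error { return nil }

// Config is a hypothetical cache config with an optional Allocator field.
type Config struct {
	Allocator Allocator // nil selects heapAllocator
}

// allocator returns the user-supplied allocator or falls back to the default.
func (c Config) allocator() Allocator {
	if c.Allocator == nil {
		return heapAllocator{}
	}
	return c.Allocator
}

// countingAllocator wraps another allocator and tracks outstanding bytes.
type countingAllocator struct {
	inner Allocator
	live  int
}

func (a *countingAllocator) Alloc(size int) ([]byte, error) {
	b, err := a.inner.Alloc(size)
	if err == nil {
		a.live += len(b)
	}
	return b, err
}

func (a *countingAllocator) Free(b []byte) error {
	a.live -= len(b)
	return a.inner.Free(b)
}

func main() {
	custom := &countingAllocator{inner: heapAllocator{}}
	cfg := Config{Allocator: custom}

	buf, err := cfg.allocator().Alloc(1024)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(buf), custom.live) // 1024 1024

	if err := cfg.allocator().Free(buf); err != nil {
		panic(err)
	}
	fmt.Println(custom.live) // 0
}
```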

tecty commented 2 months ago

First, this optimization cannot be achieved by relying on a user-implemented allocator. The address space requested via mmap has to be contiguous, and making users implement realloc on top of mmap in this scenario is both unnecessary and too difficult for them. This optimization therefore inevitably requires changing the allocator's behavior inside the cache itself.
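To illustrate why growing a contiguous mmap'd buffer is awkward, here is a minimal sketch of "realloc" on top of mmap, assuming Linux or macOS and the standard library syscall package. The helper names mmapAlloc and growMmap are hypothetical. Growth means mapping a new region, copying the used bytes and unmapping the old one, and every reference into the old buffer must then be replaced, which is why the cache's own grow path has to change rather than just the allocation call.

```go
//go:build linux || darwin

package main

import (
	"fmt"
	"syscall"
)

// mmapAlloc maps size bytes of anonymous, private, read/write memory.
func mmapAlloc(size int) ([]byte, error) {
	return syscall.Mmap(-1, 0, size,
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_ANON|syscall.MAP_PRIVATE)
}

// growMmap emulates realloc for an mmap'd buffer: it maps a new, larger
// region, copies the first used bytes across, and unmaps the old region.
// Callers must replace every reference to old with the returned slice.
func growMmap(old []byte, used, newSize int) ([]byte, error) {
	buf, err := mmapAlloc(newSize)
	if err != nil {
		return nil, err
	}
	copy(buf, old[:used])
	if err := syscall.Munmap(old); err != nil {
		_ = syscall.Munmap(buf) // don't leak the new mapping on failure
		return nil, err
	}
	return buf, nil
}

func main() {
	buf, err := mmapAlloc(4096)
	if err != nil {
		panic(err)
	}
	used := copy(buf, "queued entry")

	buf, err = growMmap(buf, used, 2*4096)
	if err != nil {
		panic(err)
	}
	defer syscall.Munmap(buf)

	fmt.Println(len(buf), string(buf[:used])) // 8192 queued entry
}
```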
Second, make and malloc ultimately obtain their memory from the OS via mmap, so allocating directly with mmap does not affect performance. There is therefore no need for dedicated performance testing here, and we have observed no noticeable performance loss in production.
Third, if we avoid the official dependency package, we would have to port the mmap behavior ourselves and issue the system calls manually, which introduces additional risk. Moreover, that approach still would not solve the problem that the allocator's behavior has to change.
Finally, if dependency inversion really is needed, a more reasonable approach would be to expose the queue package as an interface and let the caller supply the implementation. However, this could cost performance and require a significant amount of code changes.
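A rough sketch of what inverting the queue dependency might mean. The Queue interface, the toy sliceQueue and the shard type are hypothetical and do not reproduce bigcache's actual queue package; the point is only that the shard depends on an interface, so a heap- or mmap-backed queue could be plugged in by the caller.

```go
package main

import (
	"errors"
	"fmt"
)

var errOutOfRange = errors.New("queue: index out of range")

// Queue is a HYPOTHETICAL abstraction over the cache's byte queue.
type Queue interface {
	Push(data []byte) (int, error)
	Get(index int) ([]byte, error)
}

// sliceQueue is a toy heap-backed implementation: entries are appended to a
// single byte slice and addressed by their sequence number.
type sliceQueue struct {
	buf    []byte
	starts []int // starts[i] is the offset in buf where entry i begins
}

func (q *sliceQueue) Push(data []byte) (int, error) {
	q.starts = append(q.starts, len(q.buf))
	q.buf = append(q.buf, data...)
	return len(q.starts) - 1, nil
}

func (q *sliceQueue) Get(i int) ([]byte, error) {
	if i < 0 || i >= len(q.starts) {
		return nil, errOutOfRange
	}
	end := len(q.buf)
	if i+1 < len(q.starts) {
		end = q.starts[i+1]
	}
	return q.buf[q.starts[i]:end], nil
}

// shard depends only on the Queue interface; the caller chooses the
// implementation (and therefore where its memory comes from).
type shard struct {
	entries Queue
}

func main() {
	s := shard{entries: &sliceQueue{}}
	a, _ := s.entries.Push([]byte("alpha"))
	b, _ := s.entries.Push([]byte("beta"))
	va, _ := s.entries.Get(a)
	vb, _ := s.entries.Get(b)
	fmt.Println(string(va), string(vb)) // alpha beta
}
```

Every Push and Get now goes through an interface call, which is dynamic dispatch and generally prevents the compiler from inlining these hot-path methods; that is the kind of performance cost the comment above warns about.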