Open sprutton1 opened 1 week ago
Hi @sprutton1. Thanks for reporting.
I've checked your obtain() method usage. IIUC, if there are multiple concurrent obtain() calls and a serialized value is returned, all of the callers will deserialize the value and reinsert the deserialized value into the cache. The memory used by the concurrent deserializations could cause the OOM. Besides, each reinsertion leads to a disk cache write, which consumes more memory than expected. (Currently, foyer writes the disk cache on insertion, not on memory cache eviction.)
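To illustrate the hazard described above, here is a minimal, self-contained sketch using plain std types (not foyer's API; all names are made up for illustration). Several threads race on the same serialized entry: each checks under a read lock, then deserializes before anyone has published the result, so the expensive work (and the reinsertion that follows it) can run many times:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, RwLock};
use std::thread;

// Counts how many times the expensive deserialization actually runs.
static DESERIALIZATIONS: AtomicUsize = AtomicUsize::new(0);

// Stand-in for the real (expensive) deserialization step.
fn deserialize(bytes: &[u8]) -> String {
    DESERIALIZATIONS.fetch_add(1, Ordering::SeqCst);
    String::from_utf8(bytes.to_vec()).unwrap()
}

fn main() {
    // The cached entry: None means "no deserialized value published yet".
    let entry: Arc<RwLock<Option<Arc<String>>>> = Arc::new(RwLock::new(None));
    let bytes = Arc::new(b"payload".to_vec());

    let handles: Vec<_> = (0..8)
        .map(|_| {
            let entry = Arc::clone(&entry);
            let bytes = Arc::clone(&bytes);
            thread::spawn(move || {
                // Each thread checks under a read lock...
                if entry.read().unwrap().is_none() {
                    // ...then deserializes outside any exclusive section,
                    // so several threads may all pay the cost (and each
                    // trigger a reinsertion) before one of them "wins".
                    let value = Arc::new(deserialize(&bytes));
                    *entry.write().unwrap() = Some(value);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }

    // At least one, and possibly up to 8, deserializations ran.
    println!("deserializations: {}", DESERIALIZATIONS.load(Ordering::SeqCst));
}
```

With foyer, the reinsertion also triggers a disk-cache write per racing caller, which is where the extra memory pressure described above comes from.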
BTW, have you set up the admission picker for the disk cache? It would be helpful if you could provide the foyer configuration. 🙏
Apologies for the delay.
All of our configuration happens in the same file. We do the work here. The defaults are set here. Let me know if this gives you any insight.
To be clearer, the memory seems to grow continuously; it's not that we suddenly burst into an OOM situation. Here's an example screenshot showing growth over a few days.
I suppose we could introduce locking around the get calls, so that when we get back a serialized value only one caller does the deserialization work.
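A minimal sketch of that idea, using a plain Mutex-protected map rather than foyer's API (all names here are illustrative): holding the lock across the check-deserialize-reinsert sequence makes it atomic, so the expensive work runs at most once per key:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// A cached value is either still-serialized bytes or an already
// deserialized value behind an Arc. Not foyer's API; illustration only.
enum Cached {
    Serialized(Vec<u8>),
    Deserialized(Arc<String>),
}

struct SingleFlightCache {
    inner: Mutex<HashMap<String, Cached>>,
}

impl SingleFlightCache {
    fn new() -> Self {
        Self { inner: Mutex::new(HashMap::new()) }
    }

    fn insert_serialized(&self, key: &str, bytes: Vec<u8>) {
        self.inner
            .lock()
            .unwrap()
            .insert(key.to_string(), Cached::Serialized(bytes));
    }

    // Holding the lock across check + deserialize + reinsert means the
    // deserialization runs at most once per key; concurrent callers block
    // briefly and then see the already-deserialized Arc.
    fn obtain(&self, key: &str) -> Option<Arc<String>> {
        let mut map = self.inner.lock().unwrap();
        let bytes = match map.get(key)? {
            Cached::Deserialized(v) => return Some(Arc::clone(v)),
            Cached::Serialized(b) => b.clone(),
        };
        // Stand-in for the real (expensive) deserialization.
        let value = Arc::new(String::from_utf8(bytes).ok()?);
        map.insert(key.to_string(), Cached::Deserialized(Arc::clone(&value)));
        Some(value)
    }
}

fn main() {
    let cache = SingleFlightCache::new();
    cache.insert_serialized("greeting", b"hello".to_vec());
    let a = cache.obtain("greeting").unwrap();
    let b = cache.obtain("greeting").unwrap();
    // Both callers end up sharing the same allocation.
    assert!(Arc::ptr_eq(&a, &b));
    println!("{}", a);
}
```

One trade-off to note: a single global Mutex serializes all lookups while any caller deserializes; per-key locking (or a sharded map) would scale better, but the shape of the fix is the same.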
Hi @sprutton1. I found the admission rate limit is set to 1 GiB/s here:
May I ask if that value is intentional? It is a little large for disks without PCIe 4.0 or NVMe support.
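Conceptually, an admission rate limit like this behaves as a token bucket over bytes written per second: inserts that exceed the disk's budget are simply not admitted to the disk cache, instead of piling up in memory faster than the disk can drain them. Below is a minimal sketch of the concept only; it is not foyer's implementation, and the numbers are made up:

```rust
use std::time::Instant;

// A minimal token-bucket sketch of a bytes-per-second admission limit.
struct RateLimiter {
    bytes_per_sec: f64,
    tokens: f64,
    last: Instant,
}

impl RateLimiter {
    fn new(bytes_per_sec: f64) -> Self {
        // Start with a full one-second budget.
        Self {
            bytes_per_sec,
            tokens: bytes_per_sec,
            last: Instant::now(),
        }
    }

    // Admit a write only if enough budget remains. Rejecting a disk-cache
    // insertion is safe: it only means a possible cache miss later.
    fn admit(&mut self, bytes: f64) -> bool {
        let now = Instant::now();
        // Refill tokens for the elapsed time, capped at one second's worth.
        self.tokens = (self.tokens + self.last.elapsed().as_secs_f64() * self.bytes_per_sec)
            .min(self.bytes_per_sec);
        self.last = now;
        if self.tokens >= bytes {
            self.tokens -= bytes;
            true
        } else {
            false
        }
    }
}

fn main() {
    // With a 1 MiB/s budget, a 1 MiB burst is admitted, but an immediate
    // follow-up write of 512 KiB is rejected until the bucket refills.
    let mut limiter = RateLimiter::new(1024.0 * 1024.0);
    assert!(limiter.admit(1024.0 * 1024.0));
    assert!(!limiter.admit(512.0 * 1024.0));
    println!("ok");
}
```

The point of setting the budget near the disk's real sequential write throughput is that anything above it just queues in memory, which looks exactly like unbounded memory growth.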
For debugging, if you are using jemalloc in your project, you can use jeprof to generate a heap flamegraph. Related issues and PRs: #747 #748 (Not much information with the links, sorry about that)
And, is there any way to reproduce it locally? I can help debug. 🙌
One more thing, would you like to integrate the foyer metrics in your env? That would help debug.
We're attempting to use this project as a replacement for our homegrown memory/disk cache built around Moka and Cacache. We're seeing an issue with memory growing without bound over time, eventually leading to the service going OOM. We've added measures to ensure we always leave a percentage of the host OS memory reserved. As far as I can tell, Foyer always reports that the memory used by the cache is within its limits.
Our current suspicion is around how we are using the obtain method here. Heaptrack implies that there is a memory leak in this function call. We have complicated types that we cache, which get serialized and gossiped around across services. To avoid repeated deserialization costs, when something retrieved from the cache is still serialized, we deserialize it and insert the new value behind the same key before returning. It should be noted that the deserialized value will always be wrapped in an Arc.

So, my question is: is it redundant to cache Arcs in Foyer, or do you think relying on the pointers you already manage is sufficient?

CC @fnichol