dgilland / cacheout

A caching library for Python
https://cacheout.readthedocs.io
MIT License

Using cacheout in the context of multiprocessing #18

Closed: sroui closed this issue 3 years ago

sroui commented 3 years ago

Hi, I have used cacheout in my app, and it works like a charm. Later, however, we moved from threads to processes. The problem is that cacheout is thread-safe but not process-safe, so I want to share a single CacheManager instance between all processes.

Is it a good idea?

dgilland commented 3 years ago

If you want to share cached data between multiple processes with cacheout, some options would be:

  1. Create a single cacheout instance in the main process and use inter-process communication to send cache operation requests from the child processes to the main process, and cache results from the main process back to the child processes.
  2. Create a shared data structure that can be used as the backend for multiple cacheout instances. That is, create the shared data structure in the main process, pass it into the child processes, and have each child process use it as the backend for its own cacheout instance.

Option 1 doesn't have to worry about any of the internals of cacheout, and you'd only be dealing with one instance of it, but you would be creating your own sort of cache "server", with each child process treated as a cache client. You could potentially use multiprocessing.Pipe, one per child process: the main process would loop over the parent end of each pipe and check whether there's a cache operation request. The cache "server" in the main process would then work like RPC: the child sends a message that corresponds to a method call on the cacheout instance in the main process, and the main process sends the result back.
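A minimal sketch of the Option 1 loop, using only the standard library. The names `DictCache`, `serve`, `worker`, and `run_demo` are hypothetical, and `DictCache` is just a stand-in with `get`/`set` methods so the sketch runs without third-party packages; the assumption is that a `cacheout.Cache()` instance could be passed to `serve()` in its place, since requests are dispatched by method name.

```python
import multiprocessing
from multiprocessing.connection import wait


class DictCache:
    """Hypothetical stand-in exposing get/set like a cache object.

    Assumption: with cacheout installed, a cacheout.Cache() could be
    handed to serve() instead of this class.
    """

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value


def serve(cache, conns):
    """Main-process loop: answer (method, args) requests until every pipe closes."""
    open_conns = list(conns)
    while open_conns:
        # wait() blocks until at least one connection has data or hit EOF.
        for conn in wait(open_conns):
            try:
                method, args = conn.recv()
            except EOFError:
                # The child closed its end: stop watching this pipe.
                open_conns.remove(conn)
                conn.close()
                continue
            # RPC-style dispatch: the message names a method on the cache.
            conn.send(getattr(cache, method)(*args))


def worker(conn, worker_id):
    """Child process: a cache client talking over its end of the pipe."""
    conn.send(("get", ("greeting",)))
    greeting = conn.recv()
    conn.send(("set", (f"worker-{worker_id}", f"{greeting} from {worker_id}")))
    conn.recv()  # reply to set() is None; receiving it confirms the write
    conn.close()


def run_demo(num_workers=3):
    cache = DictCache()
    cache.set("greeting", "hello")
    parent_conns, procs = [], []
    for worker_id in range(num_workers):
        parent_conn, child_conn = multiprocessing.Pipe()
        proc = multiprocessing.Process(target=worker, args=(child_conn, worker_id))
        proc.start()
        # The parent drops its copy of the child's end so that EOF is
        # seen on parent_conn once the child closes its side.
        child_conn.close()
        parent_conns.append(parent_conn)
        procs.append(proc)
    serve(cache, parent_conns)
    for proc in procs:
        proc.join()
    return [cache.get(f"worker-{i}") for i in range(num_workers)]
```

Calling `run_demo()` returns the values the workers stored through the main process. On platforms that spawn rather than fork child processes, the call belongs under an `if __name__ == "__main__":` guard. The design choice here is that only the main process ever touches the cache, so thread-level locking inside the cache object is sufficient.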

Option 2 would require creating a multiprocess version of OrderedDict so that it could be assigned to the _cache attribute of a cacheout.Cache instance. Then you'd just need to create the data structure instance in the main process and share it with each child process so that each one could set it on its own cacheout instance.
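The sharing step of Option 2 can be illustrated on its own, without cacheout, with a `multiprocessing.Manager` dict that is created in the main process and passed to the children. This is only a sketch of the "create in the parent, pass to the children" pattern; the function names are hypothetical. Whether such a proxy is sufficient as a cacheout backend is an open assumption: it does not provide OrderedDict-only methods such as `move_to_end`, so a full solution would likely need a richer structure.

```python
import multiprocessing


def fill(shared, worker_id):
    """Child process: writes into the structure created by the parent."""
    shared[f"worker-{worker_id}"] = worker_id * worker_id


def run_shared_dict_demo(num_workers=3):
    # Manager() starts a helper process that owns the real dict; each
    # process reaches it through a picklable proxy object.
    with multiprocessing.Manager() as manager:
        shared = manager.dict()
        procs = [
            multiprocessing.Process(target=fill, args=(shared, worker_id))
            for worker_id in range(num_workers)
        ]
        for proc in procs:
            proc.start()
        for proc in procs:
            proc.join()
        # Copy into a plain dict before the manager shuts down.
        return dict(shared)
```

After `run_shared_dict_demo()` returns, the resulting dict holds one entry per worker, showing that writes made in the child processes are visible to the parent.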

mvanderlee commented 1 month ago

I solved this using UltraDict:

from cacheout import Cache
from UltraDict import UltraDict

class CachableUltraDict(UltraDict):
    '''Allows us to use an UltraDict for cacheout.

    This allows us to have a shared cross-worker gunicorn cache!
    '''

    def copy(self):
        # Build a new instance from this one's contents and settings.
        return self.__class__(
            self,
            name=self.name,
            buffer_size=self.buffer_size,
            serializer=self.serializer,
            shared_lock=self.shared_lock,
            full_dump_size=self.full_dump_size,
            auto_unlink=self.auto_unlink,
            recurse=self.recurse,
            recurse_register=self.recurse_register,
        )

cache = Cache()
# Swap cacheout's default in-process store for the shared dict.
cache._cache = CachableUltraDict()