Open bcmills opened 1 month ago
Note that even if the individual dict operations were protected by the GIL, there are read-modify-write sequences in the code that IIUC are not: https://github.com/aio-libs/async-lru/blob/dc601e2b288336f4511e3ec1afad22cf7b0ab04e/async_lru/__init__.py#L199-L214
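For illustration, here is a simplified sketch (assuming a bare OrderedDict shared across loops; this is not async-lru's actual code) of the kind of check-then-act sequence that two event loops on different threads can interleave:

```python
import asyncio
from collections import OrderedDict

# Simplified sketch of a racy check-then-act sequence; not async-lru's
# actual code. Two threads, each running its own event loop, can both
# observe a miss before either writes, so the read-modify-write below
# is unsafe even though each individual dict operation is atomic under
# the GIL.
cache: OrderedDict = OrderedDict()

async def cached(key, make_value):
    fut = cache.get(key)                  # read
    if fut is None:                       # check: both loops may see a miss
        fut = asyncio.get_running_loop().create_future()
        cache[key] = fut                  # write: last writer wins
        fut.set_result(await make_value(key))
    return await fut  # a hit may hand back a future bound to another loop
```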
Not sure if multiple loops are something we really want to care about; it seems like a bad idea overall. But we could maybe prefix the cache per loop or something? `{loop_id: {...}}`
Having multiple loops seems more-or-less inevitable if someone is trying to incrementally adopt async in a large existing codebase.
I would be reluctant to use a per-loop cache: it would depress the hit rate, and I'm not sure how you'd safely evict a loop from the cache to prevent a memory leak if it terminates.
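For what it's worth, a hypothetical sketch of that per-loop layout (illustrative names, not proposed async-lru code), keyed on the running loop through weak references so a terminated loop's cache can at least be garbage-collected:

```python
import asyncio
import weakref
from collections import OrderedDict

# Hypothetical sketch of the {loop: {...}} idea. A WeakKeyDictionary
# keyed on the event loop lets a per-loop cache be collected once its
# loop is garbage-collected, but only if nothing else holds a strong
# reference to the dead loop, and access to _caches itself would still
# need synchronization across threads.
_caches: weakref.WeakKeyDictionary = weakref.WeakKeyDictionary()

def _cache_for_current_loop() -> OrderedDict:
    loop = asyncio.get_running_loop()
    try:
        return _caches[loop]
    except KeyError:
        cache = _caches[loop] = OrderedDict()
        return cache
```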
I suppose you could use a `threading.Lock` instead? If there's only a single event loop a non-blocking `acquire` will always succeed, and if you do end up needing to block I think you could run the `acquire` in a separate thread (using `loop.run_in_executor` on the current loop).
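A sketch of that suggestion (illustrative names, not async-lru code): a single `threading.Lock` guards the shared cache, the fast path is a non-blocking acquire, and only under contention from another loop's thread does the blocking acquire get pushed to a worker thread so the current event loop is never stalled.

```python
import asyncio
import threading

# One process-wide lock guarding the shared cache state.
_cache_lock = threading.Lock()

async def _acquire_cache_lock() -> None:
    if _cache_lock.acquire(blocking=False):
        return  # fast path: uncontended, the common single-loop case
    loop = asyncio.get_running_loop()
    # Block in the default executor instead of blocking the event loop.
    await loop.run_in_executor(None, _cache_lock.acquire)
```

The read-modify-write sections would then run between `await _acquire_cache_lock()` and `_cache_lock.release()`; a plain `threading.Lock` has no owning thread, so the release can happen directly.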
I believe we're running into this problem or something related after having recently adopted `async-lru`.

We use a `ProcessPoolExecutor` to spin up processes which each have their own event loop. The event loop gets created and destroyed multiple times throughout the lifecycle of the process (a sketch of this pattern follows the traceback below). We're seeing an issue where the cached result is sometimes empty for existing processes:
File "/home/[redacted]/.local/lib/python3.12/site-packages/async_lru/__init__.py", line 208, in __call__
return cache_item.fut.result()
^^^^^^^^^^^^^^^^^^^^^^^
asyncio.exceptions.CancelledError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/home/[redacted]/main.py", line 614, in <module>
success, failed = future.result()
^^^^^^^^^^^^^^^
File "/opt/python3.12/lib/python3.12/concurrent/futures/_base.py", line 449, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "/opt/python3.12/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
raise self._exception
asyncio.exceptions.CancelledError
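For concreteness, a minimal hypothetical sketch of the setup described above (illustrative names, not the reporter's actual code). The cache state lives at module level in each worker process and outlives every loop that `asyncio.run` creates, so a future cached under a dead loop can be handed back on a later call, which is one way the CancelledError above could arise:

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

from async_lru import alru_cache

@alru_cache(maxsize=128)
async def fetch(key: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for real async work
    return key.upper()

def worker(key: str) -> str:
    # asyncio.run creates and destroys a fresh event loop on every
    # call, but the alru_cache state (including any cached futures)
    # persists in this worker process across those loops.
    return asyncio.run(fetch(key))

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(worker, "spam") for _ in range(8)]
        for fut in futures:
            print(fut.result())  # can surface CancelledError as reported
```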
Well, feel free to try the solution I mentioned and open a PR.
It appears that the `alru_cache` decorator is using an unsynchronized `OrderedDict` instance for its cache: https://github.com/aio-libs/async-lru/blob/dc601e2b288336f4511e3ec1afad22cf7b0ab04e/async_lru/__init__.py#L103

If I am not mistaken, that will lead to a data race if the decorated function is called simultaneously from different `asyncio` event loops running on different threads, which seems to be a realistic (if rare) situation (compare https://github.com/python/cpython/issues/93462#issuecomment-1189390583).

If I understand correctly, other `asyncio` wrappers generally bind the event loop during initialization and fail explicitly if called from a different event loop. Unfortunately, since the `alru_cache` decorator generally runs at module init time, there is no such loop on the thread.

I don't have a clear idea of how this problem could be fixed; I just wanted to point it out as a (significant and subtle) risk for users of this module.
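For reference, that binding pattern looks roughly like the following sketch (illustrative class name; CPython's own asyncio synchronization primitives do this lazily via `asyncio.mixins._LoopBoundMixin`, precisely because no loop may exist at construction time):

```python
import asyncio

class _LoopBound:
    """Bind to the running event loop on first use and fail explicitly
    if used from any other loop afterwards (illustrative sketch)."""

    _loop: "asyncio.AbstractEventLoop | None" = None

    def _check_loop(self) -> "asyncio.AbstractEventLoop":
        loop = asyncio.get_running_loop()  # raises if no loop is running
        if self._loop is None:
            self._loop = loop              # lazy binding: first use wins
        elif loop is not self._loop:
            raise RuntimeError(f"{self!r} is bound to a different event loop")
        return loop
```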