Open ethagnawl opened 1 week ago
Hi, thanks for the issue
The likely cause of this is that the same `unittest.mock._patch` object (`_HACK_CONTEXT`) is being reused as a context manager on top of itself. This leads to a race condition where `is_local` has been deleted from the patch object by the inner patch (see `patch.__exit__`) while the outer patch still tries to access it (also in `patch.__exit__`).
From `unittest.mock._patch` (parts omitted for brevity):

```python
def __exit__(self, *exc_info):
    if self.is_local and self.temp_original is not DEFAULT:  # accessed here
        ...
    del self.is_local  # deleted from the object here
    ...
```
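The deletion above means a second `__exit__` on the same patch object fails. A minimal standalone sketch of the mechanism, without the library involved (`os.getcwd` is just an arbitrary patch target; the double exit simulates two overlapping contexts sharing one patcher):

```python
from unittest import mock

# Exit the same patcher twice -- as can happen when two overlapping
# contexts share a single unittest.mock._patch object.
patcher = mock.patch("os.getcwd", return_value="/tmp")  # arbitrary target
patcher.__enter__()
patcher.__exit__(None, None, None)  # restores the target and dels is_local

err = None
try:
    patcher.__exit__(None, None, None)  # second exit, like the racing outer context
except AttributeError as exc:
    err = exc  # the deleted 'is_local' attribute is accessed again

print(err)
```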
I think this can be fixed simply by creating a new patch object for each context. I released this fix in 0.10.4.
However, I don't know if this fully fixes the issue due to how `query_optimizer.prefetch_hack.fetch_context` is currently implemented. It clears `_PREFETCH_HACK_CACHE` when exiting the context, meaning another context that still requires it might not be optimized correctly. The prefetch hack is only required when using nested connection fields for many-to-many related fields, and this issue would only happen if multiple of those are queried in the same query.
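A hypothetical sketch of that overlap (the real `fetch_context` has a different signature; the names here just mirror the idea of a shared module-level cache cleared on exit):

```python
from contextlib import contextmanager

_PREFETCH_HACK_CACHE = {}  # stand-in for the shared module-level cache

@contextmanager
def fetch_context(key, value):
    _PREFETCH_HACK_CACHE[key] = value
    try:
        yield
    finally:
        _PREFETCH_HACK_CACHE.clear()  # wipes every context's entries

outer = fetch_context("outer_relations", 1)
inner = fetch_context("inner_relations", 2)
outer.__enter__()
inner.__enter__()
inner.__exit__(None, None, None)     # inner context exits first...
remaining = dict(_PREFETCH_HACK_CACHE)  # ...and the outer entry is gone too
outer.__exit__(None, None, None)
print(remaining)  # {}
```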
Can you verify if this issue still applies in your context after the fix in 0.10.4? I'll investigate the latter issue and post an update later.
I wasn't able to replicate this in my testing. I created the scenario I described where we have a nested many-to-many connection field within another nested many-to-many connection field, but this didn't lead to any errors with the shared patch object. Added the tests to the repo still, maybe they'll catch a bug in the future.
Thanks for the prompt response and release, @MrThearMan!
I'm in the process of testing this now. Interestingly, I no longer see that exact error message but I do see another and I think it's (inscrutably) referencing the same line number as before. That may be a coincidence or it may be how the graphql library is constructing the errors included in the response.
```json
{"errors":[{"message":"An unknown error occurred.","locations":[{"line":101,"column":3}],"path":["some_obj"]}],"data":{"some_obj":null}}
```
I'm going to make the same change I made earlier to the graphql library and see if it reveals anything new or different. I will follow up shortly.
After some more digging into graphql-core, I've determined that the following `StopIteration` exception is being hit here, and the exception contains an empty message, which explains the inscrutable (empty) error referenced above.
```
api_1 | File "/home/appuser/.local/lib/python3.12/site-packages/graphene_django/debug/middleware.py", line 65, in resolve
api_1 | result = next(root, info, **args)
api_1 | ^^^^^^^^^^^^^^^^^^^^^^^^
api_1 | File "/home/appuser/.local/lib/python3.12/site-packages/graphene/relay/node.py", line 106, in node_resolver
api_1 | return cls.get_node_from_global_id(info, id, only_type=only_type)
api_1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
api_1 | File "/home/appuser/.local/lib/python3.12/site-packages/graphene/relay/node.py", line 131, in get_node_from_global_id
api_1 | return get_node(info, _id)
api_1 | ^^^^^^^^^^^^^^^^^^^
api_1 | File "/app/my_app/schemas/foos.py", line 112, in get_node
api_1 | return get_authorized_node(cls, info, id)
api_1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
api_1 | File "/app/my_app/schemas/__init__.py", line 809, in get_authorized_node
api_1 | maybe_instance = optimize_single(
api_1 | ^^^^^^^^^^^^^^^^
api_1 | File "/home/appuser/.local/lib/python3.12/site-packages/query_optimizer/compiler.py", line 62, in optimize_single
api_1 | fetch_in_context(queryset)
api_1 | File "/home/appuser/.local/lib/python3.12/site-packages/query_optimizer/prefetch_hack.py", line 96, in fetch_in_context
api_1 | return list(queryset) # the database query is executed here
api_1 | ^^^^^^^^^^^^^^
api_1 | File "/home/appuser/.local/lib/python3.12/site-packages/django/db/models/query.py", line 398, in __iter__
api_1 | self._fetch_all()
api_1 | File "/home/appuser/.local/lib/python3.12/site-packages/django/db/models/query.py", line 1883, in _fetch_all
api_1 | self._prefetch_related_objects()
api_1 | File "/home/appuser/.local/lib/python3.12/site-packages/django/db/models/query.py", line 1273, in _prefetch_related_objects
api_1 | prefetch_related_objects(self._result_cache, *self._prefetch_related_lookups)
api_1 | File "/home/appuser/.local/lib/python3.12/site-packages/django/db/models/query.py", line 2321, in prefetch_related_objects
api_1 | obj_list, additional_lookups = prefetch_one_level(
api_1 | ^^^^^^^^^^^^^^^^^^^
api_1 | File "/home/appuser/.local/lib/python3.12/site-packages/django/db/models/query.py", line 2463, in prefetch_one_level
api_1 | ) = prefetcher.get_prefetch_queryset(instances, lookup.get_current_queryset(level))
api_1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
api_1 | File "/home/appuser/.local/lib/python3.12/site-packages/django/db/models/fields/related_descriptors.py", line 740, in get_prefetch_queryset
api_1 | queryset = _filter_prefetch_queryset(queryset, self.field.name, instances)
api_1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
api_1 | File "/usr/local/lib/python3.12/unittest/mock.py", line 1139, in __call__
api_1 | return self._mock_call(*args, **kwargs)
api_1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
api_1 | File "/usr/local/lib/python3.12/unittest/mock.py", line 1143, in _mock_call
api_1 | return self._execute_mock_call(*args, **kwargs)
api_1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
api_1 | File "/usr/local/lib/python3.12/unittest/mock.py", line 1204, in _execute_mock_call
api_1 | result = effect(*args, **kwargs)
api_1 | ^^^^^^^^^^^^^^^^^^^^^^^
api_1 | File "/home/appuser/.local/lib/python3.12/site-packages/query_optimizer/prefetch_hack.py", line 65, in _prefetch_hack
api_1 | cache: _PrefetchCacheType = next(iter(_PREFETCH_HACK_CACHE.values()))
api_1 | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
api_1 | StopIteration
```
I don't have any thoughts as to the why of this just yet ...
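For reference, the failing line in the last frame reduces to a `next()` call over an empty dict's values, and a bare `StopIteration` carries an empty message, which would explain why nothing useful reaches the client. A minimal sketch (the cache name is copied from the traceback; the assumption that a racing context cleared it is mine):

```python
_PREFETCH_HACK_CACHE = {}  # assume a racing context has just cleared it

try:
    # the line from the traceback: take the first cached entry
    cache = next(iter(_PREFETCH_HACK_CACHE.values()))
except StopIteration as exc:
    message = str(exc)  # '' -- nothing for the graphql layer to surface

print(repr(message))  # ''
```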
I made another attempt at fixing this in 0.10.5. This fix binds the required table relations for the prefetch hack in the context itself, so it shouldn't run into the same race condition where `_PREFETCH_HACK_CACHE` has been cleared by one context while another tries to access it.
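One way to sketch the idea of binding state to the context itself rather than a shared module global is `contextvars` (hypothetical names; the actual 0.10.5 implementation may differ):

```python
import contextvars
from contextlib import contextmanager

# Each context gets its own binding; exiting one context restores the
# previous value instead of clearing a cache shared by everyone.
_hack_relations = contextvars.ContextVar("hack_relations", default=None)

@contextmanager
def fetch_context(relations):
    token = _hack_relations.set(dict(relations))  # bound to this context only
    try:
        yield
    finally:
        _hack_relations.reset(token)  # never touches other contexts' bindings

with fetch_context({"outer": 1}):
    with fetch_context({"inner": 2}):
        inner_seen = _hack_relations.get()
    outer_seen = _hack_relations.get()  # still intact after the inner exit
```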
Thanks so much, @MrThearMan. At first blush, 0.10.5 does seem to solve the problem. I was just able to verify that a script spawning 24 processes which spam the server (Debian/Docker) does not result in the same behavior that I was seeing consistently prior to the latest release.
I will ask my colleagues who were seeing this issue on macOS to verify today.
I have read the docs thoroughly before making this bug report.
I have read through other open issues, and my issue is not a duplicate.
What version of the library are you using?
0.10.3
Which python version are you using?
3.12
What operating system are you on?
Ubuntu
Description
I have run into an edge case which may result from this project's use of `patch` in a way that's not threadsafe. It's also entirely possible that this error is masking, or is the result of, an underlying application error.

This occurs with some predictability when using the Django development web server and spamming the server with many requests in parallel. The number of parallel requests needed to trigger it varies by environment: in Docker on macOS it's ~3, and in Docker on Debian/Ubuntu it's ~9. I realize this will vary widely depending on the project, its resource usage, etc., but I share these details in order to provide as much useful context as possible.
I am not able to reproduce this error when using the Django development web server with the `--nothreading` flag, or when using one or more Gunicorn workers running synchronously.

Here's the unhandled exception:
That error was surfaced in detail by removing the error massaging from here and just re-raising the exception. If that error is "handled" by the graphql library, it eventually bubbles up to the client with an inscrutable error which looks like:
I will attempt to reproduce this using the example project in this repo later today and follow up.
Other potentially relevant environment details:
Python dependencies
Please let me know if there's any additional information I can provide.