encode / django-rest-framework

Web APIs for Django. 🎸
https://www.django-rest-framework.org
Other
28.21k stars 6.81k forks source link

Attribute Errors raised in Parser Silently Ignored #9433

Closed james-mchugh closed 2 months ago

james-mchugh commented 3 months ago

Discussed in https://github.com/encode/django-rest-framework/discussions/9426

Originally posted by **james-mchugh** June 3, 2024 Hello. First, thank you for the hard work on DRF. It has been great workin with it. There seems to be an issue where an `AttributeError` raised in a parser is being swallowed by the `Request` class's `__getattr__` method. I've noticed a similar issue reported previously (#4896), but it was closed as the reporter associated with an error from a separate package. I encountered this because I wrote the server and tested using Python 3.11, and another developer attempted to test using 3.10. My custom parser used a function that was not available in Python 3.10 (hashlib.file_digest), which caused an `AttributeError` when hitting certain endpoints. Instead of seeing a 500 response and a traceback in the server logs, we were getting a 200 response with an empty dictionary in the response body. We were scratching our heads for a while, as there was no indication anything was going wrong. ### Environment OS: Darwin 23.4.0 Darwin Kernel Version 23.4.0 x86_64 Python: 3.10 + 3.11 DRF: 3.15.1 Django: 5.0.6 ### Expected Behavior 500 Internal server error response and traceback in server logs. ### Actual Behavior 200 response and no errors in server logs. ```shell $ curl --json '{"foo": "bar"}' http://localhost:8000/foos/ {} ``` ```text [04/Jun/2024 02:59:15] "POST /foos/ HTTP/1.1" 200 2 ``` ### Code to Reproduce ```python from rest_framework import viewsets, response, parsers, routers class BrokenParser(parsers.JSONParser): def parse(self, stream, media_type=None, parser_context=None): raise AttributeError class TestViewSet(viewsets.ViewSet): parser_classes = (BrokenParser,) def create(self, request, **kwargs): return response.Response(request.data) router = routers.DefaultRouter() router.register("foos", TestViewSet, "foo") urlpatterns = router.urls ``` ### Investigation This appears to be happening because accessing the `data` property lazily parses data. If the parsing raises an `AttributeError`, this is raised up to https://github.com/encode/django-rest-framework/blob/master/rest_framework/request.py#L359 where `self._full_data` is set to an empty dictionary before the error is re-raised. This error then raises up and causes the attribute access to fallback to the `Request.__getattr__` method via Python's [data model](https://docs.python.org/3/reference/datamodel.html#object.__getattribute__). Here, the `data` attribute is attempted to be retrieved from the `WSGIRequest` stored in `Request._request`, and once again, this raises an `AttributeError` which is caught by https://github.com/encode/django-rest-framework/blob/master/rest_framework/request.py#L423. From here, the original `Request.__getattribute__` handling runs again. Now when the `data` getter runs, `self._full_data` is already set to an empty dictionary which is then returned. In the end, an empty response is returned and the error is silently ignored. Noob question, but why does `Request.__getattr__` call `self.__getattribute__` when an `AttributeError` occurs? Based on the Python Data model linked above, `__getattr__` should only run if `__getattribute__` fails in the first place, so by the time `__getattr__` runs, `__getattribute__` already tried and failed.
sevdog commented 3 months ago

Noob question, but why does Request.__getattr__ call self.__getattribute__ when an AttributeError occurs? Based on the Python Data model linked above, __getattr__ should only run if __getattribute__ fails in the first place, so by the time __getattr__ runs, __getattribute__ already tried and failed

This is the behaviour of __getattr__.

The issue is with this method overridden it intercepts and hides every AttributeError which may be raised in any other method/attribute.

That wrapper was previously implemented using __getattribute__ in d13c807616030b285589cec2fddf4e34a8e22b4a and then re-ported to __getattr__ in #5617 (see also #5576).

It seems that also this implementation has its own problems since it is going to mask every AttributeError even if not raised by the Request instance.

sevdog commented 3 months ago

either getattribute() raises an AttributeError because name is not an instance attribute or an attribute in the class tree for self; or get() of a name property raises AttributeError

Maybe it is also a problem with python itself, because it is invoking this method when the AttributeError.obj is not the Request object. 🤔

james-mchugh commented 3 months ago

Noob question, but why does Request.__getattr__ call self.__getattribute__ when an AttributeError occurs? Based on the Python Data model linked above, __getattr__ should only run if __getattribute__ fails in the first place, so by the time __getattr__ runs, __getattribute__ already tried and failed

This is the behaviour of __getattr__.

The issue is with this method overridden it intercepts and hides every AttributeError which may be raised in any other method/attribute.

That wrapper was previously implemented using __getattribute__ in d13c807 and then re-ported to __getattr__ in #5617 (see also #5576).

It seems that also this implementation has its own problems since it is going to mask every AttributeError even if not raised by the Request instance.

I'm still not sure that this explains why __getattr__ calls __getattribute__ though. That makes the attribute access flow as follows:

  1. Access attribute (drf.Request().foo)
  2. drf.Request.__getattribute__ runs to retrieve value for attribute.
  3. If it fails with an AttributeError, drf.Request.__getattr__ runs to get the attribute from the underlying django.Request object (per Python's attribute access handling).
  4. If this fails with an AttributeError, drf.Request.__getattribute__ is called again, even though it already failed when first trying to access the attribute.

(edited to clarify since there are technically two Request classes in play)

sevdog commented 3 months ago

Found the bug https://bugs.python.org/issue45985

The reason is that .data is a property, thus as of now it is catching the AttributeError and thus calling __getattr__.

james-mchugh commented 3 months ago

Found the bug https://bugs.python.org/issue45985

The reason is that .data is a property, thus as of now it is catching the AttributeError and thus calling __getattr__.

I do not believe that bug is related. The issue is that a Python property is being used to lazily parse the data, but that parsing can be pretty advanced and AttributeErrors raised from the parsing are not properly handled.

sevdog commented 3 months ago

Just try it!

class TestBrokenPerser(SimpleTestCase):
    def setUp(self):
        self.factory = APIRequestFactory()

    def test_post_accessed_in_post_method(self):
        django_request = self.factory.post('/', {'foo': 'bar'}, content_type='application/json')
        request = Request(django_request, parsers=[BrokenParser()])
        with self.assertRaises(AttributeError, msg='no in parse'):
            request._parse()

        with self.assertRaises(AttributeError, msg='no in data'):
            request.data

FAILED tests/test_parsers.py::TestBrokenPerser::test_post_accessed_in_post_method - AssertionError: AttributeError not raised : no in data

If you follow the call-stack:

The AttributeError is raised and reaches a property and then __getattr__ is called, just as explained in that bug.

james-mchugh commented 3 months ago

To be fair, it is somewhat related, but it's not clear to me if the Python team even sees that as a bug (opened over 2 years ago with no feedback). It seems to me to be more of just how Python handles attribute accesses. Properties are attributes too, so it makes sense that the __getattr__ will be called the same way other attributes are. I would expect that a decision to change this isn't simply a bug, but a design decision that would have to be introduced through a PEP.

The core issue here also isn't a confusing error message, it's that DRF's handling of the error completely suppresses the errors (no error messages at all). I would gladly accept a confusing error message due to the weird interplay between propertys and __getattr__ over that.

sevdog commented 3 months ago

The bug with the parser may be resolved just by wrapping any exception raised by parser in a ParseError:

--- a/rest_framework/request.py
+++ b/rest_framework/request.py
@@ -356,7 +356,7 @@ class Request:

         try:
             parsed = parser.parse(stream, media_type, self.parser_context)
-        except Exception:
+        except Exception as exc:
             # If we get an exception during parsing, fill in empty data and
             # re-raise.  Ensures we don't simply repeat the error when
             # attempting to render the browsable renderer response, or when
@@ -364,7 +364,9 @@ class Request:
             self._data = QueryDict('', encoding=self._request._encoding)
             self._files = MultiValueDict()
             self._full_data = self._data
-            raise
+            if not isinstance(exc, ParseError):
+                raise ParseError(str(exc))
+            raise exc

         # Parser classes may return the raw data, or a
         # DataAndFiles object.  Unpack the result as required.

This will not address the bug of Python itself (I also put here the reference to the github issue https://github.com/python/cpython/issues/90143 for better linking).

The problem may also be fixed by using __getattribute__ but that implementation was abandoned in #5617.

james-mchugh commented 3 months ago

The bug with the parser may be resolved just by wrapping any exception raised by parser in a ParseError:

I mostly agree here. I think the bug can be fixed easily by adding error handling to the property in the data property to re-raise AttributeError as another type of error (the issue you link suggests RuntimeErrors).

@property
def data(self):
        if not _hasattr(self, '_full_data'):
            try:
                self._load_data_and_files()
            except AttributeError as exc:
                raise RuntimeError(exc) from exc
        return self._full_data

It could be handled in _parse as well.

This will not address the bug of Python itself (I also put here the reference to the github issue https://github.com/python/cpython/issues/90143 for better linking).

I don't necessarily agree that this a bug with Python, as there is no confirmation from the Python team that it is a bug. It hasn't been triaged on for over 2 years. Just because someone opens an issue with a project doesn't mean that there is a confirmed bug.

The problem may also be fixed by using getattribute but that implementation was abandoned in https://github.com/encode/django-rest-framework/pull/5617.

I think you may be misunderstanding me here. I'm not suggesting that the implementation moves back to using __getattribute__ instead of __getattr__. I'm pointing out that there appears to be a bug with how __getattr__ is implemented because it calls __getattribute__ at https://github.com/encode/django-rest-framework/blob/master/rest_framework/request.py#L423. On the surface, this doesn't make sense given how Python's attribute access works, but maybe there is some unclear inner-workings of DRF that requires this type of handling. At best, it's intentional but very unclear why it's happening. At worst, it's a bug introduced by a misunderstanding with how attribute access works.

If __getattr__ did not call __getattribute__, an error would have been raised when the parser failed with an AttributeError, albeit it wouldn't have been immediately clear what the real problem is.

james-mchugh commented 3 months ago

I've been digging into this some more as I have some time over the weekend.

First, I'm convinced that https://github.com/python/cpython/issues/90143 is not a bug, as it is documented right here in the Python Docs that an AttributeError raised in a property should be handled by __getattr__.

Called when the default attribute access fails with an AttributeError (either getattribute() raises an AttributeError because name is not an instance attribute or an attribute in the class tree for self; or get() of a name property raises AttributeError).

Second, I experimented with swapping the re-calling of __getattribute__ at https://github.com/encode/django-rest-framework/blob/master/rest_framework/request.py#L423 out with explicitly re-raising an AttributeError, and that passed all of the unit tests. This would make it so AttributeErrors raised in the Parser are longer suppressed, but the error message would not be very clear.

Looking closer at the code in request.py, I noticed the wrap_attributeerror context manager which is used to provide the exact behavior needed (wrap AttributeErrors raised by properties). It's used in several properties, but not all of them. I added a safe_property decorator that utilizes this context manager and updated all the properties in Request to use the safe_property decorator instead. This should ensure that all properties in the Request class have more informative error handling in the event they raise an AttributeError. I've opened the PR #9455 with these updates.