defnull / multipart

A fast multipart/form-data parser for python
MIT License
126 stars 33 forks source link

Gracefully handle CONTENT_LENGTH == "" #35

Closed agroszer closed 1 month ago

agroszer commented 3 years ago

fixes #34

cjwatson commented 3 years ago

The test failures here will be fixed by #33, I think.

agroszer commented 3 years ago

looks like nginx is adding that, because https://www.python.org/dev/peps/pep-0333/#environ-variables says The contents of any Content-Length fields in the HTTP request. May be empty or absent.

zope.publisher used until

https://github.com/zopefoundation/zope.publisher/commit/382157293c72349f3c8742291f6aceba084bfabd#diff-524caa733a305f3082fecb38afb53289e232cf956d36fafe845832db6ac43b4c

stdlib cgi FieldStorage, which was also ignoring this:

        clen = -1
        if 'content-length' in self.headers:
            try:
                clen = int(self.headers['content-length'])
            except ValueError:
                pass

we just recently switched from python 3.7 to 3.9 and bumped zope.publisher to the latest

also fixing the zope part:

https://github.com/zopefoundation/zope.publisher/pull/64

cjwatson commented 2 years ago

@agroszer Could you rebase your branch on master so that we get passing tests before I merge this? Thanks!

agroszer commented 2 years ago

@cjwatson done

defnull commented 2 years ago

Hm. Not wanting to trigger 500 is understandable, but I struggle to decide which behavior is correct in case of a bad Content-Type value. HTTP spec does not allow it, so a 400 error would be correct. nginx behaves that way, too. The WSGI spec is very unhelpful (as usual) by allowing clearly invalid values for that header. The wording sounds like empty and absent values should be handled equally. So, the patch would be correct by defaulting to -1 as if the header was missing.

Can this be dangerous? The presence of the header indicates that we have a HTTP/1.1 connection that is able to send more than one request over a single connection. So, in theory, a carefully crafted HTTP 1.1 request could be both, one request or multiple requests, depending on the way a bad content-length is interpreted. Not sure which interpretation is best here, but following HTTP spec is probably less risky than following the weaker WSGI spec. A WSGI servers that turns invalid HTTP requests into valid WSGI requests would be incorrect, IMHO.

So, I lean towards throwing MultipartError in strict mode and assuming -1 otherwise. This is what strict-mode is made for, in the end.

defnull commented 1 month ago

I finally had some time to sink into this project again and revisited this issue. I would now say an invalid content-length header is a hard (fatal) error that should always trigger a MultipartError, even in non-strict mode. There is no value in supporting broken HTTP clients or WSGI servers, and guessing can lead to subtle errors and even security issues down the road, so better be strict than sorry.

The next release will raise MultipartError instead of ValueError and also warn a bit more clearly about the possibility that exceptions may happen, even in non-strict mode.