defnull / multipart

A fast multipart/form-data parser for python
https://multipart.readthedocs.io/
MIT License
135 stars 33 forks source link

Everything with content is treated as a file. #18

Closed theelous3 closed 2 years ago

theelous3 commented 5 years ago
from multipart import MultipartParser

m = '--8banana133744910kmmr13a56!102!2405\r\nContent-Disposition: form-data; name="blah";\r\n\r\nwhatever\r\n--8banana133744910kmmr13a56!102!2405--\r\n'

boundary = '8banana133744910kmmr13a56!102!2405'.encode()

parser = MultipartParser(BytesIO(m.encode()), boundary)

parser.parts()[0].file
<_io.BytesIO object at 0xb6d8880c>

There's nothing to indicate that the above is a file. It should be treated as regular form data. If I'm reading correctly, only the presence of the filename param indicates a file.

4.4. Content-Type Header Field for Each Part Each part MAY have an (optional) "Content-Type" header field, which defaults to "text/plain". If the contents of a file are to be sent, the file data SHOULD be labeled with an appropriate media type, if known, or "application/octet-stream".

It's not clear if the reverse of this is the case, where if no filename is given but the content type is app/octet it's inferred that the content is a file. I think it doesn't make sense to presume any and all app/octet is a file, so filename seems the only good indicator.

Side note: the default value for MultipartPart.file is False, but it becomes a BytesIO obj. It should instead be used as a flag and set True, and MultipartPart.value should be the only way to fetch the contents of any and all parts.

theelous3 commented 5 years ago

PR: https://github.com/defnull/multipart/pull/19

theelous3 commented 5 years ago

Actually I think the better solution here is to just ignore all of that entirely and not assign things of any type at all, removing attrs like file, and let the user do whatever they want with the given info.

defnull commented 2 years ago

The MultipartParser API treats all parts as file-like binary data because on protocol level there is no real difference between file uploads or form fields. Both are binary data with headers and both can be larger than what you would want to store in a string. The only real difference is that browsers add a filename field to the Content-Disposition header if the data came from a file.

So, I think this issue is based on a misunderstanding. MultipartPart.file is only false during the header parsing phase, and after that, always has a binary file-like value. It is not a flag indicating whether or not the part was a HTTP form file upload. The application should check for MultipartPart.filename to detect file uploads.

The high-level parse_form_data function does exactly that: It splits all parts between text fields and file uploads based on MultipartPart.filename (and MultipartPart.is_buffered() for text fields that are too large to fit into memory).