Pylons / webob

WSGI request and response objects
https://webob.org/

Support RFC 2231 #165

Open siebz0r opened 10 years ago

siebz0r commented 10 years ago

When uploading files with non-ASCII characters in the filename (using the requests library), the filename is encoded according to RFC 2231. Example:

--shrubbery
Content-Disposition: form-data; name=test; filename*utf8''=a%5C.txt

ni
--shrubbery--

Presuming this is quite a common approach, I think it would be a nice addition to WebOb. Currently, when trying to access the variables from the request, WebOb removes the filename from the body. Example:

import webob
import textwrap

post = textwrap.dedent("""
--spam
Content-Disposition: form-data; name="test"; filename*utf-8''="a%5Cb"

test
--spam--
""")

req = webob.Request.blank(
    '/', POST=post,
    content_type='multipart/form-data; boundary=spam')

original = str(req)
req.POST  # This seems to modify the request.
assert original == str(req)  # Fails: the filename parameter is gone from the body.

The filename being removed is not all that obvious. If unsupported parameters are removed, I think emitting a warning would be nice so developers receive a hint on why this is happening. If it isn't already, it should be documented.

myroslav commented 9 years ago

The parameter syntax in your example is wrong; the correct form is:

filename*=utf-8''a%5Cb
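
For illustration, the extended-value form above can be produced and parsed with Python's standard library alone. This is a minimal sketch and says nothing about what WebOb does; `decode_ext_value` is a hypothetical helper name, while `encode_rfc2231` and `decode_rfc2231` are existing `email.utils` functions:

```python
from email.utils import decode_rfc2231, encode_rfc2231
from urllib.parse import unquote


def decode_ext_value(ext_value):
    """Decode an RFC 2231 extended value such as utf-8''a%5Cb (hypothetical helper)."""
    # decode_rfc2231 only splits on the two apostrophes; it does not percent-decode.
    charset, _language, encoded = decode_rfc2231(ext_value)
    if charset is None:
        # No charset'language' prefix: treat the input as a plain value.
        return ext_value
    return unquote(encoded, encoding=charset or "ascii", errors="strict")


# Build the value that belongs after "filename*=" (note: no surrounding quotes).
ext_value = encode_rfc2231("a\\b", charset="utf-8")
header = 'form-data; name="test"; filename*=' + ext_value
round_tripped = decode_ext_value(ext_value)
```

Note that the asterisk goes directly after the parameter name, before the equals sign, and that the charset and the (here empty) language tag are separated from the percent-encoded value by single apostrophes.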

There are other relevant standards, for instance RFC 6266, the more recent one covering the Content-Disposition header, implemented by https://pypi.python.org/pypi/rfc6266:

>>> rfc6266.parse_headers('''form-data; name="test"; filename*=utf-8''a%5Cb''')
ContentDisposition(u'form-data', {u'name': u'test', u'filename*': LangTagged(string=u'a\\b', langtag=None)}, None)

olemoign commented 8 years ago

Is there any news on this? I'm thinking about writing a patch for it if nobody else is working on it.

digitalresistor commented 8 years ago

(webob) alexandra:webob xistence$ python
Python 3.5.0 (default, Oct  3 2015, 21:47:52) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.72)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import webob
>>> import textwrap
>>> 
>>> post = textwrap.dedent("""
... --spam
... Content-Disposition: form-data; name="test"; filename*utf-8''="a%5Cb"
... 
... test
... --spam--
... """)
>>> 
>>> req = webob.Request.blank(
...     '/', POST=post,
...     content_type='multipart/form-data; boundary=spam')
>>> 
>>> original = str(req)
>>> req.POST
MultiDict([('test', 'test')])
>>> original == str(req)
True
>>> 

We no longer modify the original body when accessing req.POST.

This change was made here: https://github.com/Pylons/webob/commit/1cc3340fabb638407cbdc8d7b7b1c09a7eca8148#diff-706d71e82f473a3b61d95c2c0d833b60

See https://github.com/Pylons/webob/pull/192, which fixed https://github.com/Pylons/webob/issues/149.

That being said, WebOb internally uses Python's built-in cgi.FieldStorage for form data, and I am loath to fork and maintain something that comes with the standard library. Please fix cgi.FieldStorage in Python instead.

digitalresistor commented 8 years ago

Here's a bug report on the Python bug tracker for FieldStorage:

https://bugs.python.org/issue23434
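
Until that is resolved upstream, code that has access to the raw Content-Disposition header of a part can decode the parameter itself. The following is a minimal sketch using only the standard library's email package (not WebOb or cgi.FieldStorage); `extract_filename` is a hypothetical helper name:

```python
from email.message import Message
from email.utils import collapse_rfc2231_value


def extract_filename(content_disposition):
    """Return the decoded filename parameter, or None if absent (hypothetical helper)."""
    msg = Message()
    msg["Content-Disposition"] = content_disposition
    # For an RFC 2231 encoded parameter, get_param returns a
    # (charset, language, value) tuple; otherwise it returns a plain string.
    raw = msg.get_param("filename", header="content-disposition")
    if raw is None:
        return None
    # collapse_rfc2231_value accepts both shapes and returns a str.
    return collapse_rfc2231_value(raw)


filename = extract_filename("form-data; name=\"test\"; filename*=utf-8''a%5Cb")
```

The same helper also handles the plain `filename="..."` form, so a caller does not need to know in advance which encoding the client used.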