defnull / multipart

A fast multipart/form-data parser for python
MIT License

Problems parsing simple W3C examples #7

Closed: luchr closed this issue 8 years ago

luchr commented 8 years ago

I have some open problems/questions concerning this module:

  1. Simple W3C examples, such as this one from https://www.w3.org/TR/html401/interact/forms.html,

    Content-Type: multipart/form-data; boundary=AaB03x
    
    --AaB03x
    Content-Disposition: form-data; name="submit-name"
    
    Larry
    --AaB03x
    Content-Disposition: form-data; name="files"
    Content-Type: multipart/mixed; boundary=BbC04y
    
    --BbC04y
    Content-Disposition: file; filename="file1.txt"
    Content-Type: text/plain
    
    ... contents of file1.txt ...
    --BbC04y
    Content-Disposition: file; filename="file2.gif"
    Content-Type: image/gif
    Content-Transfer-Encoding: binary
    
    ...contents of file2.gif...
    --BbC04y--
    --AaB03x--

    are not parsed correctly. [Even FieldStorage can parse this.]

  2. Support for encoded field names (RFC 2388, sections 3 and 5.4) using "the standard method described in RFC 2047"?
  3. Support for embedded "multipart/mixed" parts (RFC 2388, section 4.2)?

    • 100% test coverage? Of what? The W3C examples? RFC 2388?
    • Prevent DoS attacks? Do you count the bytes in the headers of the (sub-)parts?
    • Did you read RFC 2388? Have you ever thought about why FieldStorage objects can be nested? Or about why the FieldStorage documentation says the name can be None (and that this is intentional)? If a part has no "name", what does that tell you about the idea of storing (sub-)parts in a dict?
    • Do you plan to document these limitations somewhere?
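For comparison with item 1, here is a minimal sketch that feeds the W3C example to Python's standard-library `email` parser. This is a stand-in for illustration only, not the multipart library's API; it assumes CRLF line endings and prepends the outer Content-Type header, and `walk_parts` is a hypothetical helper. It recurses into the nested multipart/mixed "files" part and reports the file parts, whose name is None, which a flat dict keyed by field name could not represent.

```python
from email import policy
from email.parser import BytesParser

# The W3C HTML 4.01 example from item 1. The outer Content-Type header is
# prepended so the stdlib parser knows the boundary; MIME uses CRLF line endings.
RAW = b"\r\n".join([
    b"Content-Type: multipart/form-data; boundary=AaB03x",
    b"",
    b"--AaB03x",
    b'Content-Disposition: form-data; name="submit-name"',
    b"",
    b"Larry",
    b"--AaB03x",
    b'Content-Disposition: form-data; name="files"',
    b"Content-Type: multipart/mixed; boundary=BbC04y",
    b"",
    b"--BbC04y",
    b'Content-Disposition: file; filename="file1.txt"',
    b"Content-Type: text/plain",
    b"",
    b"... contents of file1.txt ...",
    b"--BbC04y",
    b'Content-Disposition: file; filename="file2.gif"',
    b"Content-Type: image/gif",
    b"Content-Transfer-Encoding: binary",
    b"",
    b"...contents of file2.gif...",
    b"--BbC04y--",
    b"--AaB03x--",
    b"",
])


def walk_parts(part, depth=0):
    """Yield (depth, content type, form field name, filename) for every part.

    Hypothetical helper: recurses into nested multipart bodies such as the
    multipart/mixed "files" part, so sub-parts without a name are still visited.
    """
    name = part.get_param("name", header="content-disposition")
    yield depth, part.get_content_type(), name, part.get_filename()
    if part.is_multipart():
        for sub in part.iter_parts():
            yield from walk_parts(sub, depth + 1)


msg = BytesParser(policy=policy.default).parsebytes(RAW)
for depth, ctype, name, filename in walk_parts(msg):
    print(f"{'  ' * depth}{ctype} name={name!r} filename={filename!r}")
```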
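On the DoS question, a minimal sketch of what counting the header bytes of each (sub-)part could look like. `read_part_headers`, `HeaderTooLarge`, and the 8192-byte default are hypothetical names and values chosen for illustration, not the multipart library's API, and folded (continuation) header lines are deliberately not handled.

```python
from collections.abc import Iterable


class HeaderTooLarge(ValueError):
    """Raised when one part's header block exceeds the byte budget."""


def read_part_headers(lines: Iterable[bytes], max_header_bytes: int = 8192) -> list[tuple[str, str]]:
    """Read one part's header block from CRLF-terminated lines, enforcing a byte limit.

    Hypothetical helper: counts every header byte (including line endings) so a
    client cannot make the parser buffer an unbounded header block. Folded
    (continuation) header lines are not supported in this sketch.
    """
    headers = []
    used = 0
    for raw in lines:
        used += len(raw)
        if used > max_header_bytes:
            raise HeaderTooLarge(f"part headers exceed {max_header_bytes} bytes")
        line = raw.rstrip(b"\r\n")
        if not line:  # an empty line terminates the header block
            return headers
        name, sep, value = line.partition(b":")
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        # latin-1 maps every byte to a code point, so decoding never fails
        headers.append((name.strip().decode("latin-1"), value.strip().decode("latin-1")))
    raise ValueError("input ended inside a header block")


print(read_part_headers([b'Content-Disposition: form-data; name="submit-name"\r\n', b"\r\n"]))
# [('Content-Disposition', 'form-data; name="submit-name"')]
```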
defnull commented 8 years ago

Correct me if I'm wrong, but as far as I know: 1) multipart/mixed is not used by modern browsers or HTTP client libraries. 2) Browsers and HTTP client libraries send filenames as UTF-8 encoded Unicode strings nowadays. 3) See 1.
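Item 2 of the issue refers to RFC 2047 encoded-words, while this reply refers to raw UTF-8 in the header. A minimal sketch of the difference, using the standard-library `email.header` module; `decode_field_name` is a hypothetical helper and the sample values are made up for illustration.

```python
from email.header import decode_header, make_header


def decode_field_name(raw: str) -> str:
    """Decode RFC 2047 encoded-words in a header value; plain ASCII passes through."""
    return str(make_header(decode_header(raw)))


# RFC 2047 encoded-words: the "Q" (quoted-printable-like) and "B" (base64) forms of "für".
print(decode_field_name("=?utf-8?q?f=C3=BCr?="))  # für
print(decode_field_name("=?utf-8?b?ZsO8cg==?="))  # für
print(decode_field_name("submit-name"))           # submit-name

# Raw UTF-8, as the reply says modern clients send it: the header bytes are simply decoded.
print(b"f\xc3\xbcr".decode("utf-8"))              # für
```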

Implementing support for unused or outdated parts of an RFC is not always the right thing to do. If you find a valid use case, however, I'd be happy to accept your pull request.

As for your other questions:

luchr commented 8 years ago

I opened a pull request in which I tried to describe the limitations mentioned above.

Quoting defnull: "No, I did not read the entire RFC word by word, nor do I think someone is required to do that in order to write a useful library."

Thank you for this statement/clarification. Then, I have no further questions.