defnull / multipart

A fast multipart/form-data parser for python
https://multipart.readthedocs.io/
MIT License
135 stars 33 forks source link

`part_limit=128` is not very generous #62

Closed agroszer closed 5 days ago

agroszer commented 1 month ago

Hi,

part_limit=128 is not very generous, you can easily get to more data/widgets to submit. Sadly zope.publisher does strict=False so parsing errors silently succeed. See https://github.com/zopefoundation/zope.publisher/issues/80

I'd propose to increase that number, and eventually refactor the limits to module global constants, like this:

BUFFER_SIZE=1024 * 64,
HEADER_LIMIt=8,
HEADERSIZE_LIMIT=1024 * 4 + 128,  # 4KB
PART_LIMIT=128,
PARTSIZE_LIMIT=2**64,  # practically unlimited
SPOOL_LIMIT=1024 * 64,  # Keep fields up to 64KB in memory
MEMORY_LIMIT=SPOOL_LIMIT * PART_LIMIT
DISK_LIMIT=2**64,  # practically unlimited

...

class MultipartParser(object):
    def __init__(
        self,
        stream,
        boundary,
        content_length=-1,
        charset="utf8",
        strict=False,
        buffer_size=BUFFER_SIZE,
        header_limit=HEADER_LIMIT,
        headersize_limit=HEADERSIZE_LIMIT
        part_limit=PART_LIMIT,
        partsize_limit=PARTSIZE_LIMIT,
        spool_limit=SPOOL_LIMIT,
        memory_limit=MEMORY_LIMIT,
        disk_limit=DISK_LIMIT,
        mem_limit=0,
        memfile_limit=0,

That would make life much easier by being able to override defaults by patching.

defnull commented 1 month ago

I suspected that the lower default limit may bite someone, but since most of the other limits are per-segment, the number of segments must be limited to protect against malicious requests. 128 segments á 96KB is 12MB already, that's a lot per request. My reasoning was that 128 is very generous for 'normal' forms, and for use cases where you have even more fields, you may want to tweak the limits anyway. So, just increasing the defaults may not be the best idea.

The proposed solution with the module level constants also won't work, as python evaluates default values for functions only once. Moving the default-fallbacks into the function body so they are evaluated at runtime (e.g. foo(arg=None): if arg is None: arg = DEFAULT) obscures the method signature and makes it less self-explanatory. Non-constant module globals are also kind of an anti-pattern. It is quite useful to be able to change the limits depending on the endpoint. File uploads need large size limits. Huge auto-generated forms with lots of small fields need lower size limits but a higher part count limit. Mutable global defaults would be a step backwards.

But there definitely is room for improvement: It is not well documented that parse_form_data accepts all limits understood by MultipartParser. That could be better.

And, more importantly, there is a bug or design error that I already noticed, but did not fix yet because it breaks backwards compatibility: Fatal errors that result in wrong or incomplete data always trigger an exception, even in non-strict mode. Reaching limits is a fatal error. The application should be notified that the form could not be parsed completely and some data may be missing. That's the idea, but parse_form_data suppresses all errors in non-strict mode. My reasoning back then was probably to make this helper function as painless as possible, but there should at least be an ignore_errors toggle to control this behavior. And that toggle should be off by default in future releases. strict=False should try to support broken clients, but it should not suppress fatal errors.

Tasks:

defnull commented 1 month ago

The error handling improvements are now tracked in https://github.com/defnull/multipart/pull/63

defnull commented 5 days ago

Documentation is now also greatly improved. I hope this is in a good state now to support developers in deciding for reasonable limits for their use case.