Closed defnull closed 2 months ago
forms already contains the decoded values, e.g. (unicode) strings. This is a major difference ... at least right now. I suggest keeping this difference (a MultipartPart instance for files and unicode strings for other form fields), as well as the separation between the two types of return values in the files and forms dicts.
What I do not like about the current implementation is that large form fields raise an exception and there is no way to say "I know this is gonna be big, give me a file object instead".
Think of a forum where an author tries to paste an entire novel into a form field or a scientific application where a biologist pastes a giant genome file into a
My idea is that MultipartPart() doubles as a string (str), unicode (unicode) and file-like object. The first two have a size limit and raise exceptions, but the user can fall back on a sequential .read() if they still want the data in that form.
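A minimal sketch of that idea, written for Python 3 only. SketchPart, PartTooLarge and the mem_limit default are hypothetical names chosen for illustration; this is not the library's MultipartPart, and the data is held in a BytesIO here only to keep the example short.

```python
import io


class PartTooLarge(Exception):
    """Raised when a part is too big to be returned as one in-memory value."""


class SketchPart:
    """Hypothetical stand-in for MultipartPart (assumption, not the real class)."""

    def __init__(self, name, raw, charset="utf8", mem_limit=1024):
        self.name = name
        self.charset = charset
        self.mem_limit = mem_limit
        self.size = len(raw)
        self._file = io.BytesIO(raw)  # a real parser might spool to disk instead

    def read(self, size=-1):
        """File-like access: always allowed, regardless of mem_limit."""
        return self._file.read(size)

    @property
    def raw(self):
        """Whole body as bytes; refuses to load parts above mem_limit."""
        if self.size > self.mem_limit:
            raise PartTooLarge("%s: %d bytes > %d" % (self.name, self.size, self.mem_limit))
        pos = self._file.tell()
        self._file.seek(0)
        try:
            return self._file.read()
        finally:
            self._file.seek(pos)  # leave the read position untouched

    @property
    def value(self):
        """Whole body decoded to text; same size limit as raw."""
        return self.raw.decode(self.charset)


small = SketchPart("title", b"hello")
big = SketchPart("novel", b"x" * 5000, mem_limit=1024)

total = 0
try:
    big.value
except PartTooLarge:
    # Fall back on sequential reads when the part is too large.
    for chunk in iter(lambda: big.read(512), b""):
        total += len(chunk)
```

Small fields behave like ordinary values, while an oversized field raises on value and can still be consumed piece by piece through read().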
Suppose there is a huge field and some other (possibly later) small fields. The memory limit will be exceeded while reading the huge field. There is no easy way to recover from this situation unless the user says in advance that a certain text field should not be loaded into memory, i.e. that it should not become part of the forms multidict but should be returned as part of the files multidict. (We should avoid any magic that automatically drops certain fields from in-memory handling.) That way the programmer can also prepare the code to handle such a field differently. Hence we might add a "not-to-be-loaded" list of field names to parse_form_data.
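The routing described above could look roughly like this. split_parts and its stream_fields parameter are hypothetical and only illustrate the decision; they are not an existing parse_form_data option. The parts are passed in as already-read (name, filename, raw bytes) tuples to keep the sketch self-contained, whereas a real parser would make this decision while streaming.

```python
import io
from collections import defaultdict


def split_parts(parts, mem_limit=1024, stream_fields=(), charset="utf8"):
    """Route parts into (forms, files); stream_fields is the assumed
    "not-to-be-loaded" list of field names."""
    forms = defaultdict(list)  # name -> decoded strings
    files = defaultdict(list)  # name -> file-like objects
    for name, filename, raw in parts:
        if filename is not None or name in stream_fields:
            # Uploads and explicitly listed fields are never decoded in memory.
            files[name].append(io.BytesIO(raw))
        elif len(raw) > mem_limit:
            # No magic: an unexpected large field is still an error.
            raise ValueError("field %r exceeds mem_limit" % name)
        else:
            forms[name].append(raw.decode(charset))
    return forms, files


parts = [
    ("genome", None, b"ACGT" * 1000),   # 4000 bytes, above the limit
    ("comment", None, b"short note"),
]
forms, files = split_parts(parts, mem_limit=1024, stream_fields=("genome",))
first_bases = files["genome"][0].read(8)
```

Without "genome" in stream_fields the same call raises, which matches the current behaviour described in the thread; with it, the later small field is still parsed normally.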
My idea is that MultipartPart() doubles as a string (str), unicode (unicode) and file-like object.
Why do you differentiate between str and unicode? Multipart is already able to handle the raw data and to return the decoded value. There is a value method with a size limit to fetch the decoded value, which is a unicode string on Python 2.x and a string on Python 3.x. Why should we add a method for loading the encoded value, which is available through the read method anyway? We could also add a way to read the decoded value in chunks, but I rarely see a need for that. It can be implemented "cross-platform" by codecs.lookup(
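The comment above is cut off, so the following is only one possible way to read a decoded value in chunks with codecs.lookup, not necessarily what was meant. iter_decoded is a hypothetical helper; it uses the incrementaldecoder attribute of the CodecInfo object returned by codecs.lookup, so multi-byte characters split across chunk boundaries are decoded correctly.

```python
import codecs
import io


def iter_decoded(fileobj, charset="utf8", chunk_size=4096):
    """Yield decoded text chunks from a binary file-like object."""
    decoder = codecs.lookup(charset).incrementaldecoder()
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        text = decoder.decode(chunk)
        if text:  # may be empty if the chunk ended inside a character
            yield text
    # Flush the decoder; raises UnicodeDecodeError on a truncated character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


original = "naïve café ☕ " * 3
stream = io.BytesIO(original.encode("utf8"))
# A tiny chunk size forces splits in the middle of multi-byte characters.
decoded = "".join(iter_decoded(stream, chunk_size=3))
```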
There is no real (technical or logical) difference between files and forms other than a 'filename' attribute. Both can be larger than mem_limit and contain binary data. The parser should return MultipartPart() instances in any case, even if the data was url-encoded, so the user knows what they get and what to check for.
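As a rough illustration of that uniform return type, the sketch below wraps url-encoded fields in the same kind of object an upload would use, with filename as the only distinguishing attribute. UniformPart and parts_from_urlencoded are hypothetical names for this example (Python 3), not part of the library.

```python
import io
from urllib.parse import parse_qsl


class UniformPart:
    """Hypothetical part object used for uploads and plain fields alike."""

    def __init__(self, name, raw, filename=None, charset="utf8"):
        self.name = name
        self.filename = filename  # None for ordinary form fields
        self.charset = charset
        self.size = len(raw)
        self.file = io.BytesIO(raw)

    @property
    def is_file(self):
        return self.filename is not None

    @property
    def value(self):
        return self.file.getvalue().decode(self.charset)


def parts_from_urlencoded(body, charset="utf8"):
    """Wrap each url-encoded field in a UniformPart."""
    pairs = parse_qsl(body, keep_blank_values=True, encoding=charset)
    return [UniformPart(name, value.encode(charset), charset=charset)
            for name, value in pairs]


fields = parts_from_urlencoded("title=Hello+World&tag=a%26b")
upload = UniformPart("doc", b"\x00\x01", filename="data.bin")
```

Calling code can then check size and filename on every part instead of branching on the container the value came from.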