This repository contains scenarios and parser tests for different Python-based
`multipart/form-data` parsers, comparing both blocking and non-blocking APIs (if
available).
`cgi.FieldStorage` replacement.
`multipart/form-data`.

Not included: Some parsers cheat by loading the entire request body into memory (e.g. sanic or litestar). Those are obviously very fast in benchmarks but also very impractical when dealing with large file uploads.

`python-multipart` v0.0.11 fixed a bug that caused extreme slowdowns (as low as
0.75 MB/s) in all three worst-case scenarios.

`email` parser that caused it to skip over the actual parsing and also not do
any IO in the blocking test. Throughput was way higher than expected. This is
fixed now.

`werkzeug` and also a sensible default.

`multipart` with slightly better results in some tests.

`streaming-form-data` parser.

`django` parser.

All tests were performed on a pretty old "AMD Ryzen 5 3600" running Linux 6.8.0 and Python 3.12.3 with highest possible priority and pinned to a single core.
For each test, the parser is created with default¹ settings and the results are
thrown away. Some parsers buffer to disk, but `TEMP` points to a RAM disk to
remove disk IO from the equation. Each test is repeated until there is no
improvement for at least 100 runs in a row, then the best run is used to compute
the theoretical maximum throughput per core.
¹) There is one exception: The limit for in-memory buffered files is set to
500KB (hard-coded in `werkzeug`) to ensure a fair comparison.
Parser throughput is measured in MB/s (input size / time). Higher throughput is better.
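
The measurement procedure described above can be sketched roughly as follows. This is a minimal illustration and not the repository's actual benchmark code: the function name `best_throughput`, the `patience` parameter and the use of 1 MB = 1024 * 1024 bytes are all assumptions made for this example.

```python
import time


def best_throughput(run, payload_size, patience=100):
    """Repeat run() until the best time has not improved for `patience`
    consecutive runs, then return the throughput of the best run in MB/s.

    Hypothetical helper: the name, signature and MB definition are assumptions.
    """
    best = float("inf")
    stale = 0  # number of runs since the last improvement
    while stale < patience:
        start = time.perf_counter()
        run()
        elapsed = time.perf_counter() - start
        if elapsed < best:
            best, stale = elapsed, 0
        else:
            stale += 1
    # input size / time, converted to MB/s (assuming 1 MB = 1024 * 1024 bytes)
    return payload_size / best / (1024 * 1024)


# Usage with a stand-in workload instead of a real parser
payload = b"x" * (1 << 20)
mbps = best_throughput(lambda: payload.count(b"x"), len(payload), patience=10)
print(f"{mbps:.2f} MB/s")
```

Using the best run instead of the average filters out scheduling noise, which fits the stated goal of estimating a theoretical maximum per core.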
A simple form with just two small text fields.
Parser | Blocking (MB/s) | Non-Blocking (MB/s) |
---|---|---|
multipart | 13.79 MB/s (100%) | 19.72 MB/s (100%) |
werkzeug | 5.69 MB/s (41%) | 7.16 MB/s (36%) |
django | 2.99 MB/s (22%) | - |
python-multipart | 3.63 MB/s (26%) | 6.09 MB/s (31%) |
streaming-form-data | 0.83 MB/s (6%) | 0.86 MB/s (4%) |
cgi | 4.78 MB/s (35%) | - |
email | 3.95 MB/s (29%) | 4.33 MB/s (22%) |
This scenario is so small that it shows initialization overhead more than actual
parsing performance, which hurts `streaming-form-data` the most. Small forms like
these are better transmitted as `application/x-www-form-urlencoded`, which has
a lot less overhead than `multipart/form-data` and should be a lot faster.
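
To get a feeling for the size overhead, the following sketch encodes the same two fields both ways and compares the body sizes. The field names, values and the boundary string are made up for this example; real clients generate a random boundary and may add more headers per part.

```python
from urllib.parse import urlencode

# Made-up example fields, not the ones used in the benchmark
fields = {"title": "Hello", "comment": "Just a short note"}
boundary = "example-boundary-1234"  # arbitrary value for illustration

# application/x-www-form-urlencoded: one compact line
urlencoded_body = urlencode(fields).encode("ascii")

# multipart/form-data: a boundary line, a header block and a blank line per field
parts = []
for name, value in fields.items():
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{name}"\r\n'
        "\r\n"
        f"{value}\r\n"
    )
parts.append(f"--{boundary}--\r\n")  # closing boundary
multipart_body = "".join(parts).encode("utf-8")

print(len(urlencoded_body), len(multipart_body))
```

The multipart body repeats the boundary and a header block for every field, so for short text values most of the bytes are framing rather than data.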
A large form with 100 small text fields.
Parser | Blocking (MB/s) | Non-Blocking (MB/s) |
---|---|---|
multipart | 23.85 MB/s (100%) | 30.56 MB/s (100%) |
werkzeug | 10.24 MB/s (43%) | 12.77 MB/s (42%) |
django | 5.48 MB/s (23%) | - |
python-multipart | 5.04 MB/s (21%) | 8.92 MB/s (29%) |
streaming-form-data | 1.13 MB/s (5%) | 1.17 MB/s (4%) |
cgi | 6.34 MB/s (27%) | - |
email | 11.15 MB/s (47%) | 12.96 MB/s (42%) |
Large forms show a higher throughput because initialization overhead is no longer
the main factor. Parsing many small fields is still a lot of work for parsers,
and there are significant differences between implementations. `email` is
designed for this type of line-based text input, but `multipart` is still twice
as fast.
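
For readers unfamiliar with the `email` approach, here is a minimal sketch of how the standard library's `email.parser.BytesFeedParser` can be fed a small form in chunks. It is only an illustration and not necessarily how the benchmark drives the parser; the boundary, the field names and the chunk size of 16 bytes are arbitrary choices for this example.

```python
from email.parser import BytesFeedParser

boundary = b"example-boundary"  # arbitrary value for this sketch

# A small multipart/form-data body with two text fields
body = b"\r\n".join([
    b"--" + boundary,
    b'Content-Disposition: form-data; name="title"',
    b"",
    b"hello",
    b"--" + boundary,
    b'Content-Disposition: form-data; name="comment"',
    b"",
    b"two small text fields",
    b"--" + boundary + b"--",
    b"",
])

parser = BytesFeedParser()
# The email parser expects message headers first, so the content type
# (normally taken from the HTTP request) is prepended as a header block.
parser.feed(b"Content-Type: multipart/form-data; boundary=" + boundary + b"\r\n\r\n")
for offset in range(0, len(body), 16):
    parser.feed(body[offset:offset + 16])  # feed in small chunks
message = parser.close()

fields = {
    part.get_param("name", header="content-disposition"): part.get_payload(decode=True)
    for part in message.get_payload()
}
print(fields)
```

Note that the parser keeps every part in memory until `close()` is called, which is one reason why it is a poor fit for large uploads.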
A file upload with a single large (32MB) file.
Parser | Blocking (MB/s) | Non-Blocking (MB/s) |
---|---|---|
multipart | 1445.68 MB/s (100%) | 6048.42 MB/s (100%) |
werkzeug | 904.58 MB/s (63%) | 2658.58 MB/s (44%) |
django | 954.25 MB/s (66%) | - |
python-multipart | 1270.15 MB/s (88%) | 4602.17 MB/s (76%) |
streaming-form-data | 1084.81 MB/s (75%) | 4927.08 MB/s (81%) |
cgi | 127.75 MB/s (9%) | - |
email | 51.96 MB/s (4%) | 59.73 MB/s (1%) |
Now it gets interesting. When dealing with actual file uploads, both
`python-multipart` and `streaming-form-data` catch up and are now faster than
`werkzeug` or `django`. All four are slower than `multipart`, but the results
are still impressive. The line-based `cgi` and `email` parsers on the other hand
struggle a lot, probably because there are some line breaks in the test file
input. This flaw shows even more in some of the tests below.
A form with two text fields and two small file uploads (1MB and 2MB).
Parser | Blocking (MB/s) | Non-Blocking (MB/s) |
---|---|---|
multipart | 1393.63 MB/s (100%) | 6932.10 MB/s (100%) |
werkzeug | 922.54 MB/s (66%) | 2680.75 MB/s (39%) |
django | 934.86 MB/s (67%) | - |
python-multipart | 1109.66 MB/s (80%) | 4647.66 MB/s (67%) |
streaming-form-data | 930.12 MB/s (67%) | 2592.41 MB/s (37%) |
cgi | 127.69 MB/s (9%) | - |
email | 64.72 MB/s (5%) | 68.16 MB/s (1%) |
This is the most realistic test and shows very similar results to the upload
test above. In this scenario, `multipart` and `python-multipart` outperform the
others. `werkzeug`, `django` and `streaming-form-data` are a bit slower, but
still way faster than the line-based `cgi` and `email` parsers.
A 1MB upload that contains nothing but Windows line breaks (`\r\n`).
Parser | Blocking (MB/s) | Non-Blocking (MB/s) |
---|---|---|
multipart | 1392.83 MB/s (100%) | 6765.45 MB/s (100%) |
werkzeug | 1028.09 MB/s (74%) | 3992.00 MB/s (59%) |
django | 992.70 MB/s (71%) | - |
python-multipart | 700.70 MB/s (50%) | 1340.66 MB/s (20%) |
streaming-form-data | 50.42 MB/s (4%) | 52.34 MB/s (1%) |
cgi | 3.81 MB/s (0%) | - |
email | 4.23 MB/s (0%) | 4.27 MB/s (0%) |
This is the first scenario that should not happen under normal circumstances
but is still an important factor if you want to prevent malicious uploads from
slowing down your web service. `multipart`, `werkzeug` and `django` are
mostly unaffected and produce consistent results. `python-multipart` slows down,
but still performs well. `streaming-form-data` seems to struggle, but not as
much as the line-based parsers. Those choke on the high number of line endings
and are practically unusable.
A 1MB upload that contains nothing but Linux line breaks (`\n`).
Parser | Blocking (MB/s) | Non-Blocking (MB/s) |
---|---|---|
multipart | 1395.76 MB/s (100%) | 6782.10 MB/s (100%) |
werkzeug | 1001.55 MB/s (72%) | 3697.27 MB/s (55%) |
django | 953.28 MB/s (68%) | - |
python-multipart | 1152.93 MB/s (83%) | 4707.50 MB/s (69%) |
streaming-form-data | 897.26 MB/s (64%) | 2420.21 MB/s (36%) |
cgi | 1.71 MB/s (0%) | - |
email | 2.60 MB/s (0%) | 2.60 MB/s (0%) |
Linux line breaks are not valid in segment headers or boundaries, which benefits
parsers that do not try to parse invalid input. `streaming-form-data` is less
affected this time and performs well. The two line-based parsers on the other
hand are even worse than before. Throughput is roughly halved, probably because
there are twice as many line breaks (and thus lines) in this scenario.
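
The "twice as many lines" argument is simple arithmetic and can be checked with a few lines of Python. The sketch assumes a payload of exactly 1 MiB (1 << 20 bytes); the real test files may differ slightly in size.

```python
size = 1 << 20  # assumed payload size: 1 MiB

crlf_payload = b"\r\n" * (size // 2)  # Windows line breaks, 2 bytes each
lf_payload = b"\n" * size             # Linux line breaks, 1 byte each

# splitlines() treats both "\r\n" and "\n" as a single line terminator
crlf_lines = len(crlf_payload.splitlines())
lf_lines = len(lf_payload.splitlines())

print(crlf_lines, lf_lines, lf_lines / crlf_lines)
```

Both payloads have the same size, but the `\n` variant contains twice as many (empty) lines, so a parser with a fixed cost per line has to do twice the work for the same number of bytes.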
A 1MB upload that contains parts of the boundary.
Parser | Blocking (MB/s) | Non-Blocking (MB/s) |
---|---|---|
multipart | 1333.45 MB/s (97%) | 5853.86 MB/s (100%) |
werkzeug | 996.17 MB/s (72%) | 3533.90 MB/s (60%) |
django | 998.89 MB/s (72%) | - |
python-multipart | 1132.91 MB/s (82%) | 4246.49 MB/s (73%) |
streaming-form-data | 902.56 MB/s (65%) | 2404.69 MB/s (41%) |
cgi | 1379.35 MB/s (100%) | - |
email | 146.24 MB/s (11%) | 163.16 MB/s (3%) |
This test was originally added to show a second issue with the `python-multipart`
parser, but that's fixed now. There is another interesting anomaly, though: Since
the file does not contain any newlines, `cgi` is suddenly competitive again. Its
internal `file.readline(1<<16)` call can read large chunks very quickly and the
slow parser logic is triggered less often.
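
The effect of the `readline(1<<16)` size limit can be illustrated with `io.BytesIO` as a stand-in for the request stream. The helper `count_readline_calls` and the payloads are made up for this sketch and only count how often a line-based reader would have to run its per-line logic.

```python
import io


def count_readline_calls(payload, limit=1 << 16):
    """Count how many readline(limit) calls are needed to consume payload."""
    stream = io.BytesIO(payload)
    calls = 0
    while stream.readline(limit):  # returns b"" at end of stream
        calls += 1
    return calls


size = 1 << 20                 # assumed payload size: 1 MiB
no_newlines = b"-" * size      # stand-in for data without any line breaks
only_newlines = b"\n" * size   # worst case: every byte ends a line

print(count_readline_calls(no_newlines))    # 16
print(count_readline_calls(only_newlines))  # 1048576
```

Without newlines, each call returns a full 64 KiB chunk, so the per-line overhead is paid only 16 times per megabyte instead of once per byte.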
All modern parsers (`multipart`, `werkzeug`, `python-multipart`,
`streaming-form-data`) are fast and behave correctly. All four offer
non-blocking APIs for asyncio/ASGI environments with very little overhead and a
high level of control. There are differences in API design, code quality,
maturity and documentation, but that's not the focus of this benchmark. The
`django` parser is also pretty solid, but hard to use outside of Django
applications.
For me, `streaming-form-data` was a bit of a surprise. It's really fast for
large file uploads, but not as fast as you might expect from a parser that is
partly written in Cython. It also shows significant overhead per segment, which
hurts performance when parsing small (e.g. text) fields. It's a mixed bag, but
still an interesting approach.
I probably do not need to talk much about `email` or `cgi`. Both show mixed
performance and are vulnerable to malicious inputs. `cgi` is deprecated (for
good reasons) and `email` is not designed for form data or large uploads at all.
Both are unsuitable or even dangerous to use in modern web applications.