cetra3 / mpart-async

Asynchronous Multipart Requests for Rust
Apache License 2.0
32 stars 13 forks source link

Add bench, use BytesMut #11

Closed koivunej closed 4 years ago

koivunej commented 4 years ago

Just by replacing Vec with BytesMut yielded some good benchmark numbers but I am not sure if those are correct.

This shouldn't conflict with the #10.

koivunej commented 4 years ago

I was a bit surprised how little edits changing to BytesMut took but that probably was one of the design goals of it. I had previously forgotten to change that single std::mem::replace(&mut ..., Vec::new()) to std::mem::take(&mut ...).

Please be critical about the benchmark, I haven't written many of those. I was thinking that the javascript project formidable may have considered this more but they do a similar header + zeroes + trailer totaling to 10GB. I thought it was excessive so I tuned it down to 10MB with 512 byte chunks. I did not check what kind of chunk sizes warp likes to use, or if it has any preference, or where do it's chunk sizes even come from.

Looks like this leaves with 23 clippy warnings. Maybe those should be tackled next.

For perf related matters it would probably be best to create a optimized recognizer for at least the StreamingContent state, though I am unsure if the twoway::find_bytes(..., r"\r") can be beat. One way could be to inspect match only new bytes from the underlying stream with a longer string, then do some more careful analysis near the end of buffer to catch any partial matches, carry over the recognizer state near the boundary at the next buffer from the underlying stream and so on, maybe? Though, perf is probably good enough. Formidable reports a bit over 4GBps on my machine (and throws an exception) so if the 7GBps is correct its quite ok.

koivunej commented 4 years ago

Just realized that the zero byte case is quite generious. Changing the zero byte to \r yields a bit different numbers... 10MB case does not complete, 10kB case:

ten megabytes/512       time:   [2.7644 ms 2.7729 ms 2.7863 ms]
                        thrpt:  [3.5048 MiB/s 3.5218 MiB/s 3.5326 MiB/s]

(ignore the bench using the same name)

With BytesMut:

ten megabytes/512       time:   [699.49 us 701.54 us 703.69 us]
                        thrpt:  [13.878 MiB/s 13.920 MiB/s 13.961 MiB/s]
                 change:
                        time:   [-74.898% -74.766% -74.666%] (p = 0.00 < 0.05)
                        thrpt:  [+294.72% +296.29% +298.37%]
cetra3 commented 4 years ago

Hmm, the 10mb not completing may be something worthy of investigating. I have a feeling that having \r chars in the stream is tripping up the parser still. Otherwise this change looks good!