TechEmpower / FrameworkBenchmarks

Source for the TechEmpower Framework Benchmarks project
https://www.techempower.com/benchmarks/
Other
7.55k stars 1.94k forks source link

[Python] mrhttp: a project with many problems #9055

Open remittor opened 3 months ago

remittor commented 3 months ago

Fields in HTTP responses

Date field in responses is missing or does not meet requirements.

https://github.com/MarkReedZ/mrhttp/blob/fb0870254016dae2d224915f634448c395914e0d/src/mrhttp/internals/protocol.c#L660-L702

HTTP header parser

HTTP parser is sure that untruncated data came from the TCP stream

https://github.com/MarkReedZ/mrhttp/blob/fb0870254016dae2d224915f634448c395914e0d/src/mrhttp/internals/mrhttpparser.c#L170-L188

Function get_len_to_space searches the buffer for the nearest space and returns the length. Yes, but this space may be missing, since it may be in the next TCP package. But in this situation, function parse_request returns error -1, which leads to termination of request processing.

https://github.com/MarkReedZ/mrhttp/blob/fb0870254016dae2d224915f634448c395914e0d/src/mrhttp/internals/mrhttpparser.c#L100-L155

Function parse_headers_avx2 also hopes that the buffer contains absolutely all headers and their values.

HTTP header parser does not comply with the standard

https://github.com/MarkReedZ/mrhttp/blob/fb0870254016dae2d224915f634448c395914e0d/src/mrhttp/internals/parser.c#L144C16-L144C29

The standard allows more than 1 space between a : and a value. Therefore, if the request contains a line like this: Connection: close\r\n the condition will not work.

The same goes for this place: https://github.com/MarkReedZ/mrhttp/blob/fb0870254016dae2d224915f634448c395914e0d/src/mrhttp/internals/parser.c#L133-L136 The standard allows you to send the following requests: Content-Length: 0\r\n

This parser will not pass quality unit tests!

errantmind commented 3 months ago

No comment on standards compliance (haha) but the primary issue I can see is mrhttp is using a cached implementation which not allowed by the tfb rules for plaintext.

uNetworkingAB commented 1 month ago

Forget about mrhttp. Look at gnet. Its entire HTTP parser literally is:

requests[] = split(data, '\r\n\r\n')

and was the official 1st place in Round 22. And it has been tagged as "implementation approach: realistic".

uNetworkingAB commented 1 month ago

I don't think mrhttp does anything spectacularly wrong. It clearly does parse the headers and it does apply some kind of higher level handling above TCP. It uses AVX2 intrinsics to get a bitmap of the ; or \r which is an okay way. It's most likely not standards compliant or secure in any way, but it does look like an HTTP server to me. I'll give it a pass.

remittor commented 1 month ago

@uNetworkingAB

It's most likely not standards compliant or secure in any way

So this is not just a matter of compliance with the HTTP standard! I have already described above that the basic principle of working with the TCP data stream has not been observed! If the HTTP-request is split into 2 parts (which TCP allows), then server mrhttp will not be able to understand that it was sent 1 HTTP-request.

but it does look like an HTTP server to me. I'll give it a pass.

First, mrhttp must become a simple TCP-server, and only then - an HTTP-server.