Closed SuperMaxusa closed 1 week ago
Thanks for the suggestions, I think that it's better make a properly HTTP headers validation, so my apologies that I had remake this PR. Let me know if I need to create a new PR instead.
Changes:
Another concern I see with this HTTP request parser: Only the first fragment of the request body is taken into account, that is as many bytes that fit into the last TCP headers packet right after the end of headers, up until the end of packet. If the originating client sends a request with a longer body (one that stretches across multiple TCP packets) this will break.
After (or while) reading the headers it should be checked whether there is a Content-Length field, and if there is then as many bytes should be read from the originating client before continuing to process the request. For the sake of completeness (the originating client should be free to do whatever is allowed), also chunked body encoding should be supported when reading this reuest body.
EDIT: I'm not saying anyone should add that now, I just noticed this the other day and wanted to politely bring it up. I'd also have a few ideas regarding this.
After (or while) reading the headers it should be checked whether there is a Content-Length field, and if there is then as many bytes should be read from the originating client before continuing to process the request.
Agreed, it might be worth adding a check if Content-Length: 0
but the client sends data.
If I correctly understand, HTTP/1.1 client also can send chunked data: https://everything.curl.dev/http/post/chunked.html, and I'm not sure that even without Transfer-Encoding: chunked
can send any amount of data before the connection is closed, as in this case with the server response: https://github.com/copy/v86/pull/1178#issuecomment-2470997329
Let me rephrase what I mean, forgive me if this is already clear to you.
Think of a longer HTTP request from the originating client (by that I mean something like HTGET.EXE from mTCP). I mean "long" in relation to the TCP frame size.
For simplicitly let's say we have a TCP max. payload size of 1500 bytes, and a HTTP request of 500 bytes header data (including separator) and a body of another 2000 bytes, so a request of 2500 bytes total. The client's TCP stack splits this HTTP request into two TCP frames:
0 500 1000 1500 2000 2500
: : : : : :
+------+---------------------------+
|Header| Body |
+------+---------------------------+
+--------------------+-------------+
| TCP frame 1 | TCP frame 2 |
+--------------------+-------------+
on_data_http()
gets called for each TCP frame received from the originating client. Its condition that checks if we have received all headers becomes true with the first TCP frame. When on_data_http()
gets called again later with the second TCP frame, what part of the code handles that? Note that the fetch() transaction has already started at this point.
Thank you for your clarification, probably checking Content-Length is worth it, I will investigate that in my free time.
@SuperMaxusa It's good to merge from my point of view, any objections?
It's good to merge from my point of view, any objections?
Ok, I have no objections. @chschnell Do you have any suggestions? How about an idea to create an issue from your comment about HTTP request on multiple TCP frames?
Merci!
It's good to merge from my point of view, any objections?
Ok, I have no objections. @chschnell Do you have any suggestions? How about an idea to create an issue from your comment about HTTP request on multiple TCP frames?
It's fine, it can wait.
Currently, I have only seen this bug in older versions of Dillo, but it's possible that other HTTP clients have it too.
Test-case: boot from https://distro.ibiblio.org/damnsmall/current/current.iso with enabled fetch networking and then try to open any page in Dillo, you get in the browser console:
A raw HTTP request from Dillo looks like this:
According to RFC 2616 (section 4.2), HTTP client/server can use any amount of whitespaces after separator, previous code expected only one space after the colon (separator).