copy / v86

x86 PC emulator and x86-to-wasm JIT, running in the browser
https://copy.sh/v86/
BSD 2-Clause "Simplified" License

fetch-networking: fix parsing of HTTP headers #1182

Closed SuperMaxusa closed 1 week ago

SuperMaxusa commented 2 weeks ago

Currently, I have only seen this bug in older versions of Dillo, but it's possible that other HTTP clients have it too.

Test-case: boot from https://distro.ibiblio.org/damnsmall/current/current.iso with fetch networking enabled, then try to open any page in Dillo; you get this in the browser console:

TypeError: Headers.set: accept-encoding:gzip is an invalid header name.

[screenshot: dillo-bug]

A raw HTTP request from Dillo looks like this:

GET / HTTP/1.0
Host: copy.sh
User-Agent: Dillo/0.8.5-i18n-misc
Accept-Language: en-us, ja
Accept-Encoding:gzip
Accept-Charset:utf-8,ISO8859-1
Cookie2: $Version="1"

According to RFC 2616 (section 4.2), an HTTP client/server may use any amount of whitespace after the colon separator; the previous code expected exactly one space after the colon.
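
A minimal sketch of the more lenient parsing (illustrative only; the function name is not from the actual code):

```js
// Split a raw header line like "Accept-Encoding:gzip" into name and value.
// RFC 2616 allows any amount of whitespace after the colon, so trim it
// instead of expecting exactly ": ".
function parse_header_line(line)
{
    const idx = line.indexOf(":");
    if(idx <= 0)
    {
        return null;    // malformed: no colon, or empty header name
    }
    const name = line.slice(0, idx).trim();
    const value = line.slice(idx + 1).trim();
    return [name, value];
}

// parse_header_line("Accept-Encoding:gzip")    -> ["Accept-Encoding", "gzip"]
// parse_header_line("Accept-Language: en-us")  -> ["Accept-Language", "en-us"]
```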

SuperMaxusa commented 2 weeks ago

Thanks for the suggestions. I think it's better to do proper validation of the HTTP headers, so my apologies for having to rework this PR. Let me know if I should create a new PR instead.

Changes:

chschnell commented 2 weeks ago

Another concern I see with this HTTP request parser: only the first fragment of the request body is taken into account, that is, however many body bytes fit into the same TCP packet as the end of the headers. If the originating client sends a request with a longer body (one that stretches across multiple TCP packets), this will break.

After (or while) reading the headers, it should be checked whether there is a Content-Length field, and if there is, that many bytes should be read from the originating client before the request is processed further. For the sake of completeness (the originating client should be free to do whatever is allowed), chunked body encoding should also be supported when reading the request body.

EDIT: I'm not saying anyone should add that now, I just noticed this the other day and wanted to politely bring it up. I also have a few ideas regarding this; see the sketch below.
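
For instance, a hypothetical helper along those lines (not part of this PR; `headers` is a Headers-like object and `received_body_bytes` the number of body bytes seen so far):

```js
// Hypothetical helper: how many body bytes are still missing once the end
// of the headers has been seen. Returns 0 when the request is complete.
function remaining_body_bytes(headers, received_body_bytes)
{
    const content_length = parseInt(headers.get("content-length") || "0", 10);
    return Math.max(0, content_length - received_body_bytes);
}
```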

SuperMaxusa commented 2 weeks ago

After (or while) reading the headers, it should be checked whether there is a Content-Length field, and if there is, that many bytes should be read from the originating client before the request is processed further.

Agreed, it might be worth adding a check for the case where Content-Length is 0 but the client still sends data. If I understand correctly, an HTTP/1.1 client can also send chunked data: https://everything.curl.dev/http/post/chunked.html, and I'm not sure whether, even without Transfer-Encoding: chunked, a client can send an arbitrary amount of data before the connection is closed, as in this case with the server response: https://github.com/copy/v86/pull/1178#issuecomment-2470997329
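
For reference, a chunked request body on the wire looks roughly like this (a made-up example; each chunk is its size in hex followed by the data, lines end with CRLF, and a zero-size chunk terminates the body):

POST /upload HTTP/1.1
Host: copy.sh
Transfer-Encoding: chunked

7
Mozilla
9
Developer
0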

chschnell commented 2 weeks ago

Let me rephrase what I mean, forgive me if this is already clear to you.

Think of a longer HTTP request from the originating client (by that I mean something like HTGET.EXE from mTCP), where "long" is relative to the TCP frame size.

For simplicity, let's say we have a maximum TCP payload size of 1500 bytes and an HTTP request of 500 bytes of header data (including the separator) plus a body of another 2000 bytes, so a request of 2500 bytes total. The client's TCP stack splits this HTTP request into two TCP frames:

0      500    1000   1500   2000   2500
:      :      :      :      :      :
+------+---------------------------+
|Header|           Body            |
+------+---------------------------+
+--------------------+-------------+
|    TCP frame 1     | TCP frame 2 |
+--------------------+-------------+

on_data_http() gets called for each TCP frame received from the originating client. Its condition that checks if we have received all headers becomes true with the first TCP frame. When on_data_http() gets called again later with the second TCP frame, what part of the code handles that? Note that the fetch() transaction has already started at this point.
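
To make the idea concrete, a rough sketch of how such buffering could look (illustrative only, not v86's actual parser; `state` is a per-connection object, `chunk` the payload of one TCP frame as a string, and process_request() a hypothetical callback):

```js
const HEADER_END = "\r\n\r\n";

function on_data_http(state, chunk)
{
    state.buffer = (state.buffer || "") + chunk;

    if(state.expected_total === undefined)
    {
        const end = state.buffer.indexOf(HEADER_END);
        if(end === -1)
        {
            return;     // headers incomplete, wait for the next TCP frame
        }
        const header_block = state.buffer.slice(0, end);
        const match = header_block.match(/^content-length:[ \t]*(\d+)/im);
        const content_length = match ? parseInt(match[1], 10) : 0;
        // total request size = headers + separator + declared body length
        state.expected_total = end + HEADER_END.length + content_length;
    }

    if(state.buffer.length >= state.expected_total)
    {
        // In the example above this is only reached with TCP frame 2,
        // once all 2500 bytes have arrived.
        process_request(state.buffer.slice(0, state.expected_total));
    }
}
```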

SuperMaxusa commented 1 week ago

Thank you for the clarification. Checking Content-Length is probably worth it; I will investigate that in my free time.

copy commented 1 week ago

@SuperMaxusa It's good to merge from my point of view, any objections?

SuperMaxusa commented 1 week ago

It's good to merge from my point of view, any objections?

Ok, I have no objections. @chschnell Do you have any suggestions? How about creating an issue from your comment about HTTP requests spanning multiple TCP frames?

copy commented 1 week ago

Thanks!

chschnell commented 1 week ago

It's good to merge from my point of view, any objections?

Ok, I have no objections. @chschnell Do you have any suggestions? How about creating an issue from your comment about HTTP requests spanning multiple TCP frames?

It's fine, it can wait.