copy / v86

x86 PC emulator and x86-to-wasm JIT, running in the browser
https://copy.sh/v86/
BSD 2-Clause "Simplified" License

improve fake network write() efficiency #1178

Closed chschnell closed 7 hours ago

chschnell commented 3 weeks ago

Several improvements to the fake network code that deal with data flowing towards the guest OS (i.e. packet encoding and downstream buffering) and TCP state mechanics.

chschnell commented 2 weeks ago

I have a small idea for fetch-based networking that I would first like to check with you before committing.

Currently, the full HTTP response (header and body) from fetch() is cached before being forwarded to the originating HTTP client.

My proposal is to only wait for the HTTP response headers from fetch(), and then to forward the response body asynchronously chunk by chunk to the originating HTTP client using ReadableStream and HTTP Chunked transfer encoding.

To test this, I changed on_data_http() in fetch_network.js based on a ReadableStream usage pattern from MDN; it's pretty straightforward and works well in my tests (to my knowledge, all HTTP applications support chunked Transfer-Encoding). This would also remove the need for FetchNetworkAdapter.fetch(); I merged its relevant code into on_data_http().
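For reference, a minimal sketch of that pattern (not the actual v86 code; write_headers_to_guest() and write_chunk_to_guest() are placeholder callbacks for whatever on_data_http() does with the emulated TCP connection):

    // Sketch only: stream a fetch() response body chunk by chunk.
    async function stream_http_response(url, write_headers_to_guest, write_chunk_to_guest)
    {
        const response = await fetch(url);

        // the headers are available as soon as fetch() resolves, before the body arrives
        write_headers_to_guest(response.status, response.headers);

        const reader = response.body.getReader();
        for(;;)
        {
            const { value, done } = await reader.read();
            if(done)
            {
                break;
            }
            // value is a Uint8Array holding the next (already decoded) body chunk
            write_chunk_to_guest(value);
        }
    }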

The upside is that this greatly increases both the responsiveness and the effective speed of fetch-based networking, all the more so as the HTTP response grows in size.

The downside is that the originating HTTP client never gets to see the Content-Length of the response it is receiving, so it cannot show any meaningful progress statistics other than the number of bytes received so far. Any Content-Length must be ignored if any Transfer-Encoding is used, and the Content-Length from the fetch response header can be incorrect (for example, for compressed response bodies). The only way around this seems to be the prefetching as it is done currently. I might be wrong, but to my knowledge it's categorically either prefetching or the loss of Content-Length (for any request from the originating HTTP client). I wasn't aware of that until I had finished implementing it.

I would of course wait with this until #1182 is merged, and include any changes from there.

I'm really undecided which side outweighs the other here. What do you think, thumbs up or down?

copy commented 2 weeks ago

I'm really undecided which side outweighs the other here. What do you think, thumbs up or down?

Generally thumbs up, but I'm unsure about the downsides. I wonder if HTTP Chunked transfer encoding is necessary at all. Can't the data be streamed in separate ethernet packets?

and the Content-Length from the fetch response header can be incorrect (for example, for compressed response bodies).

I believe for compressed bodies (via Content-Encoding), Content-Length indicates the length of the compressed body. As far as I know, compression via Transfer-Encoding isn't really used or widely implemented.

chschnell commented 2 weeks ago

EDIT: The gist of the matter is that I cannot determine whether Content-Length from the fetch() response headers is correct, so I cannot use it at all and I have no Content-Length to pass to the originating HTTP client. The only option I see at this point is to use chunked encoding.


If I could rely on Content-Length in the received fetch() response headers then I could just pass the unmodified chunks on to the originating client, passing the very same Content-Length to it.

I build and send the response headers to the originating client when I receive the first response body chunk, but I don't have a Content-Length to set in these response headers (well I do, but it is wrong when compression is used).

In fetch(), my browser negotiates gzip compression with my Apache server and transparently decompresses the response before passing me the decompressed chunks (and passes me the compressed size in Content-Length, which is useless). I never get to know how long the response is going to be myself.

I tried to suppress compression but Accept-Encoding is amongst the forbidden header names in fetch, so I can't set that header in the fetch request.

I wonder, is there any other method besides Chunked transfer encoding that I could use when I don't know the length of the response?

copy commented 2 weeks ago

In fetch(), my browser negotiates gzip compression with my Apache server and transparently decompresses the response before passing me the decompressed chunks (and passes me the compressed size in Content-Length, which is useless). I never get to know how long the response is going to be myself.

I see the problem now. fetch passes you already decompressed chunks, but Content-Length indicates the compressed size on the wire.

I wonder, is there any other method besides Chunked transfer encoding that I could use when I don't know the length of the response?

It seems to be legal to send an HTTP response with neither Content-Length nor Transfer-Encoding: chunked, in which case the server closes the TCP connection after writing all data.
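For illustration, the proxy's response in that mode could look roughly like this (a sketch only, not the actual v86 code; the header values are placeholders):

    // Sketch only: a close-delimited response with neither Content-Length nor
    // Transfer-Encoding: chunked. The end of the body is signalled by closing
    // the TCP connection to the client after the last chunk has been written.
    const response_header =
        "HTTP/1.1 200 OK\r\n" +
        "Connection: close\r\n" +                       // tell the client not to reuse the connection
        "Content-Type: application/octet-stream\r\n" +  // in practice copied from the fetch() response
        "\r\n";
    // write response_header, then stream the body chunks, then close the TCP connection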

chschnell commented 2 weeks ago

I wonder, is there any other method besides Chunked transfer encoding that I could use when I don't know the length of the response?

It seems to be legal to send an HTTP response with neither Content-Length nor Transfer-Encoding: chunked, in which case the server closes the TCP connection after writing all data.

I actually do all this to learn and also try out new things, so thank you very much for this information!

It should fit in well since I'm already closing the HTTP connection to the originating client once the transaction is complete (current fetch-based networking can't do that because of the FIN packet that is sent too early).

chschnell commented 2 weeks ago

It works without both Content-Length and chunked encoding! :)

Of course, the originating client still doesn't get to learn the Content-Length of the response body, so it cannot show any meaningful download progress information besides the total bytes received so far.

copy commented 1 week ago

Looks good to merge to me (one more small comment above). @chschnell Any objections? (asking because it's still marked as a draft)

chschnell commented 1 week ago

Looks good to merge to me (one more small comment above). @chschnell Any objections? (asking because it's still marked as a draft)

I'm still not sure whether you want the fetch() improvement or not. I can check in the code any time; it's not that much code and it's well-tested.

The question is whether you prefer to have the performance boost or to keep the Content-Length intact.

I lean towards the performance boost, but I'm in no position to make that decision.

SuperMaxusa commented 1 week ago

The question is whether you prefer to have the performance boost or to keep the Content-Length intact.

How about using the default way for small responses or when the custom x-fetch... header is set (like a fallback)?

copy commented 1 week ago

The question is whether you prefer to have the performance boost or to keep the Content-Length intact.

Yeah, let's do proper chunking and give up the Content-Length. (perhaps leave the old path in the code, so it could be toggled if there are any problems).

chschnell commented 1 week ago

The question is whether you prefer to have the performance boost or to keep the Content-Length intact.

Yeah, let's do proper chunking and give up the Content-Length. (perhaps leave the old path in the code, so it could be toggled if there are any problems).

Ok, I hope this was the last commit, from my side this PR is ready to go.

I've commented out two blocks of code in fetch_network.js (for fallback), and put a single new one in. Some kind of software switch could be added to toggle between the two implementations, but I tried to keep it simple.

@SuperMaxusa It should be easy to extend the code as it is now to a two-way solution, for example to prefetch the body when it is not too large, or when the originating client's HTTP request contained a specific header. If anyone wants it I could add it in a later PR.

SuperMaxusa commented 1 week ago

It should be easy to extend the code as it is now to a two-way solution, for example to prefetch the body when it is not too large, or when the originating client's HTTP request contained a specific header.

Yes, since most HTTP clients can handle these responses without Content-Length (e.g. on KolibriOS, DSL with Dillo and Firefox, Windows 95 with Internet Explorer 4.0, this works properly), let's leave your implementation with ReadableStream.

By the way, in many CI runs, the mocked.example.org test failed (as an exception, this behaviour is allowed), while the test with real example.org is more reliable, and after the recent changes, this test doesn't work. Can we remove this test?

chschnell commented 1 week ago

e.g. on KolibriOS, DSL with Dillo and Firefox, Windows 95 with Internet Explorer 4.0, this works properly

@SuperMaxusa Thanks a lot for testing, that is really helpful of you!

By the way, this transfer model (transfer without Content-Length and to use end-of-connection to indicate end-of-response-body) is how HTTP 1.0 used to work, and support for it is still required in HTTP 1.1 (here is the relevant section). I guess for these two reasons it is widely supported.

chschnell commented 6 days ago

It should be easy to extend the code as it is now to a two-way solution, for example to prefetch the body when it is not too large, or when the originating client's HTTP request contained a specific header.

By the way, in many CI runs, the mocked.example.org test failed (as an exception, this behaviour is allowed), while the test with real example.org is more reliable, and after the recent changes, this test doesn't work. Can we remove this test?

Starting test #6: Curl mocked.example.org
    Captured: wget -T 10 -O - mocked.example.org
    Captured: echo -e done\\tmocked.example.org
    Captured: ~% wget -T 10 -O - mocked.example.org
    Captured: Connecting to mocked.example.org (192.168.87.1:80)
Fetch Failed: http://mocked.example.org/
TypeError: fetch failed
    Captured: wget: server returned error: HTTP/1.1 502 Fetch Error
    Captured: ~% echo -e done\\tmocked.example.org
    Captured: done  mocked.example.org
[...]

Unless there's more to it, test #6 in tests/devices/fetch_network.js fails correctly: it attempts to fetch() from http://mocked.example.org/ which is an invalid DNS name. The next test, #7, attempts to fetch() from http://example.org/ which is a valid URL, and that test succeeds.

I am not sure, but in case you mean test #6, maybe it is only intended to test that a failure in fetch() is handled correctly? Was this test ever meant to succeed?

SuperMaxusa commented 6 days ago

it attempts to fetch() from http://mocked.example.org/ which is an invalid DNS name

This test has a custom fetch for which http://mocked.example.org is an existing website; see also https://github.com/copy/v86/pull/1061#discussion_r1676848522 and https://github.com/copy/v86/pull/1061/commits/6518770365d65ef8fe9ea9d06630c61539858ca2.

Since FetchNetworkAdapter.prototype.fetch has been commented out and replaced by fetch with a ReadableStream, it's no longer called and the test fails.

this transfer model (transfer without Content-Length and to use end-of-connection to indicate end-of-response-body) is how HTTP 1.0 used to work, and support for it is still required in HTTP 1.1

Interesting, I would never have thought this works as HTTP 1.0 compatibility, thanks for that information!

chschnell commented 6 days ago

it attempts to fetch() from http://mocked.example.org/ which is an invalid DNS name

This test has a custom fetch for which http://mocked.example.org is an existing website; see also #1061 (comment) and 6518770.

Since FetchNetworkAdapter.prototype.fetch has been commented out and replaced by fetch with a ReadableStream, it's no longer called and the test fails.

Ah I see, thanks for the explanation.

It's true that this test cannot work anymore because I removed FetchNetworkAdapter.fetch(); I wasn't aware of this side effect.

I would also argue (with my limited view of the project) that this test is no longer needed with the changed class layout.

chschnell commented 6 days ago

On second thought, FetchNetworkAdapter.fetch() could stay; it no longer gets called by on_data_http(), only by that test function now.

Perhaps a bit of a stretch, but maybe someone needs the blocking fetch() in the future and then FetchNetworkAdapter.fetch() would be useful to have.

ProgrammerIn-wonderland commented 5 days ago

A related note I just noticed, since this touches wisp_network: if a connection was closed in wisp and the v86 VM tries sending a TCP packet, it crashes the whole thing. It may be worth adding some extra checks around that code just to make sure everything's behaving fine.

This PR might fix it since it touches closing logic, I'm just not sure

ProgrammerIn-wonderland commented 5 days ago

Wisp does support bidirectional stream closure (I think I just screwed up how I handled it in v86), is there some issue you encountered with it? Not sending the voluntary close may cause resource leaks for both the client and the server

chschnell commented 5 days ago

A related note I just noticed, since this touches wisp_network: if a connection was closed in wisp and the v86 VM tries sending a TCP packet, it crashes the whole thing. It may be worth adding some extra checks around that code just to make sure everything's behaving fine.

This PR might fix it since it touches closing logic, I'm just not sure

I did not come across this when writing the PR. I mostly changed fake_network.js, and wisp_network.js only minimally where necessary.

As a test just now, I killed my WISP server (epoxy) while V86 was running without using the network, and after a few seconds it threw an exception in the browser console. Then I restarted the WISP server, and the same V86 instance picked it up just fine; the network was OK.

Then I killed the WISP server while having a telnet session open in V86, caused some traffic, restarted the WISP server, and tried to exit telnet, which took around half a minute (two exceptions or so), but then it came back and I was able to start a new telnet session without problems, still in the same V86 instance.

It looks stable to me (V86 survived), but maybe I'm not testing it right. How can I provoke the situation you mean?

chschnell commented 5 days ago

Wisp does support bidirectional stream closure (I think I just screwed up how I handled it in v86), is there some issue you encountered with it? Not sending the voluntary close may cause resource leaks for both the client and the server

I added a WISP frame of type CLOSE with reason 0x02 (voluntary stream closure) to the V86 WISP client just a couple of days ago and removed it again a couple of hours ago after I was told yesterday (in discussion #1175) that this wasn't supported.

I could add it back in easily, I left a comment in fake_network.js in my branch where this would belong (it's where we receive a TCP packet with FIN flag from V86 while in TCP state ESTABLISHED).

The reason I added it was to close only V86's side of the TCP connection (so-called "half-close"), for reasons see here:

The example from the first link is quite simple (see there for the full explanation):

ssh hostname sort < datafile

This needs TCP half-close in order to work: ssh must close its sending side once datafile has been transmitted so that the remote sort sees EOF on its stdin, while keeping the receiving side open for the sorted output. IIRC, FTP can also be hindered from working without support for half-close.

I'm a bit confused now. Does WISP support TCP half-close? What exactly does CLOSE with reason 0x02 (from the client) do, and is it generally supported by WISP servers?

ProgrammerIn-wonderland commented 5 days ago

I've never heard of half-close; okay, I see, wisp doesn't support that. Does full closing still work? I thought that commit removed full closing, and that's why I was concerned.

ProgrammerIn-wonderland commented 5 days ago

A CLOSE packet (type 0x04, here with reason 0x02) closes the TCP stream entirely; no more receiving or transmitting can be done. The server doesn't really care about the reason, it's there more as a courtesy.

Edit: fix wrong packet type

chschnell commented 5 days ago

I've never heard of half-close; okay, I see, wisp doesn't support that. Does full closing still work? I thought that commit removed full closing, and that's why I was concerned.

close 0x02 closes the TCP stream entirely, no more receiving or transmitting can be done

Only WispServerCpp implements CLOSE/0x02 like that, neither epoxy-tls nor wisp-js do, in my observation. There was no CLOSE/0x02 in V86's WISP client before I added it a couple of days ago.

So in the case of WispServerCpp that commit did remove full closing, but in the case of the other two WISP servers it didn't change anything. I did that because none of them implemented half-close. Now it all comes down to a timeout by the server, or a leaked connection in case that never happens.

Maybe I'd better put the CLOSE/0x02 back in, even though it will break certain applications and has no effect with some WISP servers.

ProgrammerIn-wonderland commented 5 days ago

Are you saying the other wisp servers don't support closing or don't support half closing? Sorry I think I'm misunderstanding

To my understanding every current wisp server does support full closing of streams from either direction (client and server can both report closed streams I believe)

Also close is 0x04, the reason is 0x02. Just wanted to clarify because I stated it incorrectly above in my own message

r58Playz commented 5 days ago

I've never heard of half-close; okay, I see, wisp doesn't support that. Does full closing still work? I thought that commit removed full closing, and that's why I was concerned.

close 0x02 closes the TCP stream entirely, no more receiving or transmitting can be done

Only WispServerCpp implements CLOSE/0x02 like that, neither epoxy-tls nor wisp-js do, in my observation. There was no CLOSE/0x02 in V86's WISP client before I added it a couple of days ago.

So in the case of WispServerCpp that commit did remove full closing, but in the case of the other two WISP servers it didn't change anything. I did that because none of them implemented half-close. Now it all comes down to a timeout by the server, or a leaked connection in case that never happens.

Maybe I'd better put the CLOSE/0x02 back in, even though it will break certain applications and has no effect with some WISP servers.

@chschnell

The protocol does not specify any special behavior on the server side for specific close reasons. I don't know why WispServerCpp behaves differently here, but wisp-js and epoxy-server are following the protocol and closing fully. (Wisp V1 reference, Wisp V2 reference.)

Any CLOSE packets sent from either the server or the client must immediately close the associated stream and TCP socket. The close reason in the payload doesn't affect this behavior, but may provide extra information which is useful for debugging.

On epoxy, you can use RUST_LOG=trace to see that it does close streams (stream uuid f9f5dcdb-5edc-40d2-9ba7-b7686b6cfc19 disconnected for client id "127.0.0.1:46658"):

[2024-11-21T20:32:37Z TRACE epoxy_server] routed "127.0.0.1:46658": Wisp
[2024-11-21T20:32:37Z DEBUG epoxy_server::handle::wisp] new wisp client id "127.0.0.1:46658" connected with extensions [1], downgraded false
[2024-11-21T20:32:37Z TRACE epoxy_server::handle::wisp] sent ping to wisp client id "127.0.0.1:46658"
[2024-11-21T20:32:37Z TRACE epoxy_server::handle::wisp] sent ping to wisp client id "127.0.0.1:46658"
[2024-11-21T20:32:37Z DEBUG epoxy_server::handle::wisp] new stream created for client id "127.0.0.1:46658": (stream uuid f9f5dcdb-5edc-40d2-9ba7-b7686b6cfc19) ConnectPacket { stream_type: Tcp, destination_port: 8000, destination_hostname: "localhost" } ConnectPacket { stream_type: Tcp, destination_port: 8000, destination_hostname: "127.0.0.1" }
[2024-11-21T20:32:37Z DEBUG epoxy_server::handle::wisp] stream uuid f9f5dcdb-5edc-40d2-9ba7-b7686b6cfc19 disconnected for client id "127.0.0.1:46658"
[2024-11-21T20:33:07Z TRACE epoxy_server::handle::wisp] sent ping to wisp client id "127.0.0.1:46658"
[2024-11-21T20:33:37Z TRACE epoxy_server::handle::wisp] sent ping to wisp client id "127.0.0.1:46658"
[2024-11-21T20:33:59Z DEBUG epoxy_server::handle::wisp] wisp client id "127.0.0.1:46658" multiplexor result Ok(())
[2024-11-21T20:33:59Z DEBUG epoxy_server::handle::wisp] shutting down wisp client id "127.0.0.1:46658"
[2024-11-21T20:33:59Z TRACE epoxy_server::handle::wisp] waiting for tasks to close for wisp client id "127.0.0.1:46658"
[2024-11-21T20:33:59Z DEBUG epoxy_server::handle::wisp] wisp client id "127.0.0.1:46658" disconnected

ProgrammerIn-wonderland commented 5 days ago

Also, WispServerCpp sends a 0x04 CLOSE packet back after the client sends one. It is the only server to do this, and it actually isn't in the spec. The client is supposed to assume the stream is closed immediately after sending the CLOSE packet.

Just wanted to note that on this GitHub thread

r58Playz commented 5 days ago

Another note: WispServerCpp doesn't actually close any sockets that it forwards, violating spec, so the remote server never sees the close from the wisp client. Edit: This can also result in sending packets to sockets that were closed according to the client, emulating half-closed behavior without actually closing anything.

ProgrammerIn-wonderland commented 4 days ago

I think it's worth looking into making a protocol extension for half closing, @ading2210 @r58Playz

chschnell commented 4 days ago

@ProgrammerIn-wonderland

Are you saying the other wisp servers don't support closing or don't support half closing? Sorry I think I'm misunderstanding

Sorry about the confusion. I meant to say the other WISP servers don't support closing, but it turned out I was wrong about that.

Also close is 0x04, the reason is 0x02. Just wanted to clarify because I stated it incorrectly above in my own message

Understood. By 0x02 I always meant reason 0x02 (Voluntary stream closure).

@r58Playz

On epoxy, you can use RUST_LOG=trace to see that it does close streams (stream uuid f9f5dcdb-5edc-40d2-9ba7-b7686b6cfc19 disconnected for client id "127.0.0.1:46658")

Thanks, I enabled tracing and indeed I see the same message. Sorry about my confusion; I was wrong when I said closing wasn't supported by epoxy-tls/wisp-js, they indeed do support it.

chschnell commented 4 days ago

My latest commit is a slight change in strategy with this close issue.

I now distinguish between on_shutdown() (FIN) and on_close() (RST); this is the right way to model it in fake_network.js.

I implemented both on_shutdown() and on_close() for WISP using a CLOSE frame with reason 0x02. This does not implement half-closed TCP connections (currently not supported by WISP), but it also doesn't leak idle TCP connections.

It would be trivial to change on_shutdown() in case WISP support changes in the future.
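Roughly, the mapping looks like this (a sketch only; send_wisp_close() and stream_id are placeholders for however the adapter actually emits the frame, not the real API):

    // Sketch only: both TCP events currently end in a WISP CLOSE frame (type 0x04),
    // because the protocol has no half-close; only the TCP-level meaning differs.
    function send_wisp_close(stream_id, reason)
    {
        // placeholder: build and send a WISP CLOSE frame (type 0x04) for stream_id
    }

    const wisp_tcp_handlers = {
        // guest sent FIN: it will send no more data (a half-close, if WISP ever supports it)
        on_shutdown(conn) {
            send_wisp_close(conn.stream_id, 0x02);  // reason 0x02: voluntary stream closure
        },
        // guest sent RST: tear the connection down entirely
        on_close(conn) {
            send_wisp_close(conn.stream_id, 0x02);
        },
    };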

ProgrammerIn-wonderland commented 4 days ago

Thanks! This does make retrofitting it much easier if the server advertises support for TCP half-closing.

chschnell commented 2 days ago

Currently there are two different fake DNS methods in V86:

  1. in fake_network.js: offline, static implementation that returns 192.168.87.1 for any given DNS name
  2. in wisp_network.js: uses fetch() and DoH (DNS over HTTPS), DoH server defaults to cloudflare-dns.com unless specified in config.doh_server

Fetch-based networking uses the first method, wisp-based the second, and I'm not sure why.

Faking DNS with DoH works just fine for both fetch- and wisp-based networking, and it returns real results.
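For context, such a DoH lookup over fetch() boils down to roughly this (a sketch using the JSON API of cloudflare-dns.com, the default doh_server; the actual code may use the binary application/dns-message wire format instead):

    // Sketch only: resolve an A record via DNS over HTTPS.
    async function doh_lookup_a(hostname, doh_server = "cloudflare-dns.com")
    {
        const url = `https://${doh_server}/dns-query?name=${encodeURIComponent(hostname)}&type=A`;
        const response = await fetch(url, { headers: { "Accept": "application/dns-json" } });
        const result = await response.json();
        // type 1 is an A record; return the first resolved IPv4 address, if any
        const answer = (result.Answer || []).find(record => record.type === 1);
        return answer ? answer.data : undefined;
    }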

So I would like to move the DoH-code from wisp_network.js into handle_fake_dns() (fake_network.js) and make it the default fake DNS method.

Question: Is the first method without DoH still needed or can it be removed? I think it is not needed, only fetch-based networking uses it, and with fetch() this DNS method doesn't really matter (the browser performs the DNS lookup for the URL passed to fetch()).

In case I should keep the first method, I would suggest adding a new option config.dns_method next to config.doh_server, with possible values static and doh (default doh), unless someone has a better idea.

SuperMaxusa commented 2 days ago

Question: Is the first method without DoH still needed or can it be removed? I think it is not needed, only fetch-based networking uses it, and with fetch() this DNS method doesn't really matter (the browser performs the DNS lookup for the URL passed to fetch()).

For fetch-based networking, there is not much need for it (see https://github.com/copy/v86/pull/1061#issuecomment-2184188633), and fetch with an IP address is buggy, especially if HTTPS-only mode is enabled in the browser.

ProgrammerIn-wonderland commented 2 days ago

Fetch networking uses the Host: header, so the IP address given is irrelevant; to save time it just returns a dummy IP.

Since wisp actually connects to the IP, it has to do the real resolution.

ProgrammerIn-wonderland commented 2 days ago

Sorry, just saw your comment @SuperMaxusa, my Internet is glitching.

ProgrammerIn-wonderland commented 2 days ago

Personally, I don't think it makes much sense to add DoH to fetch, since it's not really necessary based on how it operates (unless you're running a super old, pre-Host-header HTTP client). But I do think the DoH server should be configurable.

chschnell commented 1 day ago

@SuperMaxusa @ProgrammerIn-wonderland

Thanks a lot for your input!

We all agree that DoH doesn't make much sense with fetch-based networking; my point was that it also doesn't hurt, because it matters so little. But it's fine, I'll leave it at that.

In my latest commit I moved the DoH code from WispNetworkAdapter.send() into the new function handle_fake_dns_doh() in fake_network.js, and only WispNetworkAdapter uses DoH, just as before. So it's only a refactoring; the code behaves unchanged. Everything is in place in case someone wants to add a config option for this, I just don't see the point.

The benefit of all this is that the code in WispNetworkAdapter.send() that was duplicated from fake_network.js is now gone, which somewhat simplifies reasoning about the fake networking control flow (it made me scratch my head quite a bit).

copy commented 1 day ago

Thanks! This looks good to merge to me now. Let me know if there's another change you'd like to get in. I will try to rebase (to get rid of the merge commits).

chschnell commented 22 hours ago

Thanks! This looks good to merge to me now. Let me know if there's another change you'd like to get in. I will try to rebase (to get rid of the merge commits).

Yes, let's merge it.

I see two things left to do, but I want to split them off into a new PR in a couple of weeks.

  1. replace the bus with direct function calls
  2. further improve buffering, for example to encode ethernet packets at the latest possible stage, or to use GrowableRingbuffer in other places

I will do other things in the next week or two, but I think then I'll take another go at this, if you don't mind.

basicer commented 16 hours ago

Great work! A huge improvement over my hacked up proof of concept.

chschnell commented 12 hours ago

Great work! A huge improvement over my hacked up proof of concept.

My pleasure! And it was just a bit rough around the edges, the core was rock-solid.

copy commented 7 hours ago

Merged manually, thanks all!

chschnell commented 4 hours ago

@copy I am very sorry, but I just noticed that I broke the cors_proxy function in FetchNetworkAdapter. Only a single line needs to be fixed in fetch_network.js; is a quick PR ok?

copy commented 3 hours ago

@chschnell Sure