MatrixAI / js-quic

QUIC Networking for TypeScript & JavaScript
https://matrixai.github.io/js-quic/
Apache License 2.0

Create QUIC library that can be exposed to JS and uses the Node `dgram` module #1

Closed CMCDragonkai closed 1 year ago

CMCDragonkai commented 1 year ago

Specification

We need QUIC in order to simplify our networking stack in PK.

QUIC is a superior UDP layer that can make use of any UDP socket, and create multiplexed reliable streams. It is also capable of hole punching either just by attempting to send ping frames on the stream, or through the unreliable datagrams.

Our goal is to make use of a QUIC library that is compilable to desktop and mobile operating systems, expose its functionality to JS, but have the JS runtime manage the actual sockets.

On NodeJS, the dgram module can already manage the underlying UDP sockets, and relying on NodeJS ensures that these sockets mix well with the concurrency/parallelism used by the rest of the NodeJS system (due to libuv), thus avoiding the creation of a second IO system running in parallel.

On mobile runtimes, a dgram module may not be readily available. In such cases, having an IO runtime supply the UDP sockets may be required. But it is likely there are already existing libraries that provide this, like https://github.com/tradle/react-native-udp.

The underlying QUIC library is expected to be agnostic to the socket runtime. It will give you the data that you need to write to the UDP socket, and it will take the data that comes off the UDP socket.

However it does have 2 major duties:

  1. The multiplexing and managing of streams.
  2. The TLS encryption/decryption side

Again, if we want to stay cross-platform, we would not want to bind into Node.js's OpenSSL crypto. Instead, the library would need to take a callback of crypto routines to use. However I've found that this is generally not the case with most existing QUIC libraries. But let's see how we go with this.
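
As a rough sketch, the kind of injection point we would want might look like this (all names here are hypothetical, not an existing quiche or js-quic API):

interface CryptoOps {
  sign(key: ArrayBuffer, data: ArrayBuffer): Promise<ArrayBuffer>;
  verify(
    key: ArrayBuffer,
    data: ArrayBuffer,
    signature: ArrayBuffer,
  ): Promise<boolean>;
  randomBytes(size: number): Promise<ArrayBuffer>;
}

interface QUICConfig {
  key: string; // in-memory PEM private key
  cert: string; // in-memory PEM certificate chain
  crypto?: CryptoOps; // injected so the library stays runtime-agnostic
}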

Additional context

QUIC and NAPI-RS

Sub issues

Tasks

  1. [x] Experiment with Neon or Napi-rs
  2. [x] Experiment with quiche by cloudflare
  3. [x] Create bridge code plumbing UDP sockets and QUIC functions
  4. [x] Create self signed TLS certificate during development - https://github.com/MatrixAI/js-quic/issues/1#issuecomment-1356229150
  5. ~Extract out the TLS configuration so that it can be set via in-memory PEM variable and key variable. - 2 day~ - see #2
  6. ~Decide whether application protocols are necessary here, or abstract the quiche Config so the user can decide this (especially since this is not a HTTP3 library). - 0.5 day~ - see #13
  7. [x] Fix the timeout events and ensure that when a timeout occurs, that the connection gets cleaned up, and we are not inadvertently clearing the timeout due to null. Right now when a quiche client connects to the server, even after closing, the server side is keeping the connection alive. - 1 day
  8. [x] We need lifecycle events for QUICConnection and QUICStream and QUICServer and QUICSocket. This will allow users to hook into the destruction of the object, and perhaps remove their event listeners. These events must be post-facto events. - 0.5 day
  9. ~[ ] Test the QUICStream and change to BYOB style, so that way there can be a byte buffer for it. Testing code should be able to use generator functions similar to our RPC handlers. - 1 day~ - see #5.
  10. [x] Complete the QUICClient with the shared socket QUICSocket. - 3 day
  11. ~Test the multiplexing/demultiplexing of the UDP socket with multiple QUICClient and a single QUICServer. - 1 day~ - See #14
  12. ~[ ] Test the error handling of the QUIC stream, and life cycle and destruction routines. - 1 day~ - #10
  13. ~Benchmark this system, by sending lots of data through. - 1 day~ - See #15
  14. ~Propagate the rinfo from the UDP datagram into the conn.recv() so that the streams (either during construction or otherwise) can have its rinfo updated. Perhaps we can just "set" the rinfo properties of the connection every time we do a conn.recv(). Or... we just mutate the conn parameters every time we receive a UDP packet.~ - See #16
  15. ~Ensure that when a user asks stream.connection they can acquire the remote information and also all the remote peer's certificate chain.~ - See #16
  16. ~[ ] Integrate dgram sending and recving for hole punching logic.~ - see #4
  17. ~[ ] Integrate the napi program into the scripts/prebuild.js so we can actually build the package in a similar way to other native packages we are using, like js-db~ - see #7
CMCDragonkai commented 1 year ago

On the readable side of QUIC stream, we are dealing with a push-flow source.

In our start method, we register a handler for the readable event. We will use the readable event every time we know that a QUIC stream is readable.
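
A minimal sketch of that wiring, with streamRecv standing in as a hypothetical wrapper over the quiche binding (this is not the actual QUICStream implementation):

function makeReadable(
  events: EventTarget,
  streamRecv: (buf: Uint8Array) => [number, boolean] | null,
): ReadableStream<Uint8Array> {
  return new ReadableStream<Uint8Array>({
    start(controller) {
      // Push-flow: each 'readable' event tells us the QUIC stream has data.
      events.addEventListener('readable', () => {
        const buf = new Uint8Array(1024);
        let result: [number, boolean] | null;
        try {
          result = streamRecv(buf); // null stands for "no data available"
        } catch (e) {
          controller.error(e);
          return;
        }
        if (result == null) return;
        const [n, fin] = result;
        if (n > 0) controller.enqueue(buf.subarray(0, n));
        if (fin) controller.close(); // the peer sent its fin
      });
    },
  });
}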

Upon being readable we go through this logic:

For the pull() method, we will:

For the cancel() method, we will:

Some possible improvements:

CMCDragonkai commented 1 year ago

It turns out that plugging the receive was a simple matter of switching a boolean. No promises or events required.

On the other hand, for the send side, where we wait for a promise to resolve based on a writable event, an alternative is to use a plug and expect the emitter of writable to explicitly unplug. However this is not symmetric.

By symmetry I mean that the QUICStream right now receives 2 internal events: readable and writable. Both events are ultimately determined by whether the QUIC stream from quiche is in fact readable or writable, with these events being derived from every UDP socket message and timeout event.

CMCDragonkai commented 1 year ago

If we are going to expose the Done exception, we will need to do that for the other functions too, not just stream_recv and stream_send.

CMCDragonkai commented 1 year ago

Just a note: when we have the QUICStream later, we can do things like await quicStream.writable.getWriter().ready;.

This would essentially be waiting for the stream to be ready to write. This is how the backpressure works at the writer to web stream stage.

CMCDragonkai commented 1 year ago

Since we are using EventTarget right now, we need to consider error handling. The EventTarget does not have any special handling for exceptions or errors. That means we should not use async functions as event handlers for EventTarget. Any rejections would be considered an uncaughtException and will go to process.on('uncaughtException', () => { }).

This means any errors on socket sending or other callbacks must be handled with callbacks that then emit the event to the error handler.

A default error handler should be available... one that throws the event detail as an uncaught exception if no custom error handler has been registered.
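
For example, a sketch of that pattern (the event names here are illustrative):

class ErrorEvent extends Event {
  constructor(public detail: unknown) {
    super('error');
  }
}

const target = new EventTarget();

target.addEventListener('message', (event) => {
  // The handler itself stays synchronous; the async work is explicitly
  // caught and re-dispatched as an 'error' event instead of rejecting.
  void handleMessage(event).catch((e) => {
    target.dispatchEvent(new ErrorEvent(e));
  });
});

target.addEventListener('error', (event) => {
  console.error('handled error event:', (event as ErrorEvent).detail);
});

async function handleMessage(event: Event): Promise<void> {
  // ... async socket sending or other work that may reject.
}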

CMCDragonkai commented 1 year ago

So the command:

napi build --platform --js ./src/native/index.js --dts ./src/native/index.d.ts

Will compile a single binary index.linux-x64-gnu.node while putting the index.js and index.d.ts into the same directory.

As long as this is the case, TSC will realise that index.d.ts is the types for index.js.

Now I can choose to name the index.linux-x64-gnu.node differently; this is controlled by the napi config inside package.json.

The generated JS file does the equivalent of node-gyp-build in js-db: it figures out which platform we are on, and then loads the appropriate dependency.

Interestingly it does in fact support:

  1. Different platform triples that are in the local directory
  2. The importation of "optional dependencies" such as @matrixai/quic-android-arm64.

I'm wondering how to control these names. It seems to automatically derive them as suffixes of the main package name, @matrixai/quic.

So we would end up with something like:

@matrixai/quic
@matrixai/quic-android-arm64
@matrixai/quic-android-arm-eabi
@matrixai/quic-win32-x64-msvc
@matrixai/quic-win32-ia32-msvc
@matrixai/quic-win32-arm64-msvc
@matrixai/quic-darwin-x64
@matrixai/quic-darwin-arm64
@matrixai/quic-freebsd-x64
@matrixai/quic-linux-x64-musl
@matrixai/quic-linux-x64-gnu
@matrixai/quic-linux-arm64-musl
@matrixai/quic-linux-arm64-gnu
@matrixai/quic-linux-arm-gnueabihf

One problem here is that for situation 1., it currently expects the binary to exist in the same directory as index.js.

Now we don't have to always use the generated .js file; we could change it ourselves accordingly. So we could hardcode a fix if the autogeneration doesn't work well for us.
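
For instance, a hand-rolled loader could look something like this (a sketch; the optional dependency names mirror the list above, and the libc/msvc suffix variants are ignored for brevity):

function loadNative(): unknown {
  const { platform, arch } = process;
  const candidates = [
    `./index.${platform}-${arch}.node`, // local prebuilt binary
    `@matrixai/quic-${platform}-${arch}`, // published optional dependency
  ];
  for (const id of candidates) {
    try {
      return require(id);
    } catch {
      // Try the next candidate.
    }
  }
  throw new Error(`No native binding available for ${platform}-${arch}`);
}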

CMCDragonkai commented 1 year ago

If we go with the optional package route, we will need to have a multi-package repo, since each separate platform will need its own directory with a package.json.

Also it seems that once we create optional dependencies, we can apply constraints to each optional dependency using the os and cpu keys in its package.json.

This should constrain which packages are installed for a given platform.

However there's another issue. Suppose you are creating a cross-platform native package that depends on another cross-platform native package. In that scenario, if you are only allowed to install the linux binaries because you're on linux, it can make it more difficult to distribute your own set of binaries, unless you are also changing the entire OS via CI/CD. Take android above, for example: who would be building on an android OS?

Atm, it seems like all optional packages would be installed without the constraints.

CMCDragonkai commented 1 year ago

I've moved all the rust code into src/native/napi. The Cargo.toml has been updated accordingly.

We are probably going to create a subdirectory called packages and actually make use of optionalDependencies in this main package.

The packages directory then requires each of the built binaries to be put into the directory during the CI/CD build process and then published.

The os/arch constraint will then be applied.

CMCDragonkai commented 1 year ago

Ok, one of the things that is different between EventEmitter and EventTarget is the handling of errors. Here's a demo:

const et = new EventTarget();
et.dispatchEvent(new Event('error'));

Nothing happens. No handler for the error event, no special handling.

Now in EventEmitter which is how the TCP server behaves along with other node constructs:

import events from 'events';
const ee = new events.EventEmitter();
ee.emit('error');

Immediately we get:

Error [ERR_UNHANDLED_ERROR]: Unhandled error. (undefined)

This is the case with TCP servers:

import net from 'net';

async function main() {
  const server = net.createServer((conn) => {
    console.log(conn);
  });
  server.listen(55555, () => {
    server.emit('error');
  });
}

void main();

This results in the same error. Only when we add an event handler for error does it not end up breaking the entire node runtime.

So if we want to replicate the behaviour of TCP server and TCP connections, we would need to do something similar with EventTarget. To do so, we would need to add a "default" error handler to the error event, and then check that the error event hasn't previously been handled, so it's the handler of last resort.

Alternatively we can overload the addEventListener so that if the error handler is being assigned, we remove the default error handler.

Note that it's possible for the error handler to be removed later, so as long as there are no handlers, throwing the exception would still need to be ensured.

There's another thing: it would be important to ensure that if errors are just ignored, the server/connection can continue to work, especially given the node runtime will continue to run.

But one could also differentiate between recoverable errors, and unrecoverable errors.

CMCDragonkai commented 1 year ago

The main top level classes should avoid using as many node-isms as possible. So I'll need to add that defaulting behaviour to the classes extending EventTarget.

There's some additional complexity: the roles of the QUICServer and the QUICConnection. In TCP there's just the server and the socket. That's it. The socket itself can have errors too, which may also be errors that exit the node process. In QUIC we would have 3 concepts: server (wrapping the UDP socket, and propagating UDP errors), connection, and stream.

I need to check whether the same thing happens if an error is emitted on the TCP connection (while the server is running). And I just realised that I need to propagate the UDP socket error to the QUICServer too.

CMCDragonkai commented 1 year ago

Actually, I cannot do the above error handling with EventTarget very easily. This is because EventTarget does not allow the same listener instance to be registered multiple times unless the capture option is toggled.

We don't know how many listeners there would be for the error event. We would end up having to keep track of these listener instances... which just overcomplicates the situation.

Options are:

  1. Don't replicate the behaviour of EventEmitter; without an error handler, errors just go nowhere.
  2. Use EventEmitter, but then we have to figure out a solution in the future for non-node environments. There are some libraries like eventemitter3, but I suspect they're not very lightweight. Consider that the UDP dgram module is already node-specific; I imagine further abstraction would be necessary for non-node environments anyway.
  3. Bring in a library like eventemitter3. However it turns out even this library doesn't have a default behaviour of throwing the exception, just like EventTarget.

I think the best solution right now is 1., to not replicate the behaviour; users can still decide what to do with their addEventListener('error').

CMCDragonkai commented 1 year ago

Regarding garbage collection.

For streams, stream close happens in 2 separate ways:

  1. Read side
  2. Write side

For the read side, a stream could be closed when we receive a fin packet. This is indicated by the fin boolean through connection.streamRecv().

In this case, it would mean that we should close the stream. Because this means the QUIC stream is closed, and therefore our web stream is closed too.

Alternatively it is possible that our web stream is cancelled, and thus stream.cancel() is called. In this case, we perform connection.streamShutdown on the read side.

I'm not entirely sure what would happen if there's an error during streamRecv: whether that means we should attempt to shut down the stream, or whether that would already be the case. The examples don't show what happens.

Right now I just propagate the error using controller.error(e);.

However I imagine that if there is in fact an error here... we should attempt to shut down the stream. I'm trying to see if this is a problem.

For the write side, we have both stream.close and stream.abort. The stream.close sends a fin packet, whereas stream.abort shuts down immediately.

So because these streams are duplex (and we are only dealing with duplex streams atm in this library), a QUICStream is only truly closed when both sides are closed. And right now a half-open state is possible.

So upon any of the above closings of the readable or writable side, we call a function currently called gcStream, which checks that both the read and write sides are closed; if so, it proceeds to delete the QUICStream from the parent streams map.
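
A minimal sketch of that check (illustrative names, not the actual class):

class QUICStreamSketch {
  protected readableClosed = false;
  protected writableClosed = false;

  public constructor(
    protected streamId: number,
    protected streams: Map<number, QUICStreamSketch>,
  ) {
    this.streams.set(streamId, this);
  }

  protected gcStream(): void {
    // Only a fully closed duplex stream is removed from the parent map;
    // a half-open stream must stay so the other side can still progress.
    if (this.readableClosed && this.writableClosed) {
      this.streams.delete(this.streamId);
    }
  }
}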

This means the stream lifecycle can be determined by the user of the QUICStream, and by the QUICConnection. It's also possible that the QUICConnection explicitly stops the readable and writable sides with QUICConnection.stop. This proceeds to cancel the readable, while doing writable.close, because the writable side should be gracefully done.

This seems to make sense for QUICStream.

CMCDragonkai commented 1 year ago

Now for QUICConnection we have a separate problem. A connection can also be closed by outside events, primarily due to recv, but it could also happen during an error with send.

In the case of recv, an error is dispatched. But in the examples, they just continue the read loop. They don't do anything to the connection.

I believe this is because:

On success the number of bytes processed from the input buffer is returned. On error the connection will be closed by calling close() with the appropriate error code.

Which implies that the recv itself will end up calling the close() if there was an error. So we don't need to do this on our end.

Ok I can see it:

                    // In case of error processing the incoming packet, close
                    // the connection.
                    self.close(false, e.to_wire(), b"").ok();

So connection.recv does in fact automatically close the underlying connection.

On the send side, if the send fails, we have to explicitly call connection.close to indicate the fact that we failed to send a packet. We have to choose whether it is an application error or a library error, the error code and error message.

The problem is that connection.close() does not mean connection.isClosed() is true. It's a lazy operation, meaning there is still some work that needs to be done.

Note that the connection will not be closed immediately. An application should continue calling the recv(), send(), timeout() and on_timeout() methods as normal, until the is_closed() method returns true.

And because there are no callbacks in the quiche library, we have to actually poll it to know when a connection is actually closed. Only then can we proceed to remove the connection from the connections map in QUICServer.

CMCDragonkai commented 1 year ago

The other issue is that I'm not sure how closing a connection affects all of its existing underlying streams.

It would appear that closing a connection should also mean all its underlying streams are closed. But if that's the case, how does our QUICStream get knowledge about this?

It's possible we get a memory leak here: if the streams are all closed, they could potentially no longer be readable or writable, and in that case we never get a QUIC stream error, and can therefore not propagate such errors to the web stream.

One alternative is to do an explicit stop on all our streams if there's a connection error. Actually we probably need to do it non-gracefully, since if we cannot send data on the connection, it's rather useless to attempt to write an explicit stream close.

CMCDragonkai commented 1 year ago

So a couple problems here:

  1. How to propagate the connection close to the QUICStream object so that it can be properly garbage collected. Beware of how the underlying stream works, and how the QUICStream object itself would (or would not) be aware of its underlying object's state.
  2. How to deal with the fact that QUICConnection itself cannot just be closed synchronously. We have to poll whether it is closed, before we can remove it from the connections map. The polling seems to occur on every event either when we received a UDP socket message, or when there's a timeout event.

This is relevant too: https://datatracker.ietf.org/doc/html/rfc9000#section-10.2

CMCDragonkai commented 1 year ago

For 2., I'm just doing connection iteration at the end of QUICServer handleMessage and handleTimeout.
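
A sketch of that iteration (the method names are stand-ins for the native binding):

function reapClosedConnections(
  connections: Map<string, { isClosed(): boolean; destroy(): Promise<void> }>,
): void {
  for (const [dcid, conn] of connections) {
    // quiche only flips isClosed() as a result of recv/send/onTimeout,
    // so this check has to be re-run after every such event.
    if (conn.isClosed()) {
      connections.delete(dcid);
      void conn.destroy();
    }
  }
}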

It's time to do the next practical test and map out what happens with the new class structure. And then from this point onwards we need to iterate on the class structure, as there are still unknown unknowns and known unknowns. It's difficult translating a loopy example codebase in quiche to a nodejs evented codebase.

CMCDragonkai commented 1 year ago

Do note that in order to pass in an in-memory private key and certificates, we will need to use boringssl's builder for the config struct. https://docs.rs/quiche/0.16.0/quiche/struct.Config.html#method.with_boring_ssl_ctx

It appears to support building up things like: https://docs.rs/boring/2.1.0/boring/pkey/struct.PKey.html#method.private_key_from_pem

And then you use functions like https://docs.rs/boring/2.1.0/boring/ssl/struct.SslContextBuilder.html#method.set_private_key which sets it up.

The builder is consumed and returns the ssl context: https://docs.rs/boring/2.1.0/boring/ssl/struct.SslContextBuilder.html#method.build.

After building this, more config settings can be set. This can be done as a separate issue.

CMCDragonkai commented 1 year ago

Ok I'm able to use wireshark to inspect the protocol using the filter udp.port == 55555 && not icmp.

Here we only want to see the interaction between the client and server.


One issue is that we do need to decrypt the TLS, apparently I need to make use of https://docs.rs/quiche/0.16.0/quiche/struct.Config.html#method.log_keys. That can dump the keylog file that can be loaded by wireshark.

Actually you need more than this. You have to also use: https://docs.rs/quiche/0.16.0/quiche/struct.Connection.html#method.set_keylog

So right now wireshark will be of limited use... if we want to decrypt contents. I don't have that available in the native code yet. Can be done later.

The file is apparently meant to be supplied with the SSLKEYLOGFILE environment variable. I think the node runtime can provide this. But it may just be a file path that the rust side has to turn into a writer object.

CMCDragonkai commented 1 year ago

In the initial packet the client sends to the server, it has both the DCID and SCID.

The DCID is what the client uses to identify the server. The SCID is what the client uses to identify itself.

Now why are we using the dcid in our code to identify the connection? Because, as ChatGPT says:

The reason that the DCID is typically used to identify the connection in Quiche (and other Quic implementations) is because the DCID is included in every packet that is sent over the connection, while the SCID is only included in the initial packet sent by the client. This means that the DCID is more readily available for use as a connection identifier, as it is present in every packet.

And this is in fact true according to wireshark: there are QUIC protected payload packets that only contain the DCID and no longer the SCID, because they use a short header.

CMCDragonkai commented 1 year ago

Tracing the QUIC implementation is tough without some sort of tracing system for the logs. And this is a pending thing to do later.

CMCDragonkai commented 1 year ago

I just realised a new logging standard that might be useful:

Receive X   <-- start
Receiving X <-- only use this if you intend 1 log message
Received X  <-- stop

We're currently using -ing when we start, but really we should use -ing only when we intend to emit 1 log message, like an "event", compared to a trace which has a start event and a stop event. I think in opentracing this is called a log? Not sure about the terminology there.

Furthermore, suppose a function call happens before or after: where should the -ing message go? It's a bit ambiguous, since it can be either. It's really its own point in time, wherever it is relevant, whereas start and stop should be at the beginning and end of the relevant function call.

However I err on placing it before, because then if the operation fails, the exception occurs after the message.

CMCDragonkai commented 1 year ago


With respect to the timeout: it turns out that for a new connection that was just accepted, calling timeout() returns null. This must imply that timeout() is intended to be "polled"; that is, we must call into the quiche library on different events to find out whether the connection has a timeout that we must set up. In the rust examples, these checks appear to occur on every new message sent, because otherwise how would we ever know if a connection ever needs to set a timeout or not? We may be able to spread out checks for this depending on various state changes on the connection, but this requires further experimentation.
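
A sketch of that polling, assuming a binding that exposes timeout() and onTimeout() the way quiche does:

function updateTimeout(
  conn: { timeout(): number | null; onTimeout(): void },
  state: { timer?: NodeJS.Timeout },
): void {
  if (state.timer != null) clearTimeout(state.timer);
  const ms = conn.timeout();
  // null means no timer is currently needed (e.g. a freshly accepted connection).
  if (ms == null) return;
  state.timer = setTimeout(() => {
    conn.onTimeout();
    // onTimeout may have changed the connection state, so poll again.
    updateTimeout(conn, state);
  }, ms);
}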

CMCDragonkai commented 1 year ago

We might want to start working on the QUICClient class to fully work out the protocol... the server is getting to the point where the streams are just waiting for work to be done. And the example quiche client is using HTTP3, and we also need to somehow deal with the connection streams too...

So when we create a quic server, we need to have an event handler for new connections, but for each connection a handler for new streams.

Having a handler for new streams is similar to HTTP handlers for request/response transactions. Each new stream is like a new request, only that we can both read and write. A stream in this case is like a TCP connection/socket, but multiplexed as part of a connection too.

The user of the quic server needs to handle connections, but do they also handle streams directly like addEventListener('stream')? Or would these events be part of the connection instead? I think it makes more sense to expect the user to add event listeners to the connection objects themselves so the server only works on connections.

CMCDragonkai commented 1 year ago

We need to start building out the QUICClient in order to prove that all these requirements can be met: https://github.com/MatrixAI/Polykey/issues/234#issuecomment-1123133027.

Specifically:

  1. The ability to multiplex separate connections into 1 socket, and therefore that means 1 port. This is necessary for NAT busting. A connection is between node to node. This means the same port/socket is used for multiple connections to different nodes.

Right now quiche doesn't do anything with sockets, so we are just receiving messages on the UDP socket and then processing them using header parsing. But we need to see how this would be done on the client side.

This was possible via go: https://github.com/lucas-clemente/quic-go/issues/561 so we need to be able to replicate this here.

CMCDragonkai commented 1 year ago

When using a shared socket, it is likely we would use the same dgram socket for multiple client connections to different servers, and also for the server.

We also need to be able to identify which client connection a received packet is intended for.

CMCDragonkai commented 1 year ago

The client.rs in quiche apps is more comprehensive than the example client.rs; it also demonstrates how connection migration would be done on the client side. Only clients can do connection migration right now.

CMCDragonkai commented 1 year ago

In terms of identifying packets, we can work with parsing their connection IDs, or their remote addresses. If we identified packets as coming from a certain address, we could route them to the right client. But we would have to first see whether a packet is intended for the server, and not for the clients.

Also discussion on long and short headers:

Packets with long headers include Source Connection ID and Destination Connection ID fields. These fields are used to set the connection IDs for new connections; see Section 7.2 for details.

Packets with short headers (Section 17.3) only include the Destination Connection ID and omit the explicit length. The length of the Destination Connection ID field is expected to be known to endpoints. Endpoints using a load balancer that routes based on connection ID could agree with the load balancer on a fixed length for connection IDs or agree on an encoding scheme. A fixed portion could encode an explicit length, which allows the entire connection ID to vary in length and still be used by the load balancer.

But the problem is that even with addresses, we would need to know whether these addresses are legitimate, which is why the server does the stateless retry.

This version of QUIC uses the long packet header during connection establishment; see Section 17.2. Packets with the long header are Initial (Section 17.2.2), 0-RTT (Section 17.2.3), Handshake (Section 17.2.4), and Retry (Section 17.2.5). Version negotiation uses a version-independent packet with a long header; see Section 17.2.1.

Packets with the short header are designed for minimal overhead and are used after a connection is established and 1-RTT keys are available; see Section 17.3.

CMCDragonkai commented 1 year ago

In reviewing Polykey's usage of UTP, it does seem that UTP was made for this usecase: we have the ability to create a UTP object when doing UTP.createServer(), and subsequently it was possible to create UTP connections on the same UTP object, thus being able to act like a server while also being able to create multiple clients.

It seems that quiche should be capable of doing this. There's no restriction on sharing the same UDP socket after doing bind, we can have multiple QUICClient and a QUICServer all using the same UDP socket.

The only issue is that all the examples show the clients and server all depending on handleMessage, but we just need a way of parsing these UDP messages, identifying what kind of packet each one is... and then directing it. Although I'm also confused that it's possible for there to be coalesced QUIC packets in 1 UDP datagram. If so, then how does Header.fromSlice work? Maybe it only reads 1 QUIC packet out of the UDP datagram, meaning it's possible for there to be more than one?

I think Header.fromSlice is ONLY used to parse the INITIAL QUIC packet. This means there could in fact be multiple QUIC packets coalesced in a single handleMessage; however, Header.fromSlice will only parse the first one. The subsequent packets don't need to be parsed, as they will just be handled by connection.recv. The primary use of Header.fromSlice is to acquire the connection IDs so we can route packets appropriately. Remember that the server is already muxing multiple connections from different clients. Therefore if there is a UDP socket used for both clients and server, then at the end of the day we just have an even larger map of connections, all identified by connection IDs.

CMCDragonkai commented 1 year ago

The client's INITIAL packet sent to the server contains a randomly set SCID and DCID.

When an Initial packet is sent by a client that has not previously received an Initial or Retry packet from the server, the client populates the Destination Connection ID field with an unpredictable value. This Destination Connection ID MUST be at least 8 bytes in length. Until a packet is received from the server, the client MUST use the same Destination Connection ID value on all packets in this connection.

The Destination Connection ID field from the first Initial packet sent by a client is used to determine packet protection keys for Initial packets. These keys change after receiving a Retry packet; see Section 5.2 of [QUIC-TLS].

The client populates the Source Connection ID field with a value of its choosing and sets the Source Connection ID Length field to indicate the length.

Ok so I think the idea here is that if we are to combine clients and server, we then have to have a combined handleMessage demuxer. It must then analyse the packet header to get the connection ID. If the UDP datagram has multiple coalesced QUIC packets, then it is assumed that all these QUIC packets would be intended for the same connection. If any of the packets deviates from the first packet's connection IDs, then that will result in a processing error later by quiche.

Then inside this demuxer it has to refer to a ConnectionMap. This map is basically something that will need to be shared across QUICClient and QUICServer because it has to be used to identify QUIC packets intended for client-side connections or server-side connections.

Further prototyping required on this front.

CMCDragonkai commented 1 year ago

In the client examples, it binds to a random IPv4 address and port if the server/target address is IPv4, and a random IPv6 address and port if the server/target address is IPv6. This actually means we do need to know the target address via a DNS resolution first.

I'm not entirely sure if NodeJS supports dual stack properly where ipv4 mapped ipv6 addresses are possible: https://stackoverflow.com/questions/61741547/udp-client-server-mix-ipv4-and-ipv6

See https://blog.apify.com/ipv4-mapped-ipv6-in-nodejs/

Suppose we use the rinfo: if we were bound to ::, would we then get ipv4 packets as ::ffff:127.0.0.1?

And also this only works if we bind to :: and not any other IPv6 addresses. This will require further testing later.
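
A throwaway test along these lines should answer it (plain dgram, nothing library-specific):

import dgram from 'dgram';

// Bind a dual-stack socket to '::' and observe how IPv4 peers appear.
const socket = dgram.createSocket({ type: 'udp6', ipv6Only: false });
socket.on('message', (msg, rinfo) => {
  // Expecting rinfo.address like '::ffff:127.0.0.1' for IPv4 senders,
  // but only when bound to '::'; this is what needs verifying.
  console.log(rinfo.family, rinfo.address, rinfo.port, msg.byteLength);
});
socket.bind(55555, '::');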

CMCDragonkai commented 1 year ago

I got a confirmation that using connection IDs is how we would demux packets for shared UDP socket.

So I'm creating a QUICSocket to encapsulate the management of a shared socket object, since creating an appropriate UDP socket is a bit complex.

CMCDragonkai commented 1 year ago

Now that we know that some packet headers are short-form and thus do not have an SCID: when we do quiche.Header.fromSlice, what happens to the scid property? Well, it still exists, but the Rust code uses ConnectionId::default(), which produces an empty byte array.

CMCDragonkai commented 1 year ago

Updated handshake diagram with DCID and SCID.

The SCID is the source connection ID, and it is meant to be chosen by the peer to represent the ID for itself.

The DCID gets "agreed" upon using the below protocol.

Retry and version negotiation packets and packets with short header cannot be coalesced with other packets in the same datagram. This means they are always by themselves.

Notice that in the retry packet, a new SCID S2 is used. This is derived from S1 via HMAC signing. The HMAC signing is not part of the QUIC specification; however it is a security feature to ensure integrity and prevent replay attacks. It is just what the quiche implementation chose to use in its examples.

The client has to change its DCID in response to this retry packet. But it is possible for the server to change its SCID again in its next initial packet, in which case the client has to change its DCID again. There's no specific reason why this might occur.

Note that at this point in time, the QUIC spec only supports client-side connection migration. It's not expected that the server's address would change, although this may be supported in the future. Connection migration just means that the client can change its address (possibly because it changed networks, e.g. on a mobile network).

In PK's P2P case, any change in address would affect both client side and server side. So it would require performing a new handshake and reconnecting all connections.

┌────────┐                                ┌────────┐
│ Client │                                │ Server │
└───┬────┘                                └────┬───┘
    │          ┌───────────────────┐           │
  1.├─────────►│Initial            ├──────────►│
    │          │version: 3132799674│           │
    │          │token: []          │           │
    │          │dcid: S1           │           │
    │          │scid: C1           │           │
    │          └───────────────────┘           │
    │                                          │
    │          ┌───────────────────┐           │
  2.│◄─────────┤Version Negotiate  │◄──────────┤
    │          │version: 1         │           │
    │          │dcid: C1           │           │
    │          │scid: S1           │           │
    │          └───────────────────┘           │
    │                                          │
    │          ┌───────────────────┐           │
    │          │Initial            │           │
    │          │version: 1         │           │
  3.├─────────►│token: []          ├──────────►│
    │          │dcid: S1           │           │
    │          │scid: C1           │           │
    │          └───────────────────┘           │
    │                                          │
    │          ┌───────────────────┐           │
    │          │Retry              │           │
  4.│◄─────────┤token: [S1]        │◄──────────┤
    │          │dcid: C1           │           │
    │          │scid: S2           │           │
    │          └───────────────────┘           │
    │                                          │
    │          ┌───────────────────┐           │
    │          │Initial            │           │
    │          │version: 1         │           │
  5.├─────────►│token: [S1]        ├──────────►│
    │          │dcid: S2           │           │
    │          │scid: C1           │           │
    │          └───────────────────┘           │
    │                                          │
    │          ┌───────────────────┐           │
    │          │Initial            │           │
    │          │version: 1         │           │
  6.│◄─────────┤token: []          │◄──────────┤
    │          │dcid: C1           │           │
    │          │scid: S3           │           │
    │          └───────────────────┘           │
CMCDragonkai commented 1 year ago

Ok, with respect to the connection map: we are therefore always mapping DCIDs to connections.

But due to connection migration, it is possible that multiple connection IDs point to the same connection; we won't worry about that for the moment.

Now in our implementation, there's a double lookup of both the QUIC packet's DCID and the derived connection ID. The reason is mainly that in the middle of processing packets, it's possible that the connection already exists, but just under the name of the derived connection ID. Apparently this is due to the ClientHello potentially being split across multiple Initial QUIC packets.

See: https://github.com/cloudflare/quiche/commit/06c0d497a4e08da31e8d3684a7bcf03cca38448d

Ok, so ultimately it's the client-sent packet's DCID (which itself is derived from the server's SCID) that is used to identify the connection (this is called the "server-generated DCID" for identifying the server).

From the client-side perspective, using the received packet's DCID would mean that sometimes this DCID is actually the ID our client generated when we first connected to the server. When we create a client-side connection, we create an SCID randomly, and subsequent packets sent in response on this connection have their DCID equal to it.

So our connection map would then map EITHER:

  1. Server generated DCID to point to server-side connections
  2. Client generated DCID to point to client-side connections

And there would be no overlap... or exceedingly low probability of it occurring.

CMCDragonkai commented 1 year ago

In terms of demuxing the handleMessage: it is also possible to register multiple handlers for the message event. However this is not efficient, because every time a message event occurs on the dgram socket, it ends up calling every single handler.

If we use event.stopImmediatePropagation() we can cancel the handling of subsequent handlers. But it's still possible that with 100 QUIC clients, it would check all 100 clients before hitting the server.

So our demuxing logic instead happens within a single message handler, which checks a shared connection map to decide what to do, depending on whether a server is registered or not, and whether the connection exists or not.
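
A sketch of that single handler (the binding surface, quiche.Header.fromSlice and friends, is assumed here, not exact):

declare const quiche: {
  MAX_CONN_ID_LEN: number;
  Header: {
    fromSlice(data: Uint8Array, dcidLen: number): { dcid: Uint8Array };
  };
};

function handleMessage(
  msg: Buffer,
  rinfo: import('dgram').RemoteInfo,
  connections: Map<string, { recv(msg: Buffer, rinfo: object): void }>,
  server?: { acceptConnection(msg: Buffer, rinfo: object): void },
): void {
  // Header.fromSlice only parses the first QUIC packet in the datagram,
  // which is enough to recover the DCID used for routing.
  const header = quiche.Header.fromSlice(msg, quiche.MAX_CONN_ID_LEN);
  const dcid = Buffer.from(header.dcid).toString('hex');
  const conn = connections.get(dcid);
  if (conn != null) {
    // Existing client-side or server-side connection.
    conn.recv(msg, rinfo);
  } else if (server != null) {
    // Unknown DCID: only the registered server may accept new connections.
    server.acceptConnection(msg, rinfo);
  }
  // Otherwise the packet is discarded.
}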

CMCDragonkai commented 1 year ago

So now I'm at the point where the QUICSocket can call into the QUICServer and ask it to handle a new connection, but only if it is registered as the server for the socket. Otherwise it just discards those kinds of packets.

It can also acquire an existing QUICConnection from the connection map if the connection's DCID matches one of them.

However some decisions have to be made now.

  1. Should the QUICServer or the QUICSocket be the one to put the connection into the map?
  2. How should the quic socket push data into the connection? Should it handle it at the QUICSocket handleMessage, or is that something we delegate that to the QUICClient or QUICServer? I'm still not entirely sure what the QUICClient really does besides bootstrap a QUICConnection.
  3. When identifying the QUICConnection, there doesn't seem to be a way to identify whether it is a client or server connection, and if we identify the client, how to then access the client object too. Not sure if we even need this.

In the case of 1., we could say that neither should; instead, the construction of a QUICConnection leads to it being put into the map, and correspondingly the destruction of the connection takes it out of the map. Creation and destruction can be approached with a CreateDestroy pattern, with asynchronous creation and asynchronous destruction.

CMCDragonkai commented 1 year ago

For the QUICConnection, I've made the 2 methods recv and send represent:

  1. recv means bridging socket -> connection -> streams.
  2. send means bridging socket <- connection <- streams.

This means QUICConnection.recv is supposed to be called by the socket, since that's where the data flows from.

The QUICConnection.send on the other hand is supposed to be called by the stream, since that's where the data flows from.

HOWEVER, QUICConnection.send may also have data flowing from other sources, such as the handshake protocol itself, so even the socket's handleMessage can trigger QUICConnection.send.

So the control flow can come from other sources.

This means the send actually loops over the connection object and flushes all data to the socket.
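
A sketch of that flush loop (assuming the binding's send returns [length, sendInfo] and signals Done when drained, as quiche's Connection::send does):

declare function isDone(e: unknown): boolean; // stand-in for detecting quiche's Done

async function sendAll(
  conn: {
    send(buf: Uint8Array): [number, { to: { host: string; port: number } }];
  },
  socket: {
    send(data: Uint8Array, port: number, host: string): Promise<void>;
  },
): Promise<void> {
  const buf = new Uint8Array(1350); // roughly one UDP payload's worth
  while (true) {
    let n: number;
    let info: { to: { host: string; port: number } };
    try {
      [n, info] = conn.send(buf);
    } catch (e) {
      if (isDone(e)) break; // nothing left to flush
      throw e;
    }
    await socket.send(buf.subarray(0, n), info.to.port, info.to.host);
  }
}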

This also means there's some tight coupling between all these objects, since the socket injects itself into the server/client, which itself injects it into the connection object when it creates it... etc. I've realised that there's no other way to do it; it has to be a somewhat tightly coupled mechanism.

Finally the problem is errors: where do they go? Should they go to the caller? Or in the case where the caller isn't available, or it's all just event driven, it seems errors have to go to a generic error event (that is, a parent object listening for the error event). But in that case it goes against how we normally write things with promise-based methods, where errors flow back up the control flow.

I'm still not entirely sure how to do it, so right now I'm doing both, and I'm going to figure out which is best soon.

CMCDragonkai commented 1 year ago

This paper https://www.dsi.fceia.unr.edu.ar/downloads/informatica/info_III/eventexcep.pdf talks about exactly the issue I'm dealing with right now: how to manage errors from methods that are basically "event handlers". The composition of these objects is not strictly hierarchical. The event invoker isn't actually more capable of handling these errors, and especially with a promise-based API, having the invoker discard errors results in ugly line noise of await p.catch(() => {}); (and in some cases, there should be exceptions where something was written incorrectly).


CMCDragonkai commented 1 year ago

Ok, the QUICConnection now works against 3 possible events: send, recv and timeout. These 3 events drive all interactions that could occur with a connection. The send results in a UDP socket send. The recv results in stream events. The timeout ends up closing the connection eventually, due to the draining timer, idle timer or path timer.

I think server-side created connections don't have a timeout until the first recv call occurs (which happens immediately for newly created server-side connections).

Errors don't go to the caller; there's a no-exception guarantee, and instead they get emitted. Of course this is for "expected exceptions"; anything unexpected will still be thrown up, but those would be considered programmer errors, not runtime errors.

Now we proceed back to testing the QUICSocket, QUICServer, QUICConnection and QUICStream before going back to QUICClient.

CMCDragonkai commented 1 year ago

Cool, it is sort of working right now. I can see the timeouts being set, but it's not really cleaning up the connection just yet.

INFO:QUICServer:Starting QUICServer on 127.0.0.1:55555
INFO:QUICSocket:Starting QUICSocket on 127.0.0.1:55555
INFO:QUICSocket:Started QUICSocket on 127.0.0.1:55555
INFO:QUICServer:Started QUICServer on 127.0.0.1:55555
DEBUG:QUICServer:QUIC packet version is not supported, performing version negotiation
DEBUG:QUICServer:Send VersionNegotiation packet to 127.0.0.1:34101
DEBUG:QUICServer:Sent VersionNegotiation packet to 127.0.0.1:34101
DEBUG:QUICServer:Send Retry packet to 127.0.0.1:34101
DEBUG:QUICServer:Sent Retry packet to 127.0.0.1:34101
DEBUG:QUICServer:Accepting new connection from QUIC packet
INFO:QUICConnection ad0c415a56fd5c885fd2a35a2fa9f2580932122a:Creating QUICConnection
EVER TIMEOUT null
INFO:QUICConnection ad0c415a56fd5c885fd2a35a2fa9f2580932122a:Created QUICConnection
got the connection QUICConnection {}
EVER TIMEOUT 4999
EVER TIMEOUT 998
EVER TIMEOUT 4997
EVER TIMEOUT 28
EVER TIMEOUT 4999
EVER TIMEOUT 4999
EVER TIMEOUT null
^CINFO:QUICServer:Stopping QUICServer on 127.0.0.1:55555
INFO:QUICConnection ad0c415a56fd5c885fd2a35a2fa9f2580932122a:Destroying QUICConnection
INFO:QUICConnection ad0c415a56fd5c885fd2a35a2fa9f2580932122a:Stopped QUICConnection
INFO:QUICSocket:Stopping QUICSocket on 127.0.0.1:55555
INFO:QUICSocket:Stopped QUICSocket on 127.0.0.1:55555
INFO:QUICServer:Stopped QUICServer on 127.0.0.1:55555
CMCDragonkai commented 1 year ago

Important that this library will only focus on QUIC, and not HTTP3.

CMCDragonkai commented 1 year ago

We need to investigate how the timers work and ensure that they are actually being timed out properly and then it should proceed to close the connection if there's no response on the other side.

CMCDragonkai commented 1 year ago

@tegefaulkes regarding 14.

  1. [ ] Propagate the rinfo from the UDP datagram into the conn.recv() so that the streams (either during construction or otherwise) can have its rinfo updated. Perhaps we can just "set" the rinfo properties of the connection every time we do a conn.recv(). Or... we just mutate the conn parameters every time we receive a UDP packet.

We need to understand that under QUIC, it's possible for the same connection to have different remote host and remote port.

See: https://www.rfc-editor.org/rfc/rfc9000.html#name-connection-migration

Now I'm actually not entirely clear atm.

It's possible that when the client migrates to a new network path, that the dcid or scid changes.

I haven't handled this case yet in the QUICConnection class. That is, it would mean maintaining the same QUICConnection object instance (because we don't want to lose the streamMap state), and transitioning the connection ID.

In any case, this means the remoteHost and remotePort could change at any point in time when querying the connection.

It's even possible that there could be multiple valid concurrent remote hosts and ports that are going to different streams on the same connection, I haven't confirmed this case yet.

Furthermore you're handling events on the stream itself, and you want to give it an object of remote host/port information at the point of handling that stream.

My current solution is to:

We put the properties localHost, localPort, remoteHost and remotePort on QUICConnection; however the remoteHost and remotePort can change at any time, so the information you pass in the handler is only valid at that specific point in time where you construct the object. This may be sufficient for your usecase, since you are just using it for logging and nothing else.

CMCDragonkai commented 1 year ago

So now there is:

QUICConnection.localHost
QUICConnection.localPort
QUICConnection.remoteHost
QUICConnection.remotePort

The remoteHost and remotePort can change on every QUICConnection.recv invocation.

This still needs to be tested with connection migration. Atm only client connections can migrate, servers cannot migrate according to the QUIC spec.

Since every PK agent is both client and server, this makes it a bit weird.

Migrating a connection to a new server address mid-connection is not supported by the version of QUIC specified in this document. If a client receives packets from a new server address when the client has not initiated a migration to that address, the client SHOULD discard these packets.

That means if a PK agent were to migrate to a new IP/port due to disconnection, then the servers would have to restart. This would be important on mobile networks; it's something we will think about when we get PK on mobile phones. We don't actually need to do "live migration" for our PK agent; we can rely on our kademlia system to do this, but indeed all client connections would have to automatically disconnect and retry on the new address upon detecting this on the node graph.

CMCDragonkai commented 1 year ago

The config.setMaxIdleTimeout does control the initial timeout.

So on the server side, upon constructing the QUICConnection, the initial call to conn.timeout() gives back null.

However after the first conn.recv(), the next call gives back 6000 ms. (Actually 5999).

What's a bit confusing is that after calling conn.send(), the next call changes this to 1000ms (actually 998).

The max idle timeout parameter is explained as:

    /// Sets the `max_idle_timeout` transport parameter, in milliseconds.
    ///
    /// The default value is infinite, that is, no timeout is used.
    pub fn set_max_idle_timeout(&mut self, v: u64) {
        self.local_transport_params.max_idle_timeout = v;
    }

This parameter is actually exchanged with the client in order to get the lowest possible max idle timeout.

On the second time and third time we call it, we are checking if we are draining. In both cases this is false (probably because the connection isn't closed).

There's a check as to the lowest loss detection timer, and again compared with the idle timer.

This means on the second time it is called, the idle timer must win, and there probably isn't a loss detection timer. And thus we get 5000ms.

Then on the third time, there must be a loss detection timer set, and therefore the 1000ms is lower, and this is returned instead.

The config does not have any way of setting this loss detection timer; it must be set automatically.

                .filter_map(|(_, p)| p.recovery.loss_detection_timer())

The loss detection timer is set inside the Recovery struct.

It appears to be set by a function Recovery::set_loss_detection_timer.

This feels like it is called by Connection::send. Note that the loss_detection_timer is an Option&lt;Instant&gt;.

I suspect this loss detection timer has something to do with: https://www.rfc-editor.org/rfc/rfc9000.html#name-loss-detection-and-congesti.

This would make sense: the loss detection timer may be something that is lower than the idle timeout, since we are more concerned about whether some packet was lost and therefore probably needs to be resent.

From ChatGPT:

The draining timer is used to ensure that all data has been sent on a connection before it is closed. The idle timer is used to close a connection if no data is sent or received for a certain period of time. The loss detection timer is used to detect when packets have been lost in transit and retransmit them if necessary. All 3 timers are used to manage the state of a connection and ensure reliable communication.

It does make sense that we would reset the timer upon additional events.

CMCDragonkai commented 1 year ago

Ok, so when the connection times out (and it does eventually), the handler is in fact called.

Now I put in:

          console.log('draining', this.conn.isDraining());
          console.log('closed', this.conn.isClosed());
          console.log('timed out', this.conn.isTimedOut());
          console.log('established', this.conn.isEstablished());
          console.log('in early data', this.conn.isInEarlyData());
          console.log('resumed', this.conn.isResumed());

Before and after calling this.conn.onTimeout().

I can see something like this:

TIMEOUT HANDLER CALLED
draining false
closed false
timed out false
established true
in early data false
resumed false
AFTER ON TIMEOUT
draining false
closed true
timed out true
established true
in early data false
resumed false

It is interesting here that upon calling onTimeout() the connection is closed. But the draining remains false.

This can mean that if the connection is already closed, draining does not remain true, even if it had been draining before the timeout occurred and the close happened.

Anyway this leads us to call this.send() again just in case timing out means that we are draining now.

In the send, the finally clause kicks in and here it goes to run setTimeout again.

At this point we do re-run the setTimeout before then checking if the connection is closing or draining.

Now 2 things happen here: the next call to this.conn.timeout() is null; this is because the connection is already closed, therefore no further timeout is necessary. However I imagine that if instead the connection was draining, another timeout would be necessary.

The problem was that I was checking that the status is not destroying and that the connection was isClosed() && isDraining(); that's obviously not possible now that we know what happens in between onTimeout().

So I've changed this to isClosed() || isDraining().

CMCDragonkai commented 1 year ago

This fixes the problem, and now the connection is in fact destroyed after the timeout.

However I think this revealed another problem: suppose it was draining instead of being closed.

When we call destroy, this section will run:

    if (!this.conn.isClosed()) {
      // The `recv`, `send`, `timeout`, and `on_timeout` should continue to be called
      const { p: closeP, resolveP: resolveCloseP } = utils.promise();
      this.resolveCloseP = resolveCloseP;
      await Promise.all([
        this.send(),
        closeP
      ]);
    }

Which means it blocks on the closeP promise while running send() simultaneously.

Now 2 things could happen here.

The original timeout that was set before calling destroy() might be running, it might be the draining timer. That draining timer could activate, which then triggers the onTimeout() which would close the connection, which then results in a call to QUICConnection.send which should then resolve the closeP and then allow the destruction to complete.

Alternatively the send runs, the connection would still not be closed, it ends up flushing the data to the socket.

Now if the data is all completely flushed to the socket, that could mean the connection is in fact closed now.

If it runs setTimeout(), this may then clear the timeout because no more timers are needed; and then, because the status is destroying, that ends the send() promise, but nothing is resolving the closeP.

Which means we now have a blocked promise.

So this is a potential problem.

It seems the solution here is that we need to move the resolveCloseP to a different location.

CMCDragonkai commented 1 year ago

One solution is something like this:

      if (
        this[status] !== 'destroying' &&
        (this.conn.isClosed() || this.conn.isDraining())
      ) {
        await this.destroy();
      } else if (
        this[status] === 'destroying' &&
        (this.conn.isClosed() && this.resolveCloseP != null)
      ) {
        // If we flushed the draining, then this is what will happen
        this.resolveCloseP();
      }

But I need more testing on flushing situations so we can see how it behaves.

CMCDragonkai commented 1 year ago

Tagging @tegefaulkes to keep up with this progress.

CMCDragonkai commented 1 year ago

@tegefaulkes I'm testing the handling of streams atm.

It looks like:

const writer = stream.writable.getWriter();

for await (const read of stream.readable) {
  console.log(read);
}

await writer.ready;
await writer.write(Buffer.from('Hello World'));

await writer.ready;
writer.releaseLock();

await stream.destroy();

Now I have some questions...

It seems that a "writer" is a locked reference to the writable stream. It's a way of maintaining control/ownership of the stream, such that only 1 writer can be the one using the stream.

  1. When we call await writer.close(), and later inside stream.destroy it calls await this.writable.close(), there's an error TypeError [ERR_INVALID_STATE]: Invalid state: WritableStream is closed. How are we supposed to close a writable stream if there's a writer that still exists? Are we meant to check whether the writer is still locked (that is, there is an active writer), and if so refuse to destroy? (See the sketch after this list.)
  2. In your RPC handlers where you use the async generator syntax, what do you actually do when you finish handling? In my case, I'm thinking that this.readable.cancel() then await this.writable.close() is the right order of events. But now the issue is: how do you coordinate that with the reader/writer objects?
  3. What does it really mean to do writer.releaseLock()? Is there actually meant to be multiple potential writers to the same stream?
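
On 1., my current understanding of the web streams locking semantics suggests a guard like this (a sketch, not a settled answer):

// writable.close() rejects while a writer holds the lock, and also rejects
// if the stream was already closed via that writer; so destroy() likely
// needs to check the lock (and track prior closes) before closing.
async function closeWritableSafely(
  writable: WritableStream<Uint8Array>,
): Promise<void> {
  if (writable.locked) {
    // A writer exists; it must close (writer.close()) or release its lock
    // (writer.releaseLock()) before the stream itself can be closed here.
    return;
  }
  await writable.close();
}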