fails-components / webtransport

Http/3 webtransport support for node
Other
132 stars 20 forks source link

Support query strings? #279

Closed achingbrain closed 2 months ago

achingbrain commented 3 months ago

If I connect to a WebTransport server from node using the WebTransport implementation from this module with a query:

const wt = new WebTransport('http://example.com/session_with_query?foo=bar')

..the server does not receive the ?foo=bar part of the URL, only the /session_with_query part which means you can't connect to servers that expect/require query args.

It seems to be because only the .pathname property of the URL is used:

https://github.com/fails-components/webtransport/blob/b7ef52877546f7b0597ba31b41d70554bf8728e6/main/lib/webtransport.node.js#L80

If it's changed to:

.then(() => client.createWTSession(sessionint, `${ourl.pathname}${ourl.search ?? ''}`))

..then the remote receives the query string.


From the server perspective, if the client with the patch above connects to the URL with the query string, this server session read stream does not yield the session:

for await (const session of getReaderStream(
  server.sessionStream('/session_with_query')
)) {
  // handle session
}

...instead you have to register a path that includes the query:

for await (const session of getReaderStream(
  server.sessionStream('/session_with_query?foo=bar')
)) {
  // handle session
}

This isn't very flexible for obvious reasons.

Could the WebTransport Session on the server side have a .url property similar to message.url from http or request.url from http2?

Then it could be up to the js server to parse the query string and handle the request.

I'd started working on a PR but it seems we keep a list of registered paths in js and in c++ so it wasn't as straightforward as I had hoped.

martenrichter commented 3 months ago

Wait. A year ago, I built an interface just for you on another issue. I have to look it up.

martenrichter commented 3 months ago

I did not find the issue, but maybe you find it. You can set on the server setRequestCallback( a request callback, where you can process all incoming sessions and also process the header fields and deny. If I remember correctly, it may be possible to add additional header fields and also set the path for reception. Anyway, I did not test it as the demand was not there. But test it debug, and then if it is stable, it would make sense to add some documentation.

martenrichter commented 3 months ago

Here is the old patch: https://github.com/fails-components/webtransport/commit/66b069f1add0c2c30b215a947fc412edec5b9798 It was before the great code reorganization, and it lacks unit tests as it is an undocumented (may be broken) feature.

martenrichter commented 3 months ago

If I connect to a WebTransport server from node using the WebTransport implementation from this module with a query:

const wt = new WebTransport('http://example.com/session_with_query?foo=bar')

Regarding this, we have to check how this is handled for http2/http3 and their header.

achingbrain commented 3 months ago

The C++ code that invokes the callback passed to setRequestCallback doesn't create an Http3WTSession object so it's not clear how this helps me access query string arguments?

https://github.com/fails-components/webtransport/blob/b7ef52877546f7b0597ba31b41d70554bf8728e6/transports/http3-quiche/src/http3serverbackend.cc#L70-L90

Anyway, I've opened #280 which implements query string support, hopefully it's useful.

martenrichter commented 3 months ago

The important part is here: https://github.com/fails-components/webtransport/blob/b7ef52877546f7b0597ba31b41d70554bf8728e6/transports/http3-quiche/src/http3serverbackend.cc#L82 This one is invoked if you set a callback. The callback is called with a header object. You can then parse the query string and deny the request early, which can be handy if someone attacks the server. You do this by setting the status field to a value different from 200: https://github.com/fails-components/webtransport/blob/b7ef52877546f7b0597ba31b41d70554bf8728e6/transports/http3-quiche/src/http3server.cc#L503 Here is the logic on theJSs side: https://github.com/fails-components/webtransport/blob/b7ef52877546f7b0597ba31b41d70554bf8728e6/main/lib/server.js#L247 As you can see, the path property of the returned object sets the stream on which the object is returned. So you can strip the query part. The parsed results can also be attached to the header object, which is attached to the session object here: https://github.com/fails-components/webtransport/blob/b7ef52877546f7b0597ba31b41d70554bf8728e6/transports/http3-quiche/src/http3server.cc#L378 . That is all you need for a lot of different use cases.

martenrichter commented 3 months ago

Ok, here is an untested example:

server.setRequestCallback(({header}) => {
    const path = header[':path']
    const pathWithoutQuery = path.split('?')[0]
    if (path === 'yourdesiredpath') {
      const newheader = {...header}
       newheader['queryargument'] = path.split('?')[1]
       return { path: 'iwantthis',  header: newheader, status: 200}
   }
  return {status: 404}
})

And then you can get it with:

for await (const session of getReaderStream(
  server.sessionStream('iwantthis')
)) {
  // handle session
  const queryargument = session.header[queryargument]
}

I hope this makes the mechanism clear. The code is untested as I am not on my development machine, but you will figure out minor flaws. In this way, the handler acts as a middleware, similar to parsers in express, but it is not chainable. I designed it a year or more ago after looking into your libp2p protocol details, but also with other uses in mind.

achingbrain commented 3 months ago

Aha, I see what you mean now. I was confused because when the path doesn't match, although the server backend calls through to JS to invoke the callback:

https://github.com/fails-components/webtransport/blob/b7ef52877546f7b0597ba31b41d70554bf8728e6/transports/http3-quiche/src/http3serverbackend.cc#L88

...the actual calling method doesn't do anything with the response from the callback:

https://github.com/fails-components/webtransport/blob/b7ef52877546f7b0597ba31b41d70554bf8728e6/transports/http3-quiche/src/http3server.cc#L423-L425

It's also worth noting that you still need the changes to webtransport.browser.js and webtransport.node.js from #280 to send the query string in the first place.

I would say that setting a bogus header seems a bit weird - the user didn't send this, it feels like a bit of a hack that takes advantage of the headers being mutable.

Making the server behave a bit more like a web server and supporting query strings seems like a better solution to me, though it's your module so up to you.

martenrichter commented 3 months ago

Aha, I see what you mean now. I was confused because when the path doesn't match, although the server backend calls through to JS to invoke the callback:

https://github.com/fails-components/webtransport/blob/b7ef52877546f7b0597ba31b41d70554bf8728e6/transports/http3-quiche/src/http3serverbackend.cc#L88

...the actual calling method doesn't do anything with the response from the callback:

https://github.com/fails-components/webtransport/blob/b7ef52877546f7b0597ba31b41d70554bf8728e6/transports/http3-quiche/src/http3server.cc#L423-L425 Well, it does through a promise-like class I wrote for C++; I needed this to tap into the libquiche handling code. And yes this js style code in C++ is weird.

It's also worth noting that you still need the changes to webtransport.browser.js and webtransport.node.js from #280 to send the query string in the first place.

Sure, I have to look into the RFCs, if the query part is also part of the path in the header (I do not know for sure). But I assume it does, and then of course this will be changed.

I would say that setting a bogus header seems a bit weird - the user didn't send this, it feels like a bit of a hack that takes advantage of the headers being mutable.

I would be open to adding another object passed down from the callback to the session, where you can transport user data from the callback. (userData ?) I copied this style, as I said, with express middleware, which also attaches and parses various things in the objects to pass data.

Making the server behave a bit more like a web server and supporting query strings seems like a better solution to me, though it's your module so up to you.

But the downside would be that I have to implement for every possible scheme in every transport different code that I have to support if changes occur. (Think of other dynamic URLs that encode data in the paths). With the callback, every user can extend it in a way that fits the application, and I do not have to modify the module.

martenrichter commented 3 months ago

https://github.com/fails-components/webtransport/pull/281 Untested..., but will allow sending the query and userData as a replacement for the header.

achingbrain commented 3 months ago

Broken, I'm afraid - args.object is undefined. I think you need to set an "object" property on retObj that references the Http3ServerJS C++ instance in processNewSessionRequest, or it might need to be this.transportInt.finishSessionRequest as it was previously?

martenrichter commented 3 months ago

Ok, this was probably broken when the whole project was restructured. The solution is probably easy. Just replace args.object with this.transportInt. I hope I remember it correctly. (I am away from my dev computer and can not test.)

achingbrain commented 3 months ago

Trying to use the request callback seems to break all kinds of stuff. Requests that hit the callback time out instead of connecting successfully (or 404ing, depending on the test).

I've opened a PR that adds a request callback and a test for the user data object here - https://github.com/fails-components/webtransport/pull/282

Am I using it wrong?

achingbrain commented 3 months ago

Another thing is that if the aim is for the request callback to be like Express middleware, it should get invoked on every request (or, at least, during predictable parts of the request lifecycle), not just the ones that it can't match a path for. This way the user can ensure that the application receives modified requests in a consistent manner.

At the moment you have to send a query string or make a request that would otherwise 404 in order for the callback to be invoked.

martenrichter commented 3 months ago

Ah, so the PR does not work. No, it looks as if it is used as intended. But it was written before I changed a lot of stuff (and as no one was using it). I will look into the test branch tomorrow and fix it. (I assume the unit test will fail).

martenrichter commented 3 months ago

Another thing is that if the aim is for the request callback to be like Express middleware, it should get invoked on every request (or, at least, during predictable parts of the request lifecycle), not just the ones that it can't match a path for. This way the user can ensure that the application receives modified requests in a consistent manner.

At the moment you have to send a query string or make a request that would otherwise 404 in order for the callback to be invoked.

Well, this can be changed in the http3serversession stream so that if the session request is present, it will always handle everything. But if it is not, it will use the old paths mechanism to remain compatible.

martenrichter commented 3 months ago

Regarding the tests, as the other tests also fail, it indicates that the server process has an exception. Try adding a try-catch block on the server side and printing the error; then, you will see where the problem is.

martenrichter commented 2 months ago

I am closing this, as the fix for the mechanism has been out for a while. I assume it is working. If not, post again, and we can reopen the issue.