bytecodealliance / wasmtime

A fast and secure runtime for WebAssembly
https://wasmtime.dev/
Apache License 2.0
15.2k stars 1.28k forks source link

What should the default behavior of `wasmtime serve` be with scheme/authority? #8878

Closed alexcrichton closed 2 months ago

alexcrichton commented 3 months ago

My changes in https://github.com/bytecodealliance/wasmtime/pull/8861 introduced a change in the default behavior of wasmtime serve. Notably this program:

use wasi::http::types::*;

struct T;

wasi::http::proxy::export!(T);

impl wasi::exports::wasi::http::incoming_handler::Guest for T {
    fn handle(request: IncomingRequest, outparam: ResponseOutparam) {
        println!("request.method = {:?}", request.method());
        println!("request.scheme = {:?}", request.scheme());
        println!("request.authority = {:?}", request.authority());

        let resp = OutgoingResponse::new(Fields::new());
        ResponseOutparam::set(outparam, Ok(resp));
    }
}

(compiled component)

When run with wasmtime serve and hit with curl http://localhost:8080 it prints:

Serving HTTP on http://0.0.0.0:8080/
stdout [0] :: request.method = Method::Get
stdout [0] :: request.scheme = Some(Scheme::Http)
stdout [0] :: request.authority = Some("localhost:8080")

On main, however, it prints

Serving HTTP on http://0.0.0.0:8080/
stdout [0] :: request.method = Method::Get
stdout [0] :: request.scheme = None
stdout [0] :: request.authority = None

This regression is due to these changes because I didn't understand what they were doing.

Now why wasn't this caught by the test suite? I tried writing a test for this and it passed, but apparently it's due to our usage of hyper::Request::builder().uri("http://localhost/") in the test suite. That creates an HTTP requests that looks like:

GET http://localhost/ HTTP/1.1
...

where using curl on the command line generates:

GET / HTTP/1.1
...

That leads me to this issue. What should scheme and authority report in these two cases for wasmtime serve by default? The previous behavior means that GET / could not be distinguished from GET http://localhost/ which naively seems like what scheme and authority are trying to map to.

Is the previous behavior of wasmtime serve buggy? Is the current behavior buggy? Should the spec be clarified?

cc @elliottt @pchickey

alexcrichton commented 3 months ago

I'll note that the difference can be seen with:

$ curl -v --request-target http://localhost/ http://localhost:8080

vs

$ curl -v http://localhost:8080

in terms of how the headers are set. The former is basically what our test suite does while the latter is what the curl command line does by default.

lann commented 3 months ago

The former (full URL in first line of the request) is incorrect; that form is only applicable to the CONNECT method and HTTP/1.0 iirc.

lann commented 3 months ago

The former (full URL in request) is incorrect; that form is only applicable to CONNECT methods.

Edit: looks like I might be wrong about it being incorrect per se; it might just be very uncommon.

https://www.rfc-editor.org/rfc/rfc2616#section-5.1.2

alexcrichton commented 3 months ago

That makes sense, and means we should probably update our tests, but I guess I'm also curious still what the behavior here should be. For example why do scheme and authority return an Option at the WIT level? Are they intended to map to this or is it expected that they're effectively always Some?

lann commented 3 months ago

scheme is derived from out of band info: whether the request came in over TLS or not. authority was optional in http/1.0 which I believe is why hyper makes it optional. I don't know if that is the reason here though.

tschneidereit commented 3 months ago

my take is that we should have wasi:http either always provide an authority, or provide the host header. Otherwise there's no standard way for content to learn about the authority it's called under. I know that @lukewagner concluded that we must never provide the host header. If that stands, I conversely think that we must continue providing the authority for incoming requests.

In effect, that means that for incoming requests what content sees is always the absoluteURI form, which the RFC seems to indicate is the way forward, too:

To allow for transition to absoluteURIs in all requests in future versions of HTTP, all HTTP/1.1 servers MUST accept the absoluteURI form in requests, even though HTTP/1.1 clients will only generate them in requests to proxies.

lann commented 3 months ago

Don't pay too much attention to the 1.1 spec for future direction. HTTP/2 and /3 make authority even more special by (usually) splitting it into a "pseudo-header": https://httpwg.org/http-core/draft-ietf-httpbis-semantics-latest.html#field.host

alexcrichton commented 3 months ago

The previous logic for scheme and authority were:

Given that wasmtime serve does not itself support HTTPS, should the scheme always be HTTP in this case regardless of what the "absolute URI" says? Similarly wasmtime serve has no support for multi-domain hosting or anything like that so is it correct to lookup the authority based on the request?

A related but orthogonal question is that right now the in-memory store for this information is hyper::Request<T> and I'm not sure if that's the right choice here either.

Basically I'm still personally very confused about what the desired behavior is and how we would go about implementing it.

lukewagner commented 3 months ago

Great question. First of all, from asking some HTTP folks, I believe it is the case that we could tighten the spec wording to say that for methods other than CONNECT and OPTIONS, there must always be an authority (i.e., the return value is some). Apparently, CONNECT and OPTIONS have an unfortunate * option that simply has no authority.

Next, my understanding from RFC 9110 is that the authority either comes from the :authority pseudo-header in H/2/3 or the Host header in H/1, and if both are present, they are not allowed to disagree (and this is Web-compatible). Thus, I think what WASI HTTP should say is that:

This allows the host implementation to do the transport-appropriate thing for requests coming in or out over the wire.

alexcrichton commented 2 months ago

Ok so for the use case of wasmtime serve specifically:

This all indicates to me that new_incoming_request should take both a scheme and an authority argument rather than inferring these from the Request as well?

(sorry I'm really looking for guidance/second opinions here, I don't know why things were originally constructed the way they are or if they're just how things got shaken out)

lann commented 2 months ago

otherwise it returns .... None?

Do we want/need to support HTTP/1.0? afaik host is mandatory for 1.1; we could just 400 if a request has no host/authority.

Alternatively, it appears that it is valid for host to be empty, so an HTTP/1.0 request without host could just imply that.

alexcrichton commented 2 months ago

Ah ok I didn't realize that was a 1.0 thing. Sounds like it should search for Host as a header in the host-side hyper::Request for the authority if it's not present in the URI. I try to make a PR with these changes tomorrow.

lukewagner commented 2 months ago

The impression that I got talking to an HTTP server maintainer is that it's web-compatible to require the Host field (rejecting if it's absent in HTTP 1.0 or 1.1) and that allowing an empty authority in cases other than CONNECT/OPTIONS can transitively lead to security issues (random googling found this e.g.). Similarly, RFC 9110 (which also intends to be Web-compatible) says that there MUST be a Host header (when there is no :authority pseudo-header) without making an exception for HTTP 1.0. Thus, I'd suggest making it an error in wasmtime serve if there is no Host in HTTP 1.0, at least to start with, and see if anyone complains.