Open bradfitz opened 7 years ago
Triage, @bradfitz?
Thanks. Yeah, we could still do this sometime. It's just lowish priority, since I've never seen one in the wild and the spec only says we "MAY" retry the request.
http://httpwg.org/specs/rfc7540.html#rfc.section.9.1.2
/cc @tombergan
@bradfitz
The title of this issue puts the retry responsibility in the hands of Transport.
I don't see how the transport code could handle a retry. According to the RoundTripper
, "RoundTrip should not attempt to handle higher-level protocol details such as redirects, authentication, or cookies." Though a redirect is not the same as a retry, they are at a similar abstraction level. I.e., callers of transport's round tripper would probably not expect that there was an intermediate 421 response before an additional request. I know there is a kind of precedence here with acting on 100 responses. Maybe you're saying that despite 4xx being client errors, 421 should be on the level of 1xx in terms of client's expectation of involvement.
(There's also the language of "should" vs "must". The quote above uses should, whereas a prior sentence in that godoc RoundTripper paragraph, uses "must" regarding error responses - "...RoundTrip must return err == nil..." Therefore, the restriction is not an absolute requirement.)
I'm new here so please let me know if I'm missing something.
There is more detail in RFC 7838. I believe what can happen is:
RFC 7838 Section 6 says that the Alt-Svc cache entry MUST be cleared, then the request MAY be retried:
Clients receiving 421 (Misdirected Request) from an alternative service MUST remove the corresponding entry from its alternative service cache (see Section 2.2) for that origin. Regardless of the idempotency of the request method, they MAY retry the request, either at another alternative server, or at the origin.
Since this is a MAY, I tend to agree with @meirf that RoundTrip should not automatically retry. Instead, the implementation can remove foo.cdn.com from the Alt-Svc cache to guarantee that a subsequent retry will use foo.com.
However, RFC 7540 Section 9.1.2 says:
[Receiving 421] is possible if a connection is reused (Section 9.1.1)
I really don't understand why this would happen -- isn't that what GOAWAY is for? In any case, I think the http2 client library could handle this by treating 421 as GOAWAY(NOERROR).
I really don't understand why this would happen
Here's an example of 421 because of connection reuse.
isn't that what GOAWAY is for?
I think GOAWAY is too extreme for reuse-caused 421s. Specifically, 6.8 says:
The GOAWAY frame (type=0x7) is used to initiate shutdown of a connection or to signal serious error conditions...Receivers of a GOAWAY frame MUST NOT open additional streams on the connection, although a new connection can be established for new streams...The GOAWAY frame applies to the connection, not a specific stream.
GOAWAY prevents any additional steams, but 421 (at least for connection reuse) indicates that the connection can be used again - just not for requests destined for the problematic host.
I think the http2 client library could handle this by treating 421 as GOAWAY(NOERROR)
Indeed, we want the request to be retried on a different connection just as we do for GOAWAY(NOERROR). We agree that http.Transport
should not be retrying as discussed above. So http.Client
might have a boolean field saying retry on 421, which defaults to false. The client has a Transport
field which is of type RoundTripper
. A 421 is received and the client knows it should tell the transport to try the request on a new/different connection. But the RoundTripper interface in Client only takes a Request - no other options. Changing the API is probably not what we want. There is a RoundTripOpt
in x/net/http2
which takes options and is bundled into h2_bundle.go. But RoundTrip
calls into RoundTripOpt
not vice versa.
I'd love to submit some code for this, but I don't see how it would work. I'm open to any ideas/corrections on what I've written here or above.
@meirf, thanks for the virtual hosts example. I knew about 421 with Alt-Svc, but not virtual hosts. Before diving into a solution, I want to make sure we completely understand the problem. Specifically, I want to understand all the ways a client may decide to send a request to a server that will return 421. I believe there are (at least) three ways this can happen:
Via Alt-Svc (RFC 7838), the client previously discovered that foo.com is also available at cdn.com. The client happens to have an open connection to cdn.com, so it sends requests to foo.com on that connection rather than opening a new connection.
The client has a connection open to www.foo.com. The TLS cert for that connection names multiple hosts (perhaps using a wildcard), including www1.foo.com. The client recognizes this, so it sends requests for www1.foo.com over the same connection.
foo.com is served by three distinct IPs, each of which are virtual hosts. One of those IPs is misconfigured and will return 421 for all foo.com requests. The client sees all three IPs in the DNS record for foo.com. If the client unluckily selects the wrong IP, it will get a 421.
I believe 1 and 2 cannot happen in the current Go library because Go does not implement Alt-Svc or connection sharing when advertised by TLS certs (@bradfitz, correct me if I'm wrong). The third case can happen, but unfortunately, this is trickier to fix because we'd need to blacklist specific IPs in the dialer so we don't end up with the same IP when redialing a new connection.
Does that sound right?
Thanks for bringing up the status quo: "I believe... connection."
I think it's worthwhile to go back to 9.1.2. The (unfortunately vague) gist is
the request was directed at a server that is not able to produce a response
It then lists 3 categories of cases when this might happen.
Case A
server that is not configured to produce responses for the combination of scheme and authority that are included in the request URI
Case B
a connection is reused (Section 9.1.1)
Case C
an alternative service is selected [ALT-SVC].
Cases B & C are the only cases referred to in the context of retrying. (For A, we would never want to retry the request because it would certainly fail.) For examples of B, the spec describes one with a middlebox. As another example, I referenced one above with virtual hosts that had different ssl configurations. For C, 7838 says that this happens when "an alternative service is not authoritative". One way to be authoritative is to present a TLS certificate that’s valid for the origin . 7838 also says "they MAY retry the request, either at another alternative server, or at the origin."
Going back to the 3 cases you listed.
Go does not implement Alt-Svc.
If it does not implement Alt-Svc/there is no plan to, then we can rule out Case C.
Go does not implement ... connection sharing when advertised by TLS certs.
I think this is correct - connection sharing is done only on the basis of authority (host:port). Though there is a currently a "TODO: add support for sharing conns based on cert names". The spec's only example of 421 in case B and the only cases I've seen involve different authorities that are covered by a single cert. Because of that, I assume there are currently no cases of 421 occurring in Case B with the current implementation.
The third case can happen
I don't see how the third case could happen. The vhosts have different authorities. Additionally, it requires connection reuse based on cert because it's https.
If Cases A, B and C are exhaustive and my assumption ("...there are currently no cases...") is correct, then it looks like retrying would accomplish nothing right now. It would only be valuable once alt-svc or sharing conns based on cert names was implemented. Even at that point, we would still have to address the issue I brought up way earlier - "retry responsibility".
Having just experienced a problem related to this in real life - I do think Go should handle it in the transport.
However, it specifically should handle it when: 1) a request is made on an existing shared connection 2) you receive a 421
And then the request should then be made on a new connection - not a shared one.
If the connection was received when sent on its own connection, pass it up to the client. The author of the server may have been creatively using http codes that the client may know how to handle.
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (
go version
)?What operating system and processor architecture are you using (
go env
)?What did you do?
If possible, provide a recipe for reproducing the error. A complete runnable program is good. A link on play.golang.org is best.
What did you expect to see?
What did you see instead?