envoyproxy / envoy

Cloud-native high-performance edge/middle/service proxy
https://www.envoyproxy.io
Apache License 2.0
25.04k stars 4.82k forks source link

Envoy HTTP2 Connection sends RST_STREAM before initial HEADERS which causes upstream server to close connection with PROTOCOL_ERROR #11369

Open tan-stripe opened 4 years ago

tan-stripe commented 4 years ago

Description

Preface: we’re using envoy as an L7 proxy with a HTTP2 upstream. We’ve noticed that sometimes Envoy sometimes sends a RST_STREAM before actually sending the HEADERS frame to change the stream state from idle to open.

The following is a pcap of a single TCP connection between envoy and the upstream server. The upstream server is listening on 8080 while the ephemeral port that envoy cx is using is 63692.

Pcap file

image

On #243, we can see that envoy starts sending RST_STREAM for streams that haven’t received HEADERS or PUSH-PROMISE frames. Additionally, some of these stream resets are repeated (e.g. stream id 27, 23 from above). Because of this the server closes the connection with a PROTOCOL_ERROR as per HTTP2 spec.

Excerpt:

RST_STREAM frames MUST NOT be sent for a stream in the "idle" state. If a RST_STREAM frame identifying an idle stream is received, the recipient MUST treat this as a connection error (Section 5.4.1) of type PROTOCOL_ERROR.

This leads to envoy dropping in-flight requests on the same HTTP2 connection with a 503 UC. We believe this is because of the way we do deferred reset after reading this code snippet. In other words, we believe that from envoy’s prospective, this stream exists, but since the first frame hasn’t made it to the server yet, the server believes that the stream is idle and does an active close.

Repro Steps

A very contrived example can be done using a single upstream go service coupled with a load tester and envoy.

Version details:

Configuration:

Steps:

  1. Start up a simple HTTP2 server with some delay in the response
  2. Start envoy with given configuration
  3. Run a hey loadtest - hey -n 50 -c 50 -h2 -t 3 http://localhost:10000
    • You may have to tune the parameters a little bit, but we were able to reliably cause an error message on the go program side. Error: http2: server connection error from 127.0.0.1:65100: connection error: PROTOCOL_ERROR
    • This error comes from receiving a RST_STREAM on an idle stream
    • Any connection on the same TCP connection gets dropped due to a TCP-FIN initiated by the go server
snowp commented 4 years ago

Sounds like we should be tracking whether we've sent anything over the streams (either in the deferred reset code, in ConnectionImpl::StreamImpl::resetStreamWorker or in nghttp2 proper) and conditionally issue a RST_STREAM based on that.

WDYT @mattklein123 @alyssawilk

mattklein123 commented 4 years ago

Hmm, this is an interesting one. This is a bit deeper in the spec than I am familiar with. I don't think it's as simple as tracking the state, as I think the only way we can get into this situation is that we have already submitted headers, but not data. What I think is happening is that nghttp2 is dropping the headers frame when it sees the reset frame. I think it's possible this would need to be fixed in nghttp2, but it's also possible we are doing something wrong here.

cc @PiotrSikora @antoniovicente @tatsuhiro-t

tatsuhiro-t commented 4 years ago

nghttp2 cancels pending HEADERS frame if it sees RST_STREAM referencing it. In this case, RST_STREAM itself is not sent in this case.

https://github.com/nghttp2/nghttp2/blob/ef41583614e95efd12b6cce821e34837c1b28ed0/lib/nghttp2_session.c#L956-L957

It is true that nghttp2 does not do any effort not to send RST_STREAM to idle stream unless the above code finds those HEADERS. If the above code works as it is intended, and envoy only submits RST_STREAM after it submits HEADERS, RST_STREAM will not be sent as far as I understand it. I can change the code so that RST_STREAM is not sent to the idle stream in the future nghttp2 release.

mattklein123 commented 4 years ago

If the above code works as it is intended, and envoy only submits RST_STREAM after it submits HEADERS, RST_STREAM will not be sent as far as I understand it.

I will review the code again but I'm pretty sure in the deferred reset case we only ever submit reset after submitting headers. It's possible there is some sequence we are missing here. I will try to audit the code unless @alyssawilk gets to it first.

abhayyb commented 4 years ago

I am also facing this issue with istio-proxy within our K8s cluster. Envoy version : 1.12.0

The server container sends a GOAWAY to istio-proxy on receiving RST_STREAM for idle stream. We use libmicrohttpd for HTTP server.

GOAWAY

Additional Debug Data contents : RST_STREAM: stream in idle

All the pending requests on this connection are responded with 503 UC. Strangely this is not seen when the incoming traffic rate is high. After the rate reduces, this is seen. Seems like there is an issue when streams are getting terminated.

alyssawilk commented 4 years ago

tl;dr at first glance I don't see a way we can create http2 streams without immediately sending headers to nghttp2

Looking at Envoy code in the happy path:

UpstreamRequest::encodeHeaders calls HttpConnPool::newStream calls ConnPoolImplBase::newStream. If there is a connection ready, ConnPoolImplBase::attachRequestToClient calls --ConnPoolImpl::ActiveClient::newStreamEncoder calls ClientConnectionImpl::newStream to create the nghttp2 stream via nghttp2_session_client_new2 -- HttpConnPool::onPoolReady calls UpstreamRequest::onPoolReady calls HttpUpstream::encodeHeaders sends the headers via nghttp2_submit_request

My suspicition was there was a bug if we don't have a prefetched connection. In that case we defer attachRequestToClient. Once the connection is ready, ConnPoolImplBase::onUpstreamReady calls ConnPoolImplBase::attachRequestToClient but attachRequestToClient doesn't call newStreamEncoder unless there are pending requests.

it looks like we don't do newStreamEncoder (creating the nghttp2 stream) unless we have headers ready to ship, and then we always call onPoolReady which always calls encodeHeaders - I don't see any early returns on this path.

note we don't track if the headers have been sent via onFrameSend, but we do sendPendingFrames, which does nghttp2_session_send. I don't see where pending_deferredreset can happen before the headers have been encoded (though maybe Matt or trace logs will find I'm missing something)

mattklein123 commented 4 years ago

tl;dr at first glance I don't see a way we can create http2 streams without immediately sending headers to nghttp2

Yeah this was my assessment also but I will look again this week.

clundquist-stripe commented 4 years ago

Just wanted to check on this issue as it has been a while. We're eager for this fix, and want to know if we should explore some other work arounds if it won't make it in 1.17

mattklein123 commented 4 years ago

We're eager for this fix, and want to know if we should explore some other work arounds if it won't make it in 1.17

Unfortunately from code inspection we have no idea how this can happen. Without a repro it's going to be very hard to fix this. Are you able to easily repro this? Can you come up with a self-contained docker compose repro? If not, is it easy enough to repro that you could capture trace logs of a repro?

clundquist-stripe commented 4 years ago

Oh, are the repro steps from Tan's post not working? I can dig into it if that's the case!

clundquist-stripe commented 4 years ago

Working from @tan-stripe 's example and updating it to Envoy 1.16 I still see the reported behavior.

➜ envoy_repro ./server                                       
2020/10/26 11:31:50 http2: server connection error from 127.0.0.1:61438: connection error: PROTOCOL_ERROR
2020/10/26 11:31:50 http2: server connection error from 127.0.0.1:61438: connection error: PROTOCOL_ERROR
2020/10/26 11:31:50 http2: server connection error from 127.0.0.1:61438: connection error: PROTOCOL_ERROR
# ...

I did have to:

mattklein123 commented 4 years ago

@clundquist-stripe can you put together a self-contained docker compose based repro using current main branch and I will take a look? Thank you.

kryzthov-stripe commented 4 years ago

This nghttp2 change might be relevant: https://github.com/nghttp2/nghttp2/pull/1477

parkedwards commented 3 years ago

This nghttp2 change might be relevant: nghttp2/nghttp2#1477

Since we haven't been able to create a self-container docker-compose repro, anyone know if the quoted change by @tatsuhiro-t in nghttp2 has been folded into an Envoy release?

clundquist-stripe commented 3 years ago

Since we haven't been able to create a self-container docker-compose repro,

Tan's example in the initial post worked for me when I reproduced it. I may have done a few trivial changes around envoy versioning and golang versioning.

HTTP2 Server - https://gist.github.com/tan-stripe/4ea1e8f3dfbb3af6210f655dc2883647 envoy.yaml - https://gist.github.com/tan-stripe/8965f551d7ffe68cdb4c4d738d143a63 Dockerfile - https://gist.github.com/tan-stripe/d6bfb04c5ad8c044442b0aefa07d898b

antoniovicente commented 3 years ago

It would be good to know if this still happens. I know of one case where Envoy can send RST_STREAM for a stream where it hasn't sent HEADERS but I think it can only happen before this change: https://github.com/envoyproxy/envoy/pull/14820

parkedwards commented 3 years ago

It would be good to know if this still happens. I know of one case where Envoy can send RST_STREAM for a stream where it hasn't sent HEADERS but I think it can only happen before this change: #14820

Possibly a silly question from me, but I'd love to see if @antoniovicente's quoted change does mitigate this issue -- how might I find which Envoy release incorporates this so I can repro re: @clundquist-stripe 's notes?

tan-stripe commented 3 years ago

Here's a docker-compose version of the envoy example:

envoy_example.zip

If you run this along with something like hey or vegeta, you'll reproduce the problem in Envoy version 1.17 and below. It seems to have been fixed in 1.18.

Sample Hey Command:

hey -n 50 -c 50 -h2 -t 3 http://localhost:10000

Sample Vegeta Command:

echo "GET http://localhost:10000" | vegeta attack -duration=15s  | tee results.bin | vegeta report
venilnoronha commented 3 years ago

I had a chance to test this with Envoy v1.18.3, and the bug doesn't seem to be reproducible (at least with the test described in this issue).

clundquist-stripe commented 3 years ago

Yep! That confirms our findings!

It seems to have been fixed in 1.18.