tan-stripe opened this issue 4 years ago
Sounds like we should be tracking whether we've sent anything over the streams (either in the deferred reset code, in ConnectionImpl::StreamImpl::resetStreamWorker
or in nghttp2 proper) and conditionally issue a RST_STREAM based on that.
WDYT @mattklein123 @alyssawilk
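For what it's worth, a minimal sketch of that idea in C++ (nothing here is actual Envoy code; StreamState, local_frames_sent_, and resetForWire are made-up names for illustration):

```cpp
#include <cstdint>
#include <optional>

// Hypothetical per-stream bookkeeping: only emit RST_STREAM if we have
// already sent at least one frame (HEADERS) for this stream, so the peer
// never sees a reset for a stream that is still "idle" from its point of view.
class StreamState {
public:
  void onHeadersSent() { local_frames_sent_ = true; }

  // Returns the error code to put on the wire, or std::nullopt if the
  // reset should be handled locally instead of being serialized.
  std::optional<uint32_t> resetForWire(uint32_t error_code) const {
    if (!local_frames_sent_) {
      // Nothing has gone out yet: from the peer's perspective the stream is
      // idle, and RST_STREAM on an idle stream is a connection-level
      // PROTOCOL_ERROR. Drop the frame and just clean up local state.
      return std::nullopt;
    }
    return error_code;
  }

private:
  bool local_frames_sent_{false};
};
```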
Hmm, this is an interesting one. This is a bit deeper in the spec than I am familiar with. I don't think it's as simple as tracking the state, as I think the only way we can get into this situation is that we have already submitted headers, but not data. What I think is happening is that nghttp2 is dropping the headers frame when it sees the reset frame. I think it's possible this would need to be fixed in nghttp2, but it's also possible we are doing something wrong here.
cc @PiotrSikora @antoniovicente @tatsuhiro-t
nghttp2 cancels a pending HEADERS frame if it sees a RST_STREAM referencing it. In that case, the RST_STREAM itself is not sent either.
It is true that nghttp2 makes no effort to avoid sending RST_STREAM to an idle stream unless the above code finds those pending HEADERS. If the above code works as intended, and envoy only submits RST_STREAM after it submits HEADERS, RST_STREAM will not be sent, as far as I understand it. I can change the code so that RST_STREAM is not sent to an idle stream in a future nghttp2 release.
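For reference, here is roughly how the sequence under discussion maps onto the nghttp2 API (a sketch only, not Envoy's code; the session, callbacks, and header list are assumed to be set up elsewhere):

```cpp
#include <nghttp2/nghttp2.h>

// If the HEADERS frame queued by nghttp2_submit_request() is still pending
// when nghttp2_submit_rst_stream() is called, nghttp2 cancels the pending
// HEADERS and, per the comment above, should not serialize the RST_STREAM
// either. The problematic case in this issue is a RST_STREAM submitted for a
// stream whose HEADERS were never submitted at all, which does reach the wire
// and hits a stream the peer still considers idle.
void submit_then_cancel(nghttp2_session* session, const nghttp2_nv* nva, size_t nvlen) {
  // Queue HEADERS for a new stream; returns the new stream id (or a negative error).
  const int32_t stream_id = nghttp2_submit_request(session, nullptr, nva, nvlen, nullptr, nullptr);
  if (stream_id < 0) {
    return;
  }

  // Queue a reset for the same stream before anything has been flushed.
  nghttp2_submit_rst_stream(session, NGHTTP2_FLAG_NONE, stream_id, NGHTTP2_CANCEL);

  // Serialize queued frames; with both submissions still pending, neither
  // frame should reach the peer.
  nghttp2_session_send(session);
}
```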
> If the above code works as intended, and envoy only submits RST_STREAM after it submits HEADERS, RST_STREAM will not be sent, as far as I understand it.
I will review the code again but I'm pretty sure in the deferred reset case we only ever submit reset after submitting headers. It's possible there is some sequence we are missing here. I will try to audit the code unless @alyssawilk gets to it first.
I am also facing this issue with istio-proxy within our K8s cluster. Envoy version: 1.12.0
The server container sends a GOAWAY to istio-proxy on receiving a RST_STREAM for an idle stream. We use libmicrohttpd for the HTTP server.
Additional Debug Data contents: RST_STREAM: stream in idle
All pending requests on this connection get a 503 UC response. Strangely, this is not seen when the incoming traffic rate is high; it only appears after the rate drops. It seems like there is an issue with how streams are being terminated.
tl;dr at first glance I don't see a way we can create http2 streams without immediately sending headers to nghttp2
Looking at Envoy code in the happy path:
UpstreamRequest::encodeHeaders calls HttpConnPool::newStream, which calls ConnPoolImplBase::newStream. If there is a connection ready, ConnPoolImplBase::attachRequestToClient:
- calls ConnPoolImpl::ActiveClient::newStreamEncoder, which calls ClientConnectionImpl::newStream to create the nghttp2 stream via nghttp2_session_client_new2
- then HttpConnPool::onPoolReady calls UpstreamRequest::onPoolReady, which calls HttpUpstream::encodeHeaders and sends the headers via nghttp2_submit_request
My suspicion was that there was a bug if we don't have a prefetched connection. In that case we defer attachRequestToClient. Once the connection is ready, ConnPoolImplBase::onUpstreamReady calls ConnPoolImplBase::attachRequestToClient, but attachRequestToClient doesn't call newStreamEncoder unless there are pending requests.
It looks like we don't do newStreamEncoder (creating the nghttp2 stream) unless we have headers ready to ship, and then we always call onPoolReady, which always calls encodeHeaders. I don't see any early returns on this path.
Note we don't track if the headers have been sent via onFrameSend, but we do sendPendingFrames, which does nghttp2_session_send. I don't see where pending_deferredreset can happen before the headers have been encoded (though maybe Matt or trace logs will find I'm missing something).
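If we did want to track it, nghttp2's frame-send callback would be the natural hook. A sketch of that (not Envoy code; SentHeadersTracker and install_tracker are hypothetical names):

```cpp
#include <nghttp2/nghttp2.h>
#include <set>

// Hypothetical tracker: record which stream ids have had a HEADERS frame
// actually serialized, via nghttp2's on_frame_send callback. Per the comment
// above, Envoy does not do this today; this only illustrates the hook.
struct SentHeadersTracker {
  std::set<int32_t> streams_with_headers_sent;
};

static int on_frame_send(nghttp2_session* /*session*/, const nghttp2_frame* frame, void* user_data) {
  if (frame->hd.type == NGHTTP2_HEADERS) {
    static_cast<SentHeadersTracker*>(user_data)->streams_with_headers_sent.insert(frame->hd.stream_id);
  }
  return 0;
}

void install_tracker(nghttp2_session_callbacks* callbacks) {
  // The tracker pointer would be passed as user_data when the session is created.
  nghttp2_session_callbacks_set_on_frame_send_callback(callbacks, on_frame_send);
}
```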
> tl;dr at first glance I don't see a way we can create http2 streams without immediately sending headers to nghttp2
Yeah this was my assessment also but I will look again this week.
Just wanted to check on this issue as it has been a while. We're eager for this fix, and want to know if we should explore some other workarounds if it won't make it into 1.17.
> We're eager for this fix, and want to know if we should explore some other workarounds if it won't make it into 1.17.
Unfortunately from code inspection we have no idea how this can happen. Without a repro it's going to be very hard to fix this. Are you able to easily repro this? Can you come up with a self-contained docker compose repro? If not, is it easy enough to repro that you could capture trace logs of a repro?
Oh, are the repro steps from Tan's post not working? I can dig into it if that's the case!
Working from @tan-stripe's example and updating it to Envoy 1.16, I still see the reported behavior.
➜ envoy_repro ./server
2020/10/26 11:31:50 http2: server connection error from 127.0.0.1:61438: connection error: PROTOCOL_ERROR
2020/10/26 11:31:50 http2: server connection error from 127.0.0.1:61438: connection error: PROTOCOL_ERROR
2020/10/26 11:31:50 http2: server connection error from 127.0.0.1:61438: connection error: PROTOCOL_ERROR
# ...
I did have to:
- go get golang.org/x/net/http2/h2c for server.go to build, which isn't too surprising
- docker run -p 10000:10000/tcp -it <sha> to get the port forwarding to work locally

@clundquist-stripe can you put together a self-contained docker compose based repro using current main branch and I will take a look? Thank you.
This nghttp2 change might be relevant: https://github.com/nghttp2/nghttp2/pull/1477
> This nghttp2 change might be relevant: nghttp2/nghttp2#1477
Since we haven't been able to create a self-contained docker-compose repro, does anyone know if the quoted change by @tatsuhiro-t in nghttp2 has been folded into an Envoy release?
> Since we haven't been able to create a self-contained docker-compose repro,
Tan's example in the initial post worked for me when I reproduced it. I may have made a few trivial changes around envoy versioning and golang versioning.
HTTP2 Server - https://gist.github.com/tan-stripe/4ea1e8f3dfbb3af6210f655dc2883647
envoy.yaml - https://gist.github.com/tan-stripe/8965f551d7ffe68cdb4c4d738d143a63
Dockerfile - https://gist.github.com/tan-stripe/d6bfb04c5ad8c044442b0aefa07d898b
It would be good to know if this still happens. I know of one case where Envoy can send RST_STREAM for a stream where it hasn't sent HEADERS but I think it can only happen before this change: https://github.com/envoyproxy/envoy/pull/14820
> It would be good to know if this still happens. I know of one case where Envoy can send RST_STREAM for a stream where it hasn't sent HEADERS but I think it can only happen before this change: #14820
Possibly a silly question from me, but I'd love to see if @antoniovicente's quoted change does mitigate this issue. How might I find which Envoy release incorporates it, so I can repro re: @clundquist-stripe's notes?
Here's a docker-compose version of the envoy example:
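A minimal sketch of what that compose file might look like, assuming the envoy.yaml and Go h2c server from the gists above (service names, image tag, ports, and build context are assumptions, not the commenter's original file):

```yaml
# Sketch only: wires an Envoy container (using the envoy.yaml from the gist)
# to the Go h2c upstream on 8080, exposing Envoy's listener on 10000.
version: "3.8"
services:
  upstream:
    build: ./server              # assumed build context containing server.go
    expose:
      - "8080"
  envoy:
    image: envoyproxy/envoy:v1.17.0
    volumes:
      - ./envoy.yaml:/etc/envoy/envoy.yaml:ro
    ports:
      - "10000:10000"
    depends_on:
      - upstream
```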
If you run this along with something like hey or vegeta, you'll reproduce the problem in Envoy version 1.17 and below. It seems to have been fixed in 1.18.
Sample Hey Command:
hey -n 50 -c 50 -h2 -t 3 http://localhost:10000
Sample Vegeta Command:
echo "GET http://localhost:10000" | vegeta attack -duration=15s | tee results.bin | vegeta report
I had a chance to test this with Envoy v1.18.3, and the bug doesn't seem to be reproducible (at least with the test described in this issue).
Yep! That confirms our findings!
> It seems to have been fixed in 1.18.
Description
Preface: we're using envoy as an L7 proxy with an HTTP2 upstream. We've noticed that Envoy sometimes sends a RST_STREAM before actually sending the HEADERS frame that would change the stream state from idle to open.
The following is a pcap of a single TCP connection between envoy and the upstream server. The upstream server is listening on 8080, while the ephemeral port the envoy connection is using is 63692.
Pcap file
On #243, we can see that envoy starts sending RST_STREAM for streams that haven't received HEADERS or PUSH_PROMISE frames. Additionally, some of these stream resets are repeated (e.g. stream ids 27 and 23 above). Because of this, the server closes the connection with a PROTOCOL_ERROR, as per the HTTP2 spec. Excerpt (the idle-stream rule from RFC 7540 §5.1): "Receiving any frame other than HEADERS or PUSH_PROMISE on a stream in this state MUST be treated as a connection error (Section 5.4.1) of type PROTOCOL_ERROR."
This leads to envoy dropping in-flight requests on the same HTTP2 connection with a 503 UC. After reading this code snippet, we believe this is because of the way we do deferred reset. In other words, we believe that from envoy's perspective this stream exists, but since the first frame hasn't made it to the server yet, the server believes that the stream is idle and does an active close.
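For readers unfamiliar with the term, here is a conceptual sketch of what "deferred reset" means here (illustrative C++ only, not the Envoy snippet linked above; all names are made up): the reset reason is recorded and only turned into an RST_STREAM frame when pending output is flushed, with nothing guaranteeing that HEADERS went out first.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Illustrative model of a deferred reset: the reset reason is recorded and
// only turned into an RST_STREAM frame when the stream's pending output is
// flushed. If the HEADERS frame never made it onto the wire first, the peer
// sees RST_STREAM for a stream it still considers idle, which is a
// connection-level PROTOCOL_ERROR.
class IllustrativeStream {
public:
  void queueHeaders() { pending_frames_.push_back("HEADERS"); }

  // Deferred reset: remember that a reset was requested instead of writing it.
  void resetStream(uint32_t error_code) { deferred_reset_ = error_code; }

  // Flush pending frames, then the deferred reset, in that order.
  std::vector<std::string> flush() {
    std::vector<std::string> wire = std::move(pending_frames_);
    pending_frames_.clear();
    if (deferred_reset_) {
      // Note: nothing here checks whether HEADERS was actually among the
      // flushed frames, which is the suspected gap described above.
      wire.push_back("RST_STREAM");
      deferred_reset_.reset();
    }
    return wire;
  }

private:
  std::vector<std::string> pending_frames_;
  std::optional<uint32_t> deferred_reset_;
};
```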
Repro Steps
A very contrived example can be done using a single upstream go service coupled with a load tester and envoy.
Version details:
Configuration:
Steps:
hey -n 50 -c 50 -h2 -t 3 http://localhost:10000