No, this is not a known failure mode and could definitely be a bug.
See the diagram in https://datatracker.ietf.org/doc/html/rfc9113#section-5.1:
We get to the :half_closed_local state in two cases:

1. We send a frame with the END_STREAM flag set (when we finished the request, usually).
2. The server reserves a stream (with a PUSH_PROMISE) and finishes sending headers.

Then, the RFC says:
A stream that is in the "half-closed (local)" state cannot be used for sending frames other than WINDOW_UPDATE, PRIORITY, and RST_STREAM.
A stream transitions from this state to "closed" when a frame is received with the END_STREAM flag set or when either peer sends a RST_STREAM frame.
An endpoint can receive any type of frame in this state. Providing flow-control credit using WINDOW_UPDATE frames is necessary to continue receiving flow-controlled frames. In this state, a receiver can ignore WINDOW_UPDATE frames, which might arrive for a short period after a frame with the END_STREAM flag set is sent.
So, I'm not really sure what's happening, haven't looked at the code for a while. However, it could be that we stay stuck in :half_closed_local because the server never sends a frame with the END_STREAM flag set. I don't think we can send RST_STREAM ourselves here because we might not have gotten a response from the server yet. If you have time to investigate this further with the help of this data, I'd be very grateful, otherwise I'll try to take a closer look at this soon.
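If it does happen again, one way to gather that data is to inspect the connection struct directly. This is only a debugging sketch: it reaches into Mint.HTTP2's private fields (assumed here to be a :streams map holding per-stream maps with a :state key), which are not part of the public API and may differ between Mint versions.

```elixir
# Debugging sketch only: relies on assumed Mint.HTTP2 internals
# (a :streams field with per-stream :state), not on the public API.
stuck =
  conn
  |> Map.get(:streams, %{})
  |> Enum.count(fn {_stream_id, stream} -> stream.state == :half_closed_local end)

IO.inspect(stuck, label: "streams stuck in :half_closed_local")
```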
Hi @whatyouhide,
Thanks for the context. Unfortunately I can't look into it much further since we don't have a repro either, and it only happened one time (albeit to multiple connections).
If it is indeed that the server never replies with a frame with the END_STREAM flag set, is the connection just broken forever? We saw this happen to all streams in the connections at the same time (for a few different connections), so maybe it was a networking blip that dropped the END_STREAM. If we get into this state, would you suggest just killing and restarting the connection?
(I'm one of @daisyzhou's colleagues, hello!)
Just adding another data point that we saw this happen again, so it's definitely not just a one-off. Unfortunately I don't have any confirmation of the theory @whatyouhide set forth above, because we weren't able to packet capture the HTTP2 flow that broke it. Our "fix" has been to kill the process owning the connections. If you have any suggestions on how we could un-stick this connection, or requests for things we can capture if it happens again, please do let us know.
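In the meantime, the least disruptive way to un-stick things is probably to drop just the wedged connection and dial a new one, rather than killing the owning process. A minimal sketch, assuming the caller has already decided the connection is unusable; the module name and how host/port are tracked are placeholders:

```elixir
defmodule StuckConnection do
  # Minimal sketch: discard a wedged connection and dial a fresh one.
  # `host` and `port` stand in for whatever the connection pool already tracks.
  def replace(conn, host, port) do
    # close/1 releases the socket; the stuck streams are abandoned with it.
    {:ok, _closed} = Mint.HTTP.close(conn)
    # Open a new connection to take its place.
    Mint.HTTP.connect(:https, host, port)
  end
end
```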
This may have ended up being a defect in our code. Our metrics indicate that these events were preceded by a small burst in timeouts, and it seems our timeout code did not call HTTP2.cancel_request/2, which caused us to leak streams.
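For reference, a minimal sketch of what such a timeout path could look like; everything except Mint.HTTP2.cancel_request/2 itself (the module name, and how the timer and request_ref are tracked) is an assumption for illustration:

```elixir
defmodule TimeoutHandling do
  # Sketch: cancel a timed-out request so its stream is not leaked.
  # How the timer fires and where `request_ref` is stored are assumed;
  # the Mint call is Mint.HTTP2.cancel_request/2.
  def handle_request_timeout(conn, request_ref) do
    case Mint.HTTP2.cancel_request(conn, request_ref) do
      # Mint sends RST_STREAM for the stream and frees the slot, so the
      # connection can keep accepting new requests.
      {:ok, conn} -> {:ok, conn}
      {:error, conn, reason} -> {:error, conn, reason}
    end
  end
end
```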
We'll report back if this recurs, but it may not be a Mint problem after all. Thanks for taking a look!
Oh, mh. Yeah that totally makes sense: we don't receive anything from the server and the request just stays open. We can't even really do timeouts within Mint because connections are stateless.
Okay, sounds good. Let's close this out and reopen it if this shows up again.
Unfortunately we do not have a repro or hypothesis available, but wanted to get your help either fixing the problem in Mint or advising us on how to avoid/handle it.
We are running Mint 1.6.0, and noticed a number of our Mint.HTTP2 connections with max_concurrent_streams streams, all stuck with state: :half_closed_local. This rendered the connections unusable, as they weren't able to handle new requests.

Is this a known failure mode? Can you help figure out what the cause is?
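For anyone hitting the same symptom, here is a hedged sketch of how the exhausted state can be detected with Mint's public API; the variable names are illustrative:

```elixir
# Sketch: a connection is effectively unusable for new requests when the
# number of open requests reaches the server's advertised :max_concurrent_streams.
max_streams = Mint.HTTP2.get_server_setting(conn, :max_concurrent_streams)
open = Mint.HTTP.open_request_count(conn)

if open >= max_streams do
  IO.puts("connection exhausted: #{open}/#{max_streams} streams in use")
end
```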