elixir-mint / mint

Functional HTTP client for Elixir with support for HTTP/1 and HTTP/2 🌱
Apache License 2.0

Mint.HTTP2 streams stuck in `:half_closed_local` state #450

Open daisyzhou opened 2 weeks ago

daisyzhou commented 2 weeks ago

Unfortunately we do not have a repro or a hypothesis, but we wanted to get your help either fixing the problem in Mint or advising us on how to avoid or handle it.

We are running Mint 1.6.0 and noticed that a number of our Mint.HTTP2 connections had max_concurrent_streams open streams, all stuck in state :half_closed_local. This rendered those connections unusable, since they could not accept new requests.

Is this a known failure mode? Can you help figure out what the cause is?

whatyouhide commented 1 week ago

No, this is not a known failure mode and could definitely be a bug.

See the diagram in https://datatracker.ietf.org/doc/html/rfc9113#section-5.1:

[Screenshot: stream state diagram from RFC 9113, Section 5.1]

We get to the :half_closed_local state in two cases:

  1. We start the stream and eventually send END_STREAM (usually when we have finished sending the request).
  2. The server starts a stream (with a PUSH_PROMISE) and finishes sending headers.

Then, the RFC says:

A stream that is in the "half-closed (local)" state cannot be used for sending frames other than WINDOW_UPDATE, PRIORITY, and RST_STREAM.

A stream transitions from this state to "closed" when a frame is received with the END_STREAM flag set or when either peer sends a RST_STREAM frame.

An endpoint can receive any type of frame in this state. Providing flow-control credit using WINDOW_UPDATE frames is necessary to continue receiving flow-controlled frames. In this state, a receiver can ignore WINDOW_UPDATE frames, which might arrive for a short period after a frame with the END_STREAM flag set is sent.
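
The rules quoted above can be modelled as a small state function. This is a minimal sketch in plain Elixir, not Mint's internal code: the module name `StreamStateSketch`, the `{type, flags}` frame shape, and the function names are all assumptions made for illustration.

```elixir
defmodule StreamStateSketch do
  @moduledoc """
  Illustrative model of the "half-closed (local)" rules from RFC 9113, Section 5.1.
  Hypothetical helper; this is not how Mint represents streams internally.
  """

  # Frame types an endpoint may still send while half-closed (local).
  @sendable [:window_update, :priority, :rst_stream]

  @doc "Returns true if `frame_type` may be sent in `:half_closed_local`."
  def can_send?(:half_closed_local, frame_type), do: frame_type in @sendable

  @doc "Next state after receiving a `{type, flags}` frame while half-closed (local)."
  def on_receive(:half_closed_local, {:rst_stream, _flags}), do: :closed

  def on_receive(:half_closed_local, {_type, flags}) do
    # Only a frame carrying END_STREAM closes the stream from the remote side.
    if :end_stream in flags, do: :closed, else: :half_closed_local
  end

  @doc "Next state after sending an allowed frame while half-closed (local)."
  def on_send(:half_closed_local, :rst_stream), do: :closed
  def on_send(:half_closed_local, type) when type in @sendable, do: :half_closed_local
end

# A DATA frame without END_STREAM leaves the stream where it is:
StreamStateSketch.on_receive(:half_closed_local, {:data, []})

# A DATA frame with END_STREAM closes it:
StreamStateSketch.on_receive(:half_closed_local, {:data, [:end_stream]})
```

In this model the only exits from `:half_closed_local` are a received END_STREAM or an RST_STREAM in either direction, so a stream that never sees either stays there indefinitely.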

So, I'm not really sure what's happening; I haven't looked at the code for a while. However, it could be that:

  1. we send a full request to the server,
  2. we put the stream in :half_closed_local
  3. the server never replies with a frame with the END_STREAM flag set.

I don't think we can send RST_STREAM ourselves here because we might not have gotten a response from the server yet. If you have time to investigate this further with the help of this data, I'd be very grateful, otherwise I'll try to take a closer look at this soon 🙃

daisyzhou commented 1 week ago

Hi @whatyouhide ,

Thanks for the context. Unfortunately I can't look into it much further since we don't have a repro either, and it only happened one time (albeit to multiple connections).

If it is indeed that the server never replies with a frame with the END_STREAM flag set, is the connection just broken forever? We saw this happen to all streams in the connections at the same time (for a few different connections), so maybe it was a networking blip that dropped the END_STREAM. If we get into this state, would you suggest just killing and restarting the connection?
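
For readers who hit the same state, here is a rough sketch of caller-side bookkeeping that could help decide between cancelling individual requests and replacing the connection. It is not a recommendation from the maintainers and not part of Mint: the module `StuckStreamSketch`, its function names, and the timeout policy are all hypothetical. It only tracks request refs and timestamps; acting on the result is left to the caller, for example via `Mint.HTTP2.cancel_request/2` for individual requests or `Mint.HTTP.close/1` followed by a reconnect (check the Mint docs for your version).

```elixir
defmodule StuckStreamSketch do
  @moduledoc """
  Hypothetical caller-side tracking of in-flight requests.
  Pure Elixir; nothing in this module is provided by Mint.
  """

  @doc "Records when a request started. `now_ms` is a monotonic timestamp in ms."
  def track(in_flight, request_ref, now_ms), do: Map.put(in_flight, request_ref, now_ms)

  @doc "Forgets a request, e.g. once its `{:done, request_ref}` response arrives."
  def finish(in_flight, request_ref), do: Map.delete(in_flight, request_ref)

  @doc "Returns the refs that have been in flight for longer than `timeout_ms`."
  def expired(in_flight, now_ms, timeout_ms) do
    for {ref, started_ms} <- in_flight, now_ms - started_ms > timeout_ms, do: ref
  end

  @doc """
  Suggests an action for the caller:

    * `:ok` when no request has expired
    * `{:cancel, refs}` when some requests have expired
    * `:replace_connection` when expired requests occupy every stream slot
  """
  def decide(in_flight, now_ms, timeout_ms, max_concurrent_streams) do
    case expired(in_flight, now_ms, timeout_ms) do
      [] -> :ok
      refs when length(refs) >= max_concurrent_streams -> :replace_connection
      refs -> {:cancel, Enum.sort(refs)}
    end
  end
end

# Example: :req_a has waited 10s (over the 5s limit), :req_b only 1s.
in_flight =
  %{}
  |> StuckStreamSketch.track(:req_a, 0)
  |> StuckStreamSketch.track(:req_b, 9_000)

StuckStreamSketch.decide(in_flight, 10_000, 5_000, 100)
# => {:cancel, [:req_a]}
```

The timestamps are passed in explicitly so the logic stays easy to test; in a real process they would likely come from `System.monotonic_time(:millisecond)`, and the stream limit from the connection's negotiated settings.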