Getting a lot of %Mint.HTTPError{module: Mint.HTTP2, reason: :closed_for_writing}

kemiller commented 4 years ago

It looks as if either half-closed connections are being returned to the pool when they shouldn't, or :closed_for_writing connections should use the same retry logic that just :closed connections get. I'm happy to take a stab at a PR for this, but could use a little guidance.

kemiller commented 4 years ago

Note this is for calls that happen relatively intermittently. The solution so far has just been to disable pooling for these connections, but it's been whack-a-mole keeping up with them.

KevinConti commented 4 years ago

I'm also seeing this issue here, extremely frequently.

kemiller commented 4 years ago

@KevinConti I am trying out a fix: https://github.com/cryptagon/mojito

Would you test it and see if it fixes your errors?

kemiller commented 4 years ago

My changes above do seem to fix this. It may be a little while before I can write a proper test since it would involve hacking the internal test HTTP server with HTTP2 protocol negotiation, so I'm not submitting a PR, but figured I'd write up the resolution in case anyone lands here from google...

KevinConti commented 4 years ago

Thanks Ken, I'll try this out and get back to you.

acco commented 4 years ago

Hey team, getting this too. I'm able to reproduce reliably.

App stats:

Elixir 1.10.3
Erlang/OTP 23 [erts-11.0.2]
Mojito 0.7.3

It would appear I'm only able to reproduce on Amazon EC2 (ECS - AMI 2018.03.20200115). It's happened for calls to a few different endpoints.

It seems to have a "warming-up" characteristic. I booted up iex on an Amazon ECS EC2 instance then ran the following - which is just a request to a static asset:

iex(...)1> 1..100 |> Enum.map(fn _ ->
>  {code, res} = Mojito.request(method: :get, url: "https://acco.io/mojito-test.json")
>  IO.inspect({code, res})
>  Process.sleep(1_000)
>  code
>  end)
{:error,
 %Mojito.Error{
   message: nil,
   reason: %Mint.HTTPError{module: Mint.HTTP2, reason: :closed_for_writing}
 }}
{:error,
 %Mojito.Error{
   message: nil,
   reason: %Mint.HTTPError{module: Mint.HTTP2, reason: :closed_for_writing}
 }}
{:error,
 %Mojito.Error{
   message: nil,
   reason: %Mint.HTTPError{module: Mint.HTTP2, reason: :closed_for_writing}
 }}
{:error,
 %Mojito.Error{
   message: nil,
   reason: %Mint.HTTPError{module: Mint.HTTP2, reason: :closed_for_writing}
 }}
{:ok,
 %Mojito.Response{
   body: "{ \"ok\": true }\n",
   complete: true,
   headers: [
     {"date", "Mon, 28 Sep 2020 02:17:30 GMT"},
     {"content-type", "application/json; charset=utf-8"},
     {"content-disposition", "inline; filename=\"mojito-test.json\""},
     {"cache-control", "public, max-age=0, must-revalidate"},
     {"content-length", "15"},
     {"access-control-allow-origin", "*"},
     {"etag",
      "W/\"e372a72fc5701affb4ca8709e824db8387d1077e2f13e97914005eace1297795\""},
     {"accept-ranges", "bytes"},
     {"x-vercel-cache", "HIT"},
     {"age", "831"},
     {"server", "Vercel"},
     {"x-vercel-id", "pdx1::44hp8-1601259450518-bbd1f60561c3"},
     {"strict-transport-security", "max-age=63072000"}
   ],
   size: 15,
   status_code: 200
 }}

Note that once it "warmed up" the error frequency dropped significantly.

If I disconnect from the iex session/box, then log back in a minute or so later, I'm able to get a failure by the 4th or 5th request.

What's interesting is that I cannot seem to reproduce on my pc (latest macOS).

I haven't tried too extensively, but I have not been able to reproduce in plain old curl either.

Any thoughts? Other tests I can run?

acco commented 4 years ago

@kemiller I'm weary of that solution, just because I'm not sure if closed_for_writing means the request completely failed to reach destination? Don't want to retry a request if there's any chance the packets may have reached (calling non-idempotent endpoints)

@kemiller @kevinconti - are you seeing this on AWS too? If so, curious what your network setup looks like?

Only notable one for me is that we're using a NAT gateway for outbound requests (so we have discrete IPs for our servers)

kemiller commented 4 years ago

Thanks for taking a look, @acco!

We are on Google Cloud Platform and saw it there.

The reason it seemed safe to me to handle it that way is that the only place in the Mint code that ever sends :closed_for_writing is a pattern match on the connection state in the request() function itself. I.e. it will only ever give you that if the request is already in a goaway state when it was returned from the pool. I.e. it was put back into the pool in that state, not that you tried your request and it got to that state. But I could be reading it wrong. Why are connections being put back into the pool in that condition? I don't know. But apparently fully-closed connections are also being put back in because the :closed error reason is also generated that way.

Edit: the relevant line in Mint code: https://github.com/elixir-mint/mint/blob/a4f2080cdfba19ee886199a39389e9913280263f/lib/mint/http2.ex#L476

acco commented 4 years ago

Thanks for the note @kemiller - makes sense

acco commented 4 years ago

@kemiller @KevinConti I only just started testing, but I'm having luck with this recent fix!

https://github.com/elixir-mint/mint/pull/280

Give it a shot:

{:mint, git: "https://github.com/elixir-mint/mint.git"}

Update Oct 1: Still looking good for me

kemiller commented 4 years ago

@acco the latest mint seems to be working for us as well.

kemiller commented 4 years ago

I'm going to close this since it doesn't appear to be a Mojito issue, and there's a resolution.

appcues / mojito

Getting a lot of %Mint.HTTPError{module: Mint.HTTP2, reason: :closed_for_writing} #70