Closed kemiller closed 4 years ago
Note this is for calls that happen relatively intermittently. The solution so far has just been to disable pooling for these connections, but it's been whack-a-mole keeping up with them.
I'm also seeing this issue here, extremely frequently.
@KevinConti I am trying out a fix: https://github.com/cryptagon/mojito
Would you test it and see if it fixes your errors?
My changes above do seem to fix this. It may be a little while before I can write a proper test since it would involve hacking the internal test HTTP server with HTTP2 protocol negotiation, so I'm not submitting a PR, but figured I'd write up the resolution in case anyone lands here from google...
Thanks Ken, I'll try this out and get back to you.
Hey team, getting this too. I'm able to reproduce reliably.
App stats:
It would appear I'm only able to reproduce on Amazon EC2 (ECS - AMI 2018.03.20200115). It's happened for calls to a few different endpoints.
It seems to have a "warming-up" characteristic. I booted up iex
on an Amazon ECS EC2 instance then ran the following - which is just a request to a static asset:
iex(...)1> 1..100 |> Enum.map(fn _ ->
> {code, res} = Mojito.request(method: :get, url: "https://acco.io/mojito-test.json")
> IO.inspect({code, res})
> Process.sleep(1_000)
> code
> end)
{:error,
%Mojito.Error{
message: nil,
reason: %Mint.HTTPError{module: Mint.HTTP2, reason: :closed_for_writing}
}}
{:error,
%Mojito.Error{
message: nil,
reason: %Mint.HTTPError{module: Mint.HTTP2, reason: :closed_for_writing}
}}
{:error,
%Mojito.Error{
message: nil,
reason: %Mint.HTTPError{module: Mint.HTTP2, reason: :closed_for_writing}
}}
{:error,
%Mojito.Error{
message: nil,
reason: %Mint.HTTPError{module: Mint.HTTP2, reason: :closed_for_writing}
}}
{:ok,
%Mojito.Response{
body: "{ \"ok\": true }\n",
complete: true,
headers: [
{"date", "Mon, 28 Sep 2020 02:17:30 GMT"},
{"content-type", "application/json; charset=utf-8"},
{"content-disposition", "inline; filename=\"mojito-test.json\""},
{"cache-control", "public, max-age=0, must-revalidate"},
{"content-length", "15"},
{"access-control-allow-origin", "*"},
{"etag",
"W/\"e372a72fc5701affb4ca8709e824db8387d1077e2f13e97914005eace1297795\""},
{"accept-ranges", "bytes"},
{"x-vercel-cache", "HIT"},
{"age", "831"},
{"server", "Vercel"},
{"x-vercel-id", "pdx1::44hp8-1601259450518-bbd1f60561c3"},
{"strict-transport-security", "max-age=63072000"}
],
size: 15,
status_code: 200
}}
Note that once it "warmed up" the error frequency dropped significantly.
If I disconnect from the iex session/box, then log back in a minute or so later, I'm able to get a failure by the 4th or 5th request.
What's interesting is that I cannot seem to reproduce on my pc (latest macOS).
I haven't tried too extensively, but I have not been able to reproduce in plain old curl either.
Any thoughts? Other tests I can run?
@kemiller I'm weary of that solution, just because I'm not sure if closed_for_writing
means the request completely failed to reach destination? Don't want to retry a request if there's any chance the packets may have reached (calling non-idempotent endpoints)
@kemiller @kevinconti - are you seeing this on AWS too? If so, curious what your network setup looks like?
Only notable one for me is that we're using a NAT gateway for outbound requests (so we have discrete IPs for our servers)
Thanks for taking a look, @acco!
We are on Google Cloud Platform and saw it there.
The reason it seemed safe to me to handle it that way is that the only place in the Mint code that ever sends :closed_for_writing
is a pattern match on the connection state in the request()
function itself. I.e. it will only ever give you that if the request is already in a goaway state when it was returned from the pool. I.e. it was put back into the pool in that state, not that you tried your request and it got to that state. But I could be reading it wrong. Why are connections being put back into the pool in that condition? I don't know. But apparently fully-closed connections are also being put back in because the :closed
error reason is also generated that way.
Edit: the relevant line in Mint code: https://github.com/elixir-mint/mint/blob/a4f2080cdfba19ee886199a39389e9913280263f/lib/mint/http2.ex#L476
Thanks for the note @kemiller - makes sense
@kemiller @KevinConti I only just started testing, but I'm having luck with this recent fix!
https://github.com/elixir-mint/mint/pull/280
Give it a shot:
{:mint, git: "https://github.com/elixir-mint/mint.git"}
Update Oct 1: Still looking good for me
@acco the latest mint seems to be working for us as well.
I'm going to close this since it doesn't appear to be a Mojito issue, and there's a resolution.
It looks as if either half-closed connections are being returned to the pool when they shouldn't, or :closed_for_writing connections should use the same retry logic that just :closed connections get. I'm happy to take a stab at a PR for this, but could use a little guidance.