When GRPC.Stub.connect/2 fails with timeout, gun still retrying?

ryanwinchester commented 3 years ago

To keep my application from crashing at startup when the gRPC server we want to connect to is unavailable, I moved our gRPC client's connection into the handle_continue callback. For now it simply retries every 5 seconds.

Example:

def init({_host, _opts} = state) do
  # Defer connecting so init/1 returns immediately and startup does not crash.
  {:ok, state, {:continue, :connect}}
end

def handle_continue(:connect, {host, opts} = state) do
  case GRPC.Stub.connect(host, opts) do
    {:ok, channel} ->
      state = %__MODULE__{
        channel: channel,
        status: :connected,
        host: host,
        opts: opts
      }

      {:noreply, state}

    {:error, _error} ->
      # Block this process for 5 seconds, then try to connect again.
      Process.sleep(5_000)
      {:noreply, state, {:continue, :connect}}
  end
end

This seems like it should work, and it does if the server is up. If the server is down, it attempts the connection every five seconds as expected and eventually connects once the server becomes available.

However, afterwards I also receive a :gun_up message for every attempt made in handle_continue before the successful one. So if 50 attempts failed and the 51st succeeded, we receive 50 :gun_up messages afterwards.

Conclusion

I think this means that even though we get the :error tuple with a timeout reason, the gun_adapter keeps retrying in the background FOR EACH ATTEMPT that I made previously.

I don't want this, because it could create something like a thundering herd from a single client to the server. There is also no good way to turn that :gun_up message into a %GRPC.Channel{} if I wanted to rely on the gun_adapter's retries instead.
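One possible client-side stopgap, sketched here under assumptions, does not stop the extra background connections (so the thundering-herd concern remains), but it keeps the GenServer healthy. Instead of sleeping inside the callback, it schedules retries with Process.send_after/3, and it explicitly handles the late {:gun_up, conn_pid, protocol} messages that gun sends to the owner process so they do not crash it. The module name RetryingClient and the :connect_fun and :retry_ms options are hypothetical. :connect_fun stands in for fn -> GRPC.Stub.connect(host, opts) end so the sketch runs without a gRPC server.

```elixir
defmodule RetryingClient do
  # Hypothetical module: a sketch, not part of the grpc library.
  use GenServer

  @default_retry_ms 5_000

  def start_link(opts), do: GenServer.start_link(__MODULE__, opts)

  @impl true
  def init(opts) do
    # :connect_fun is an assumed option standing in for
    # fn -> GRPC.Stub.connect(host, opts) end.
    state = %{
      connect_fun: Keyword.fetch!(opts, :connect_fun),
      retry_ms: Keyword.get(opts, :retry_ms, @default_retry_ms),
      channel: nil,
      status: :disconnected,
      stray_gun_up: 0
    }

    {:ok, state, {:continue, :connect}}
  end

  @impl true
  def handle_continue(:connect, state), do: {:noreply, try_connect(state)}

  @impl true
  def handle_info(:connect, state), do: {:noreply, try_connect(state)}

  # gun notifies its owner with {:gun_up, conn_pid, protocol}. Per the report
  # above, late ones arrive from earlier failed attempts; count and drop them.
  def handle_info({:gun_up, _conn_pid, _protocol}, state) do
    {:noreply, %{state | stray_gun_up: state.stray_gun_up + 1}}
  end

  # Ignore any other unexpected message instead of crashing.
  def handle_info(_msg, state), do: {:noreply, state}

  defp try_connect(state) do
    case state.connect_fun.() do
      {:ok, channel} ->
        %{state | channel: channel, status: :connected}

      {:error, _reason} ->
        # Schedule the next attempt rather than calling Process.sleep/1, so the
        # process can still handle calls and other messages while it waits.
        Process.send_after(self(), :connect, state.retry_ms)
        state
    end
  end
end
```

Because the wait happens in the mailbox rather than in the callback, callers are not blocked for 5 seconds per attempt, but they need to check the status before using the channel, since it stays nil until a connection succeeds.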

Does anybody know how I can resolve this in a good way?

polvalente commented 2 years ago

Is this still relevant?

ryanwinchester commented 2 years ago

Is this still relevant?

It is, if the fix from https://github.com/brexhq/grpc-elixir/pull/10 was never applied upstream (to this repo).

ryanwinchester commented 2 years ago

Looks like it was: https://github.com/elixir-grpc/grpc/pull/152

elixir-grpc / grpc

When GRPC.Stub.connect/2 fails with timeout, gun still retrying? #190