"Unexpected message" error log and then requests stopped

losvedir commented 3 years ago

Last night, one of our applications stopped making all HTTP requests. I looked at the logs locally, and this was the last message:

21:49:55.070 [error] Unexpected message: {:ssl_closed, {:sslsocket, {:gen_tcp, #Port<0.7>, :tls_connection, :undefined}, [#PID<0.1185.0>, #PID<0.1184.0>]}}

I grepped our codebase and all its dependencies, and that log message is only generated by hackney, in hackney_stream.erl.

One caveat is that I don't know for sure if the app stopped running altogether, or just stopped making HTTP requests. We log our info level messages to Splunk (via HTTP and hackney), and so those logs stopped coming in. Locally, we only log warn and above, and those are infrequent enough that the above log message was the last one that appeared. However, I believe the app was continuing to work, and just failing to send requests, because if it had outright crashed it would have been detected and the application would have been automatically restarted.

After a manual restart of the application everything worked normally again.

Could it be that hackney wedged itself upon receiving that message, and stopped sending requests? Or did hackney behave fine, and the calling code didn't handle a return value it should have?

Versions

hackney 1.17.0
erlang/OTP 22.3

losvedir commented 3 years ago

Just had another thought. Since we log via HTTP and so lost those logs, it makes it difficult to know what was going on exactly. For example, we don't know if GET requests were succeeding or not. However, the application POSTs to three different destinations, using two pools, and we know that after this error message, none of the destinations received anymore POST messages.

Edit: Actually, I just thought of a server that we control with sufficiently granular logs and was able to confirm that the GET requests to that server also stopped.

benoitc commented 3 years ago

Yes, it's a message not vatched by hackney (though it should be in last version. There is a new connection handling that will be part of the next update fixing it completely. the work is in progress :)

benoitc commented 3 years ago

@losvedir can you share a snippet of your code doing the request and where you get that error?

losvedir commented 3 years ago

That error message specifically came from one of the three places in hackney's own hackney_stream.erl. I grepped my codebase and all its deps and that's the only place "Unexpected message" occurs.

We don't actually use hackney in our code directly. The two libraries we use that use :hackney are ExAws and HTTPoison. I believe this is ExAws's usage and this is HTTPoison's.

I also wonder, given the error of :ssl_closed if it wasn't related to a specific request, but just a message that came in between requests. For example, I know that we have lots of clauses like this sprinkled in a lot of our GenServers:

# Saw this message some in from dev server. Handle it more gracefully.
def handle_info({:ssl_closed, socket}, state) do
  Logger.info("SSL Connection closed. Ignoring. Socket #{inspect(socket)}")
  {:noreply, state}
end

benoitc / hackney

"Unexpected message" error log and then requests stopped #677