J0 / phoenix_gen_socket_client

Socket client behaviour for phoenix channels
MIT License
232 stars 48 forks source link

Fix reconnecting a socket that is already running to new url/params #37

Closed boydm closed 5 years ago

boydm commented 5 years ago

Ran into an issue when updating the connection an already running transport layer. Mainly, the connect/1 function failed to match if the incoming transport_pid was not nil (if it’s already running). This is important in the case where the client needs to migrate to a new server or update it's connection params.

First I added a second connect/1 function that resets the state, then calls connect again, with a nil transport_pid. This solves reconnecting with an already running transport pid.

Second, I modified reinit/1 so that if the transport_pid is set, it exits that transport process. This prevents process leakage as the connection is reset.

Third, I added a test case that would have caught this situation and now it passes.

Finally, I ran the Elixir formatter (v1.7), which touched quite a few lines in .credo.exs. Nothing functional changed.

I did not need to change any docs, as it didn’t add any functionality. Just fixed a bug.

Note: the docs fail to build under Elixir 1.7, but that is separate issue. You should update ex_doc to v0.19 or so.

boydm commented 5 years ago

OK. Here is my real question. Is there a better way to close the socket other than just killing the process? If you want to wait on this, I'll look into it more tomorrow. Getting tired.

sasa1977 commented 5 years ago

The reconnect scenario is somewhat tricky, and I'm currently not sure if we should support it in the proposed form.

One problem is that in reconnect, you might still receive some leftover messages from the previous server. Consider the scenario where the parent process (GenSocketClient) tries to reconnect while the transport has pushed a few messages. At this point, we'll disconnect, and then still receive the pushed messages from the old client.

Another problem is that with your solution, you will always reconnect, even if parameters and url remain the same. This can lead to some unexpected disconnects, and in general to a behaviour which is harder to reason about.

I currently see two alternatives:

  1. Push the complete responsibility to the client
  2. Support disconnect, and leave it to client to reconnect

In the first approach, when client wants to reconnect, it will terminate the GenSocketClient, and start another one. This is a simple approach which requires no changes in the library, but it has a downside that the complete state of the previous connection is lost.

The second approach would work as follows:

  1. In the callback, the client returns {:disconnect, state}.
  2. As a result, the behaviour synchronously terminates the transport, then sends itself a {:notify_disconnected, reason} cast. This will ensure that handle_disconnected/2 is invoked after all the pending messages have been processed.
  3. In handle_disconnected/2 the client can now provide different parameters.

To synchronously terminate the transport, we need to do something like this:

Process.exit(transport_pid, :shutdown)

receive do
  {:DOWN, ^transport_mref, :process, ^transport_pid, _reason} -> :ok
after :timer.seconds(5) ->
  Process.exit(transport_pid, :kill)
  receive do
    {:DOWN, ^transport_mref, :process, ^transport_pid, _reason} -> :ok
  end
end

And then the behaviour should wrap up with:

notify_disconnected(self(), :client_disconnect)
reinit(state)

For this to work, the behaviour should always trap exits. In this case, we also need to implement the terminate callback in the behaviour, and stop the transport from that function.

If you can make it work with the first approach, this is what I'd suggest, as it requires no intervention in this library. However, if you want to take the second option, I'm fine with it too, but in this case I recommend starting from scratch. I think that reinit/1 function shouldn't be changed, and I also think that we shouldn't allow returning {:connect, ...} if the connection is already established.

boydm commented 5 years ago

Thank you for really thinking about it.

Ideally I would like to see a :disconnect mechanism so as to not completely lose the state, then let the client library control the reconnect. The first option is an appealing hack though, so I'm going to look into it.

I'm going to close this pull request in favor of one of the two other options even though I don't know which one yet.