[Proposal] Add a DisconnectingHandler

pshirali commented 4 years ago

In Client.internalConnLost, when a disconnection takes place, the worker goroutines are waited on till they get done. This is followed by logic for AutoReconnect and OnConnectionLost. The opportunity for an application to handle the actual disconnection takes place only after the workers are waited upon.

While waiting, the application may continue to publish. The client.Publish calls don't terminate early as the connection status has not yet changed to reconnecting or disconnected. Further, Client.IsConnected() is influenced by the AutoReconnect and ConnectRetry options, where a client.Publish is not prevented during reconnections attempted by the client.

It is valuable for an application to have an opportunity to take action immediately, when a disconnection is detected, even while the mqtt client is still shutting down its workers. Applications which would want to prevent publishing (or do something more) when a disconnection is detected, would desire to know of the disconnection at the earliest possible moment of it occurring.

Proposed Enhancement [1]

Introduce type DisconnectingHandler func(Client, error) handler, with a SetDisconnectingHandler(..) option in ClientOptions. This handler will be invoked as a goroutine before the c.closeStop() in Client.internalConnLost in client.go

Additional Enhancement [2] --- (but, is a deviation from current behavior)

The client could have a disconnecting status.
A c.setConnected(disconnecting) will set the disconnecting status just before where the DisconnectingHandler would be called (and much before waiting for workers, AutoReconnect and OnConnectionLost)
For non-AutoReconnect clients, the disconnecting status could cause client.IsConnected() to return false, thus preventing client.Publish(..).

These changes would prevent a client.Publish(..) at the earliest possible instant when a client gets disconnected. Unfortunately, this will change behavior for all non-AutoReconnect clients, where client.Publish will begin to fail slightly earlier than what they are used to seeing.

At the moment I'm proposing only [1], as [2] may require some discussion, for possible future consideration.

I'll be happy to work on this and submit a PR.

MattBrittan commented 4 years ago

PR #381 has now been accepted and this changes things a bit (quite a bit of the code you mention has been rewritten). Publish() will still accept requests while stopCommsWorkers() is running. However in the event of a connection loss I would expect stopCommsWorkers() to return very quickly (the connection is closed so network functions should return an error pretty much immediatly).

It would be possible to call a Disconnecting handler in between the two checks in if status != disconnected && c.stopCommsWorkers() { (here) or perhaps at the top on stopCommsWorkers() (probably more consistent because the handler should really be called whatever the cause of the disconnection).

I suspect that the time between the proposed DisconnectingHandler executing (noting that it is started as a goroutine) and stopCommsWorkers() completing will be very small (have not benchmarked this but doing so might be useful). So the question comes down to whether the additional complexity (mainly in your code) is worth it for the small chance that you will prevent a call to Publish (or whatever other actions desired) in the small window while stopCommsWorkers() is running.

I guess another option would be to move the call to c.setConnected(disconnected) (or reconnecting) above the call to c.stopCommsWorkers() (have not looked in detail but think that would work OK).

A few other thoughts:

Your requirement for a DisconnectingHandler only seems to apply when AutoReconnect is disabled? (when enabled Publish() will not return an error on connection loss at all).
If reconnection and persistant storage is enabled then, in most cases, the fact the connection has been lost will have no impact on the user (because it's fire-and-forget). Even without persistant storage the loss of a message would be rare (would happen if the connection went down and then after publishing a few messages the client was stopped).
If reconnection is not enabled and CleanSession is enabled then the tokens for any messages published but not completed will return an error once the disconnection is complete.

pshirali commented 4 years ago

@MattBrittan Thanks a lot for the detailed response. Yes, I do see that a lot of code has now changed. :)

To clarify my proposal included two parts:

Provide a DisconnectingHandler (+other API) which gives the application an opportunity to implement anything it wishes on the first knowledge of the client disconnecting from the broker. This is independent of preventing a Publish, but if a client wishes to implement some logic which prevents a publish, they could.
The client in paho.mqtt.golang setting its internal state to something !connected on the first knowledge of a disconnection, before workers are stopped. This would be the client's logic to prevent a Publish when !connected

Yes, [1] will add to the API and maintenance. But it gives the application flexibility to implement something that does more than just preventing Publish.

Example: The client internally preventing a Publish would mean prevention at a per-message/token level. A trigger point to invoke action impacting the client's behavior would take place when the OnConnectionLost handler is called. If a DisconnectingHandler existed, the application could perhaps completely avoid investing resources (spinning application goroutines etc) in preparing the message for publishing in the first place, and auditing its success/failure. This may of course vary application-to-application.

I accept your point that it may not be worth adding new API if the wind-down duration will always be very short.

The other option you mentioned is inline with my thoughts on Additional Enhancement [2]. Calling c.setConnected(..) before c.stopCommsWorkers() will prevent Publish even before worker teardown. This would at least ensure that a Publish will not contribute to a potential deadlock (publish, slipping through disconnection race, staying on outbound channel, while some workers have close, while one remains alive and blocked)

For a non-AutoConnecting client, any teardown deadlock will prevent the OnConnectionLost handler from being called, thus the application would never reconnect. More importantly, it'd never get to know that it will not get an opportunity to reconnect. The client.IsConnected() status will report true, but, Publish will fail because the connection has actually been lost.

The ideal state here is a short teardown duration for the workers, and a guarantee that the teardown will always take place without any deadlocks. If setting c.setConnected(..) before c.stopCommsWorkers() contributes towards preventing this, then that's definitely valuable.

pshirali commented 4 years ago

Reg: a few other thoughts that you mentioned.

Yes, I proposed calling DisconnectingHandler only for non-AutoReconnect. Honestly it could be called irrespective, but the most likely usage would be for non-AutoReconnect clients.

I've not seen many implementations of out-of-bound Reconnect and auto-Reconnect, but my assumption on the implementation pattern is:

[a] out-of-bound Reconnect clients would go through a full disconnection and reconnection, triggered by logic residing in the application's OnConnectionLost handler. The application has a large role to play.

[b] AutoReconnect clients would go through a connected->reconnecting->connected cycle, where published msgs continue to get handled by the client while it attempts auto-reconnect. An attempt to call OnConnectionLost hander is made here as well, but I'm assuming the implementation in AutoReconnect clients will be very different from that of [a]

[sidenote] Yes :) You are correct in saying that an AutoReconnect client, with a persistent Store and CleanSession disabled could be a recipe for not losing (or rarely) losing messages. However high-throughput applications may benefit from an out-of-bound Reconnect client and out-of-bound message persistence (thus escaping mqtt limits of 64k inflight msgs, CleanSession status etc).

Thanks again :)

KShih commented 4 years ago

not sure if this issue is related to yours 🤔

MattBrittan commented 3 years ago

@pshirali apologies for not replying - I had not noticed that you had posted further on this issue. I can see the benefits of having a DisconnectingHandler and see little downside (as it will not be called unless enabled). If you submit a PR I now have the ability to accept it.

pshirali commented 3 years ago

Thanks @MattBrittan. Unfortunately, it may be some more time before I begin to implement this. But I will take this up and submit a PR. Thanks again.

MattBrittan commented 2 years ago

If you are still thinking about implementing this please see tins comment (re a generic connection status notification function)

eclipse / paho.mqtt.golang

[Proposal] Add a DisconnectingHandler #410

Proposed Enhancement [1]

Additional Enhancement [2] --- (but, is a deviation from current behavior)