Open pshirali opened 4 years ago
PR #381 has now been accepted and this changes things a bit (quite a bit of the code you mention has been rewritten). Publish()
will still accept requests while stopCommsWorkers()
is running. However in the event of a connection loss I would expect stopCommsWorkers()
to return very quickly (the connection is closed so network functions should return an error pretty much immediatly).
It would be possible to call a Disconnecting
handler in between the two checks in if status != disconnected && c.stopCommsWorkers() {
(here) or perhaps at the top on stopCommsWorkers()
(probably more consistent because the handler should really be called whatever the cause of the disconnection).
I suspect that the time between the proposed DisconnectingHandler
executing (noting that it is started as a goroutine) and stopCommsWorkers()
completing will be very small (have not benchmarked this but doing so might be useful). So the question comes down to whether the additional complexity (mainly in your code) is worth it for the small chance that you will prevent a call to Publish
(or whatever other actions desired) in the small window while stopCommsWorkers()
is running.
I guess another option would be to move the call to c.setConnected(disconnected)
(or reconnecting
) above the call to c.stopCommsWorkers()
(have not looked in detail but think that would work OK).
A few other thoughts:
DisconnectingHandler
only seems to apply when AutoReconnect
is disabled? (when enabled Publish()
will not return an error on connection loss at all).@MattBrittan Thanks a lot for the detailed response. Yes, I do see that a lot of code has now changed. :)
To clarify my proposal included two parts:
DisconnectingHandler
(+other API) which gives the application an opportunity to implement anything it wishes on the first knowledge of the client disconnecting from the broker. This is independent of preventing a Publish
, but if a client wishes to implement some logic which prevents a publish, they could.paho.mqtt.golang
setting its internal state to something !connected
on the first knowledge of a disconnection, before workers are stopped. This would be the client's logic to prevent a Publish
when !connected
Yes, [1] will add to the API and maintenance. But it gives the application flexibility to implement something that does more than just preventing Publish
.
Example: The client internally preventing a Publish
would mean prevention at a per-message/token level. A trigger point to invoke action impacting the client's behavior would take place when the OnConnectionLost
handler is called. If a DisconnectingHandler
existed, the application could perhaps completely avoid investing resources (spinning application goroutines etc) in preparing the message for publishing in the first place, and auditing its success/failure. This may of course vary application-to-application.
I accept your point that it may not be worth adding new API if the wind-down duration will always be very short.
The other option you mentioned is inline with my thoughts on Additional Enhancement [2]
. Calling c.setConnected(..)
before c.stopCommsWorkers()
will prevent Publish
even before worker teardown. This would at least ensure that a Publish
will not contribute to a potential deadlock (publish, slipping through disconnection race, staying on outbound channel, while some workers have close, while one remains alive and blocked)
For a non-AutoConnecting client, any teardown deadlock will prevent the OnConnectionLost
handler from being called, thus the application would never reconnect. More importantly, it'd never get to know that it will not get an opportunity to reconnect. The client.IsConnected()
status will report true
, but, Publish
will fail because the connection has actually been lost.
The ideal state here is a short teardown duration for the workers, and a guarantee that the teardown will always take place without any deadlocks. If setting c.setConnected(..)
before c.stopCommsWorkers()
contributes towards preventing this, then that's definitely valuable.
Reg: a few other thoughts that you mentioned.
Yes, I proposed calling DisconnectingHandler
only for non-AutoReconnect. Honestly it could be called irrespective, but the most likely usage would be for non-AutoReconnect clients.
I've not seen many implementations of out-of-bound Reconnect and auto-Reconnect, but my assumption on the implementation pattern is:
[a] out-of-bound Reconnect clients would go through a full disconnection
and reconnection, triggered by logic residing in the application's OnConnectionLost
handler. The application has a large role to play.
[b] AutoReconnect clients would go through a connected->reconnecting->connected
cycle, where published msgs continue to get handled by the client while it attempts auto-reconnect. An attempt to call OnConnectionLost
hander is made here as well, but I'm assuming the implementation in AutoReconnect clients will be very different from that of [a]
[sidenote] Yes :) You are correct in saying that an AutoReconnect client, with a persistent Store
and CleanSession
disabled could be a recipe for not losing (or rarely) losing messages. However high-throughput applications may benefit from an out-of-bound Reconnect client and out-of-bound message persistence (thus escaping mqtt limits of 64k inflight msgs, CleanSession status etc).
Thanks again :)
not sure if this issue is related to yours 🤔
@pshirali apologies for not replying - I had not noticed that you had posted further on this issue. I can see the benefits of having a DisconnectingHandler
and see little downside (as it will not be called unless enabled). If you submit a PR I now have the ability to accept it.
Thanks @MattBrittan. Unfortunately, it may be some more time before I begin to implement this. But I will take this up and submit a PR. Thanks again.
If you are still thinking about implementing this please see tins comment (re a generic connection status notification function)
In
Client.internalConnLost
, when a disconnection takes place, the worker goroutines are waited on till they get done. This is followed by logic for AutoReconnect and OnConnectionLost. The opportunity for an application to handle the actual disconnection takes place only after the workers are waited upon.While waiting, the application may continue to publish. The
client.Publish
calls don't terminate early as the connection status has not yet changed toreconnecting
ordisconnected
. Further,Client.IsConnected()
is influenced by theAutoReconnect
andConnectRetry
options, where aclient.Publish
is not prevented during reconnections attempted by the client.It is valuable for an application to have an opportunity to take action immediately, when a disconnection is detected, even while the mqtt client is still shutting down its workers. Applications which would want to prevent publishing (or do something more) when a disconnection is detected, would desire to know of the disconnection at the earliest possible moment of it occurring.
Proposed Enhancement [1]
Introduce
type DisconnectingHandler func(Client, error)
handler, with aSetDisconnectingHandler(..)
option inClientOptions
. This handler will be invoked as a goroutine before thec.closeStop()
inClient.internalConnLost
inclient.go
Additional Enhancement [2] --- (but, is a deviation from current behavior)
disconnecting
status.c.setConnected(disconnecting)
will set thedisconnecting
status just before where theDisconnectingHandler
would be called (and much before waiting for workers, AutoReconnect and OnConnectionLost)disconnecting
status could causeclient.IsConnected()
to return false, thus preventingclient.Publish(..)
.These changes would prevent a
client.Publish(..)
at the earliest possible instant when a client gets disconnected. Unfortunately, this will change behavior for all non-AutoReconnect clients, whereclient.Publish
will begin to fail slightly earlier than what they are used to seeing.At the moment I'm proposing only
[1]
, as[2]
may require some discussion, for possible future consideration.I'll be happy to work on this and submit a PR.