Add back-off controller for sleep time of reconnection when connection lost is detected immediately after connecting. #589

tomatod commented 1 year ago

MaxConnectRetryInterval is added to ClientOptions related to #589.

The sleep time is increased progressively between connection attempts up to MaxConnectRetryInterval. If ConnectRetryInterval >= MaxConnectRetryInterval, ConnectRetryInterval is kept between the attempts.

MattBrittan commented 1 year ago

The PR looks useful but does not appear to address the related issue. You PR allows provides more flexibility re the delay between unsuccessful connections but the issue related to where the connection is successful but is dropped soon after being made due to an invalid publish, another client with the same ID etc. Do you want to try and address the larger issue or just go with this smaller change?

tomatod commented 1 year ago

MattBrittan, Thanks for your comment. I wasn't enough to check issue. I'll try to reproduce other situation and work on that again. If it's too difficult for me, I may push smaller modification.

tomatod commented 1 year ago

@MattBrittan I fixed PR, Could you review it again?

Summary

I worked on the bellow No.3 point not No.1, considering the order of implementation. When I work in No.1, I'll make new PR. https://github.com/eclipse/paho.mqtt.golang/issues/589#issuecomment-1366294006
I added new type backoffController for handling sleep time by some situations universally.

Implementation point

backoffController type has mutex in view of thread safe. I thought reconnect method may be called from multi threads.

According to the implementation below, internalConnLost method is expected to be stopped (return), when it's called by multi thread. But both the errors errConnLossWhileDisconnecting and errAlreadyHandlingConnectionLoss seem to be not returned anywhere although they are defined. So it may be that "c.status.ConnectionLost" (L515) doesn't return any error and internalConnLost continue even if that is call by secondary thread. https://github.com/eclipse/paho.mqtt.golang/blob/master/client.go#L515-L522

Note that internalConnLost method may be called in the bellow. https://github.com/eclipse/paho.mqtt.golang/blob/master/client.go#L683 https://github.com/eclipse/paho.mqtt.golang/blob/master/client.go#L693 https://github.com/tomatod/paho.mqtt.golang/blob/master/ping.go#L73

I was wondering where to embed a store for keeping statuses of continual connection error (in types client or connectionStatus ?), but there seems to be no appropriate place. So I made backoffCotroller as a new type and added it to client type.

MattBrittan commented 1 year ago

Sorry about the delay in getting to this. Your PR looks OK to me (and all tests pass) so I'm going to accept it (it's a bit more complicated than I'd expected but I can see why you have made the decisions you did).

reconnect method may be called from multi threads. It should not be (the status code should prevent this but it's from the most recent change so it's possible I've missed something). However a mutex is worthwhile , the code seems pretty generic so may be useful elsewhere so it's better to make it thread safe (and its performance is not really of critical importance)..

But both the errors errConnLossWhileDisconnecting and errAlreadyHandlingConnectionLoss seem to be not returned anywhere although they are defined.

You are correct - however this is not a big deal because the only real impact is that an extra error is logged (assuming logging is enabled). The check for these errors was really just to prevent logging of thinbgs that are expected to occur as part of normal operations. I believe the situations I was trying to stop logging for are here and here.

tomatod commented 1 year ago

@MattBrittan Thank you for your review and merge!!

fsudaman commented 1 year ago

Hi @MattBrittan any plan for next release soon that includes this PR? Thanks

MattBrittan commented 1 year ago

@fsudaman - sorry things have been pretty stable so I had not considered a release. It's been eight months so may as well! - Done

fsudaman commented 1 year ago

@MattBrittan we had some issues with huge amount of retries causing a bit of congestion at scale so this really help. Thank you for the quick response and much appreciated.

eclipse-paho / paho.mqtt.golang

Add back-off controller for sleep time of reconnection when connection lost is detected immediately after connecting. #589 #625

Summary

Implementation point