Closed mattczech closed 6 years ago
@mattczech Thank you for reporting this issue. I will look into this and get back to you.
Any news regarding this issue, seems like there still is issues with reconnecting logic?
@mattczech, @aat2703
Just a quick update. I have been able to reproduce the issue on my side and am continuing to work on it. Will post an update once I finish up my analysis.
Cool @cbommas
Could this issue have any relation to https://github.com/aws/aws-sdk-ios/issues/972 ?
@aat2703
Yep, these are related. I have a fix that I am testing on my side and am hoping to include in the next rev of the SDK.
@cbommas thats what i thought hehe.. Will you loop back when you have an scheduled release date with this bug fix? 👍
Absolutely. I will circle back on this thread once the fix is out.
@cbommas do you have an eta for the next rev of the SDK?
@aat2703 @mattczech
Version 2.6.23 was released yesterday ( july 5) and contains the fixes for this issue. Please upgrade and let me know if you are still running into issues.
@cbommas Thanks for looking into this and for the quick turn-around! v2.6.23 seems much more stable when disconnecting/reconnecting.
I'm still seeing an issue where the AWSIoTMQTTClient
will attempt to reconnect when the connection status is connected.
Steps to reproduce:
I added a debug statement to [AWSIoTMQTTSession reconnectToSession]
that prints out the current AWSIoTMQTTStatus
. You can see in the log below Attempting to reconnect. status=2
, which corresponds to AWSIoTMQTTStatusConnected
. Why would it reconnect if the session is already connected?
2018-07-12 12:25:41:084 [19005:4156973] Websocket didReceiveMessage: Received 5 bytes
2018-07-12 12:25:41:084 [19005:4157015] -[MQTTDecoder stream:handleEvent:] [Line 55] EventCode:2, stream: <__NSCFInputStream: 0x1cc1033c0>, Thread: <NSThread: 0x1cc07b380>{number = 19, name = (null)}
2018-07-12 12:25:41:084 [19005:4157015] -[MQTTSession decoder:newMessage:] [Line 361] messageType=9, status=2
2018-07-12 12:25:41:084 [19005:4157015] MQTTSession- newMessage msg type is 9
2018-07-12 12:25:41:084 [19005:4157015] -[MQTTSession handleSuback:] [Line 444]
2018-07-12 12:25:41:084 [19005:4157015] Sub Ack messageId 12
2018-07-12 12:25:41:084 [19005:4157015] MQTTSessionDelegate new ack for msgId: 12
2018-07-12 12:25:41:939 [19005:4157015] ClockTick:21: nothing to republish
2018-07-12 12:25:42:487 [19005:4157013] Attempting to reconnect. status=2
2018-07-12 12:25:42:487 [19005:4157013] AWSIoTMQTTClient: connecting via websocket.
2018-07-12 12:25:42:487 [19005:4156973] [SR] Closing with code 1000 reason (null)
2018-07-12 12:25:42:488 [19005:4156967] [SR] NSStreamEventHasSpaceAvailable <__NSCFOutputStream: 0x1c01045c0>
2018-07-12 12:25:42:536 [19005:4157013] username is : ?SDK=iOS&Version=2.6.19
2018-07-12 12:25:42.536720-0500 [19005:4156973] [MQTT] MQTT connection status changed Connecting
This seems to be caused by a race condition of detecting multiple error conditions around the same time that each initiating a reconnect:
It seems like this could be resolved by adding a check on mqttStatus
in [AWSIoTMQTTClient reconnectToSession]
but I'm open to any insight you have.
Our app is currently on its way to the App Store. I'll request feedback from our customers once it has made its way out, and will circle back on this thread 👍
@mattczech Thank you for your analysis on why the extra connection attempt is happening. I agree with your hypothesis and applied the fix you suggested. It has made it more stable and so far in my testing I haven't seen the extra connect. I will continue to test - Also, I will get this included in the next rev of the SDK. Thank you for your help in making this library better! 👍
@mattczech, @aat2703
Rev 2.6.25 released on 7/18 has the fixes for the connectionAge timer and the fix for the extraneous connect. Can you upgrade to the latest and let us know how it goes?
What implications did the "connectionAge timer" bug have?
The IoT client implements an exponential backoff strategy for connection retries. The initial value of the reconnectTime
is set to baseReconnectTimeInterval
set in the MQTTConfiguration
object.
When the IoT client detects that it has lost the connection to the server, it will wait until the reconnectTime
to try to reconnect. If the reconnect is unsuccessful, it will double the reconnectTime
and wait before reconnecting. The doubling will happen until the maximumReconnectTimeInterval
is reached.
From that point on the client will keep trying to reconnect once every reconnectTime
seconds ( the reconnectTime
will be pinned tomaximumReconnectTimeInterval
seconds)
When the IoT client is able to connect successfully, it will wait for minimumConnectionTimeInterval
before it resets the reconnectTime
back to the baseReconnectTimeInterval
.
The bug essentially was that the reconnectTime
wasn't being reset. The implication is that if a client gets disconnected, the reconnectTime will max out and never be reset; which will mean that the client will only retry at the maximumReconnectTimeInterval
. To the end user, it can appear that the app is taking a "long" time to reconnect ("long" is based on what the max is set to).
In v2.6.25 we are seeing this log messages too.
ClockTick:64: nothing to republish
@karasahinemre
Looks like you've set the logs at the debug level. You can eliminate these log messages by setting the log level of info or higher. I will set these messages to verbose in a future release.
@karasahinemre
The change to move the messages to verbose will be included in the next rev of the SDK.
Closing this issue as we are tracking updates on #1002
To help us solve your problem better, please answer the following list of questions.
AWSIoT
2.6.21
Actual devices
No
No
CocoaPods
After reconnecting from an offline state I will occasionally see an error "WebSocket closed with code:1001 with reason:Stream end encountered" show up in the logs. When this happens, I receive no notification that the connection status has changed (AWSIoTMQTTStatus == .connected) and new messages that are sent to subscribed topics are not received by the app. I would expect that after this occurs a reconnect would be attempted to restore the connection.
Looking further in the logs, it seems like this might be caused by attempting to reconnect after the connection has already been established. Here you can see that a message was just received and then it immediately attempts to reconnect, even though there is no error indicating that there was anything wrong with the connection.
I'm able to regularly reproduce this issue by putting my device in airplane mode, turning off wifi, waiting a few seconds or minutes and then turning wifi back on.