Closed brendancallahan closed 3 years ago
Here are some additional crashes we are having. Some of them may be similar to or duplicates of the crashes above due to minor changes in the IoT SDK, but I've included them in case they provide any more information.
Crash 5 - 18 users
AWSIoTDataManager.m line 970 -[AWSIoTDataManager handleMessagesForShadow:operation:status:payload:]
94% iOS 13, 6% in background
Crashed: com.apple.root.default-qos
EXC_BAD_ACCESS KERN_INVALID_ADDRESS 0x0000000000000010
0 AWSIoT AWSIoTDataManager.m - Line 970 -[AWSIoTDataManager handleMessagesForShadow:operation:status:payload:] + 970
1 AWSIoT AWSIoTDataManager.m - Line 1035 shadowMqttMessageHandler_block_invoke + 1035
2 AWSIoT AWSIoTMQTTClient.m - Line 1120 __47-[AWSIoTMQTTClient session:newMessage:onTopic:]_block_invoke.545 + 1120
Crash 6 - 5 users
AWSMQTTSession.m line 138 -[AWSMQTTSession connectToInputStream:outputStream:]
100% iOS 13, 80% in background
Crashed: Thread #1
EXC_BAD_ACCESS KERN_INVALID_ADDRESS 0x000000000000004f -[AWSMQTTSession connectToInputStream:outputStream:]
0 libobjc.A.dylib objc_msgSend + 8
1 AWSIoT AWSMQTTSession.m - Line 138 -[AWSMQTTSession connectToInputStream:outputStream:] + 138
2 AWSIoT AWSIoTMQTTClient.m - Line 750 -[AWSIoTMQTTClient openStreams:] + 750
Crash 7 - 4 users
AWSIoTMQTTClient.m line 1172 -[AWSIoTMQTTClient webSocketDidOpen:]
100% iOS 13, 13% in background
Crashed: com.apple.root.default-qos
SIGABRT ABORT 0x00000001bfe98ef4
10 libobjc.A.dylib objc_release + 136
11 AWSIoT AWSIoTMQTTClient.m - Line 1172 -[AWSIoTMQTTClient webSocketDidOpen:] + 1172
12 AWSIoT AWSSRWebSocket.m - Line 476 __39-[AWSSRWebSocket _HTTPHeadersDidFinish]_block_invoke + 476
Crash 8 - 3 users
AWSIoTMQTTClient.m line 1171 -[AWSIoTMQTTClient webSocketDidOpen:]
100% iOS 13, 67% in background
Crashed: com.apple.root.default-qos
SIGABRT ABORT 0x00000001afa1eefc
10 objc_release + 136
11 AWSIoT AWSIoTMQTTClient.m - Line 1171 -[AWSIoTMQTTClient webSocketDidOpen:] + 1171
12 AWSIoT AWSSRWebSocket.m - Line 476 __39-[AWSSRWebSocket _HTTPHeadersDidFinish]_block_invoke + 476
Crash 9 - 2 users
AWSIoTMQTTClient.m line 1086 -[AWSIoTMQTTClient session:newMessage:onTopic:]
100% iOS 13, 0% in background
Crashed: Thread #1
EXC_BAD_ACCESS KERN_INVALID_ADDRESS 0x007f00010400e600 -[AWSIoTMQTTClient session:newMessage:onTopic:]
0 libobjc.A.dylib objc_retain + 8
1 CoreFoundation __NSArrayI_new + 200
2 CoreFoundation -[NSDictionary allKeys] + 312
3 AWSIoT AWSIoTMQTTClient.m - Line 1086 -[AWSIoTMQTTClient session:newMessage:onTopic:] + 1086
4 AWSIoT AWSMQTTSession.m - Line 513 -[AWSMQTTSession handlePublish:] + 513
5 AWSIoT AWSMQTTSession.m - Line 445 -[AWSMQTTSession newMessage:] + 445
6 AWSIoT AWSMQTTSession.m - Line 408 -[AWSMQTTSession decoder:newMessage:] + 408
7 AWSIoT AWSMQTTDecoder.m - Line 153 -[AWSMQTTDecoder stream:handleEvent:] + 153
8 CoreFoundation _signalEventSync + 212
Crash 10 - 2 users
AWSMQTTEncoder.m line 29 -[AWSMQTTEncoder .cxx_destruct]
100% iOS 13, 50% in background
Crashed: Thread #1
EXC_BREAKPOINT 0x00000001b907daec
3 libobjc.A.dylib objc_release + 136
4 AWSIoT AWSMQTTEncoder.m - Line 29 -[AWSMQTTEncoder .cxx_destruct] + 29
5 libobjc.A.dylib object_cxxDestructFromClass(objc_object, objc_class) + 116
8 libobjc.A.dylib objc_release + 136
9 AWSIoT AWSMQTTSession.m - Line 127 -[AWSMQTTSession connectToInputStream:outputStream:] + 127
10 AWSIoT AWSIoTMQTTClient.m - Line 750 -[AWSIoTMQTTClient openStreams:] + 750
11 Foundation -[NSThread main] + 40
Crash 11 - 2 users
AWSMQTTDecoder.m line 45 -[AWSMQTTDecoder open]
100% iOS 13, 100% in background
EXC_BAD_ACCESS KERN_INVALID_ADDRESS 0x0000000000000080 -[AWSMQTTDecoder open]
0 libobjc.A.dylib objc_msgSend + 8
1 AWSIoT AWSMQTTDecoder.m - Line 45 -[AWSMQTTDecoder open] + 45
2 AWSIoT AWSMQTTSession.m - Line 138 -[AWSMQTTSession connectToInputStream:outputStream:] + 138
3 AWSIoT AWSIoTMQTTClient.m - Line 750 -[AWSIoTMQTTClient openStreams:] + 750
4 Foundation -[NSThread main] + 40
Crash 12 - 2 users
AWSIoTMQTTClient.m line 1196 -[AWSIoTMQTTClient webSocket:didFailWithError:]
100% iOS 13, 0% in background
Crashed: com.apple.root.default-qos
EXC_BAD_ACCESS KERN_INVALID_ADDRESS 0x0000000000000000
1 CoreFoundation _CFStreamClose + 260
2 AWSIoT AWSIoTMQTTClient.m - Line 1196 -[AWSIoTMQTTClient webSocket:didFailWithError:] + 1196
3 AWSIoT AWSSRWebSocket.m - Line 773 __33-[AWSSRWebSocket _failWithError:]_block_invoke_2 + 773
4 libdispatch.dylib _dispatch_call_block_and_release + 24
Crash 13 - 1 user
AWSIoTMQTTClient.m line 718 -[AWSIoTMQTTClient initiateReconnectTimer:]
100% iOS 13, 0% in background
Crashed: Thread #1
EXC_BAD_ACCESS KERN_INVALID_ADDRESS 0x0000000000000020
0 libobjc.A.dylib objc_retain + 16
1 AWSIoT AWSIoTMQTTClient.m - Line 718 -[AWSIoTMQTTClient initiateReconnectTimer:] + 718
2 Foundation NSThreadstart__ + 848
Crash 14 - 1 user
AWSSRWebSocket.m line 1609 -[AWSSRWebSocket stream:handleEvent:]
100% iOS 13, 100% in background
Crashed: com.squareup.SocketRocket.NetworkThread
EXC_BAD_ACCESS KERN_INVALID_ADDRESS 0x0000000000000020
2 libdispatch.dylib dispatch_async + 60
3 AWSIoT AWSSRWebSocket.m - Line 1609 -[AWSSRWebSocket stream:handleEvent:] + 1609
4 CoreFoundation _signalEventSync + 216
15 Foundation -[NSRunLoop(NSRunLoop) runMode:beforeDate:] + 232
16 AWSIoT AWSSRWebSocket.m - Line 1874 -[_SRRunLoopThread main] + 1874
17 Foundation NSThreadstart__ + 852
Hi @brendancallahan, sorry for the issues this has caused, and thanks for the detailed report. The previous issues that you referenced were reported on iOS 11/12. Were you seeing crashes before iOS 13 as well?
Regarding the disconnect and connectUsingWebSocket flow, we will have to investigate further to see if we can reproduce the scenario and find similar crashes. I wonder if the issue is caused by a disconnect and a connect running asynchronously at the same time. Is it possible for you to instantiate an AWSIoTDataManager and connect when you need to, rather than what you are currently doing?
Hi @lawmicha, thanks for your response. The crash reports have been mostly from iOS 13, but some crashes are not 100% iOS 13 users. However, we released the IoT functionality in our app after the release of iOS 13, and based on analytics only 3% of our users are on pre-iOS 13 versions. Crashlytics has flagged the first issue with an "iOS 13" icon, suggesting the sample size is large enough to imply it may be an iOS 13 specific issue.
Is it possible for you to instantiate an AWSIoTDataManager and connect when you need to rather than what you are currently doing?
Do you mean we should try creating a new AWSIoTDataManager instance for each reconnection, and only at the point of connection, discarding the old one? This shouldn't be too difficult to implement on our end, just want to clarify.
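One reading of that suggestion, as a minimal sketch: build a fresh manager at the moment of each connection and drop the previous one. `IoTManaging`, `ConnectionOwner`, and `StubManager` are hypothetical names used only for illustration; they are not SDK types, and the real manager would be created through whatever factory the app already uses.

```swift
// Hypothetical interface standing in for the manager calls the app makes; not the SDK's API.
protocol IoTManaging: AnyObject {
    func connect()
    func disconnect()
}

/// Owns at most one manager at a time and replaces it on every reconnect.
final class ConnectionOwner {
    private let makeManager: () -> IoTManaging
    private var current: IoTManaging?

    init(makeManager: @escaping () -> IoTManaging) {
        self.makeManager = makeManager
    }

    /// Disconnects and discards the old manager (if any), then connects a new one.
    func reconnect() {
        current?.disconnect()
        let fresh = makeManager()
        current = fresh
        fresh.connect()
    }
}

// Stub that records the calls it receives, so the flow can be checked without a network.
final class StubManager: IoTManaging {
    private(set) var log: [String] = []
    func connect() { log.append("connect") }
    func disconnect() { log.append("disconnect") }
}

var managers: [StubManager] = []
let owner = ConnectionOwner(makeManager: {
    let manager = StubManager()
    managers.append(manager)
    return manager
})
owner.reconnect()
owner.reconnect()
```

After the second `reconnect()`, the first stub has been disconnected and is no longer referenced by the owner, while the second stub holds the live connection.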
Thanks!
By the way, I've done some looking into the analytics logs for these crashes, and they don't appear to be happening at the same time that our app is calling the disconnect and reconnect functions; in fact they seem to be happening at an arbitrary time in normal app sessions.
I've added some analytics events to our IoT class to see what's going on there during the crashes, and I've also added analytics events for application events like the app entering background, app becoming active, app entering foreground, etc. We will get this out in our next release, hopefully in the next week or so, in case it provides any more insight on this issue.
Crashlytics has now reported the most common crash (crash #1) on an iOS 12 device, so it doesn't look like that is iOS 13 specific after all.
We've had the version with analytics released for a day or two. So far, we have seen crashes at
AWSIoTMQTTClient.m – line 1173 -[AWSIoTMQTTClient webSocketDidOpen:]
AWSSRWebSocket.m – line 476 __39-[AWSSRWebSocket _HTTPHeadersDidFinish]_block_invoke
The analytics events that we've added show the following for the MQTT status callbacks:
413 06:10:52.867 PM IoTStatusDisconnected
414 06:10:52.870 PM IoTStatusConnectionError
415 06:10:53.956 PM IoTStatusDisconnected
416 06:10:53.957 PM IoTStatusConnectionError
417 06:10:56.578 PM IoTStatusConnecting
418 06:10:56.617 PM IoTStatusConnecting
It seems that all of these crashes so far on the new version show this behavior, of two consecutive connection errors very close together, followed by two consecutive connecting events.
We aren't making a call to connect IoT here (as that would generate an analytics event); it appears that this may be happening in the auto-reconnect logic in the SDK.
It also seems that many users have this IoTConnectionError (corresponding to the .connectionError state), and then reconnect successfully with the expected number of MQTT status events; it only appears to be in rare cases where this double disconnection/reconnection seems to be triggered for customers.
@brendancallahan - Just wanted to give you a heads up, and thank you for all of the detailed information with regards to this bug -- it has not gone unnoticed! :).
I was able to set up an example IoT app and connect over websockets; however, I have been unable to reproduce a crash (by going offline/online, publishing events, subscribing, etc.). That being said, I haven't given up yet. Based on your stack traces, and seeing IoTStatus quickly cycling between Disconnected/ConnectionError/Connecting (all within 4 seconds), my guess is that there's some type of threading issue in AWSIoTMQTTClient where perhaps the MQTT SDK and SocketRocket callbacks are competing with each other, but I need to dive in a bit more.
Given I've been unable to reproduce this on a device, my next step is to write some very pointed unit tests while mocking out actual websocket connections & mqtt sessions while testing various combinations of connecting/disconnecting/errors/session terminations. crosses fingers
If you have any other information unique to these crashes (like how many topics you are publishing/subscribing to/from or any other parameters that you may suspect which would create such a situation), please let us know.
Thanks for your response! I've added some analytics events to monitor subscriptions, and will also be monitoring publish events in the next version released. I estimate the app has around 10 subscriptions maximum at the time the crash occurs.
I should also add that our analytics event for "connectionError" is grouped with "disconnected" in the switch statement -- we are not actually receiving the disconnected MQTT status before the crash, only "connection error" twice.
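A minimal sketch of logging each status under its own event name, so that "disconnected" and "connection error" can no longer be confused in the analytics. `MQTTStatus` is a hypothetical stand-in reduced to the four states discussed in this thread; the SDK's own status enum has more cases.

```swift
// Hypothetical stand-in for the SDK's status enum, reduced to the cases discussed here.
enum MQTTStatus {
    case connecting, connected, disconnected, connectionError
}

/// Maps each status to its own analytics event name so that no two states share a log entry.
func analyticsEventName(for status: MQTTStatus) -> String {
    switch status {
    case .connecting:      return "IoTStatusConnecting"
    case .connected:       return "IoTStatusConnected"
    case .disconnected:    return "IoTStatusDisconnected"
    case .connectionError: return "IoTStatusConnectionError"
    }
}
```

Because the switch is exhaustive with one case per state, adding a state later forces a decision about its event name at compile time.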
Here's an example of an app session leading up to it, with the (actually absent) disconnected status events edited out. It appears that when the duplicate connection error is received, it doesn't always trigger the duplicate "connecting" status:
155 07:52:17.772 AM IoTStatusConnecting
156 07:52:19.749 AM IoTStatusConnected
157 07:52:19.752 AM IoTHndshkLoginUser (this involves requesting & receiving some data from a topic)
158 07:52:20.787 AM IoTHndshkSuccess
160 07:53:35.222 AM IoTStatusConnectionError
162 07:53:36.307 AM IoTStatusConnectionError
163 07:53:36.319 AM IoTStatusConnecting
164 07:53:38.290 AM IoTStatusConnected
165 07:53:38.291 AM IoTHndshkLoginUser
166 07:53:39.338 AM IoTHndshkSuccess
168 08:15:39.377 AM IoTStatusConnectionError
170 08:15:40.302 AM IoTStatusConnectionError
171 08:15:44.796 AM IoTStatusConnecting
172 08:15:44.836 AM IoTStatusConnecting
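The duplicate-callback pattern in a session like the one above could be flagged automatically. The following is a rough sketch, assuming events are available as a time plus a name; `StatusEvent`, `duplicateCallbacks`, and the 2-second window are illustrative choices, not part of the app or the SDK.

```swift
import Foundation

// Hypothetical record of one analytics event: a time in seconds plus the event name.
struct StatusEvent {
    let time: TimeInterval
    let name: String
}

/// Returns every pair of back-to-back events that share a name
/// and arrive within `window` seconds of each other.
func duplicateCallbacks(in events: [StatusEvent], window: TimeInterval) -> [(StatusEvent, StatusEvent)] {
    var pairs: [(StatusEvent, StatusEvent)] = []
    for (previous, current) in zip(events, events.dropFirst()) {
        if previous.name == current.name && current.time - previous.time <= window {
            pairs.append((previous, current))
        }
    }
    return pairs
}

// Events 168-172 from the session above, expressed as seconds past 08:15:00.
let session = [
    StatusEvent(time: 39.377, name: "IoTStatusConnectionError"),
    StatusEvent(time: 40.302, name: "IoTStatusConnectionError"),
    StatusEvent(time: 44.796, name: "IoTStatusConnecting"),
    StatusEvent(time: 44.836, name: "IoTStatusConnecting"),
]
let duplicates = duplicateCallbacks(in: session, window: 2.0)
```

With this input, `duplicates` contains two pairs: the doubled connection error and the doubled connecting event.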
Here's an example of some analytics events, now with some tracking for reachability:
It appears that reachability changes are happening right before this event occurs. ReachabilityChangedCellular corresponds to Reachability.Connection.cellular, and ReachabilityChangedWifi is Reachability.Connection.wifi
19 11:58:23.686 PM IoTHndshkSuccess
20 01:44:20.309 AM ReachabilityChangedCellular
21 01:44:25.794 AM ReachabilityChangedWifi
23 01:44:25.821 AM IoTStatusConnectionError
25 01:44:26.901 AM IoTStatusConnectionError
26 01:44:30.255 AM ReachabilityChangedWifi
27 01:44:32.973 AM IoTStatusConnecting
28 01:44:33.013 AM IoTStatusConnecting
@brendancallahan - Thanks for all the information. I’ve written test apps which exercise different paths of the code (connect(), disconnect(), subscribe, airplane mode on/off, poor network, 100% loss, etc.), attempted to manually inject errors to instrument failures, and unfortunately I’m still having trouble reproducing crashes :'(.
I see a number of stack traces, and it's a bit unclear if there's one, or many bugs we're running into here. So, taking a step back, is there a single case that we can reliably reproduce? Is it possible to get the associated code and reproduction steps? If so, can that be shared (maybe through GitHub)? If you can reproduce this issue, is there any chance you’d like to submit a PR and propose a fix in the code?
Hi, sorry for my late response, I've been working on getting a critical feature out the door.
I'll see if I can reproduce it -- I do remember that we actually implemented the connecting and disconnecting from IoT because of some crashes switching from Wi-Fi to cellular. Now that I think about it, it may have been this crash/these crashes. In the situation where the issue occurred, the user would connect their phone's Wi-Fi to our IoT device's Wi-Fi, to send over some configuration information to the IoT device. After this, the IoT device would drop its connection with the user's phone, and the user's phone would reconnect to their local Wi-Fi.
The one thing that might be relevant here is that the user is connecting to a Wi-Fi network without internet, then possibly to cellular, then back to their local wifi network. I wonder if connecting to an access point which doesn't have internet, then back to an access point with Internet could have something to do with this? Or, some weird OS/reachability behavior in the switching of access points. Occasionally, the user's phone's Wi-Fi would auto-reconnect to the user's IoT device Wi-Fi after configuration, if configuration wasn't successful and the access point wasn't turned off by the IoT device.
I'll see if I can replicate this sometime this week, and see what I can come up with for this. While I would love to deep dive into the nuance of the AWS IoT framework and create a PR to fix this, it is probably unlikely to be prioritized, given the small percentage of users affected by these crashes and the amount of features we have to develop, but I'll see what I can do.
By the way, I think that probably some of these stack traces are representing the same bug, but the line of code has changed in various updates to the AWS IoT framework, so it is registered as a different crash. My guess is that crashes in the same named function are the same crash.
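A small sketch of that grouping idea: key each report on the function name and ignore the line number, which shifts between SDK releases. `CrashReport` and `usersBySymbol` are hypothetical names, and the sample rows are crashes 6, 7, and 8 from the list above. The per-symbol sums are only an upper bound, since the same user may appear in more than one report.

```swift
// Hypothetical shape of one crash-report row: source location, symbol, and affected users.
struct CrashReport {
    let file: String
    let line: Int
    let symbol: String
    let users: Int
}

/// Sums affected users per symbol, ignoring the line number that shifts between SDK releases.
func usersBySymbol(_ reports: [CrashReport]) -> [String: Int] {
    var totals: [String: Int] = [:]
    for report in reports {
        totals[report.symbol, default: 0] += report.users
    }
    return totals
}

// Crashes 6, 7 and 8 as reported earlier in this thread.
let reports = [
    CrashReport(file: "AWSMQTTSession.m", line: 138,
                symbol: "-[AWSMQTTSession connectToInputStream:outputStream:]", users: 5),
    CrashReport(file: "AWSIoTMQTTClient.m", line: 1172,
                symbol: "-[AWSIoTMQTTClient webSocketDidOpen:]", users: 4),
    CrashReport(file: "AWSIoTMQTTClient.m", line: 1171,
                symbol: "-[AWSIoTMQTTClient webSocketDidOpen:]", users: 3),
]
let totals = usersBySymbol(reports)
```

Here the two `webSocketDidOpen:` entries collapse into a single bucket.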
This issue has been automatically closed because of inactivity. Please open a new issue if you are still encountering problems.
We've done some major refactoring to our app, and noticed that we were starting AWS IoT on the main thread. Initializing the IoT SDK content and interacting with it on a separate dedicated queue seems to have resolved these issues completely. Thanks @wooj2 for helping with testing and debugging!
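For anyone trying the same approach, here is a minimal sketch of routing every IoT call through one dedicated serial queue. `IoTConnecting`, `SerializedIoTClient`, `RecordingClient`, and the queue label are hypothetical names for illustration; the wrapped object would be whatever the app uses to talk to the SDK.

```swift
import Foundation

// Hypothetical interface for the subset of IoT calls the app makes; not the SDK's API.
protocol IoTConnecting: AnyObject {
    func connect()
    func disconnect()
}

/// Funnels every call to the wrapped client through one serial queue,
/// so calls run one at a time, in order, off the caller's thread.
final class SerializedIoTClient {
    private let queue = DispatchQueue(label: "com.example.iot")  // serial by default
    private let client: IoTConnecting

    init(client: IoTConnecting) {
        self.client = client
    }

    func connect() { queue.async { self.client.connect() } }
    func disconnect() { queue.async { self.client.disconnect() } }

    /// Blocks until all previously enqueued work has finished (useful in tests).
    func waitUntilIdle() { queue.sync {} }
}

// Stub client that records the order in which calls actually ran.
final class RecordingClient: IoTConnecting {
    private(set) var calls: [String] = []
    func connect() { calls.append("connect") }
    func disconnect() { calls.append("disconnect") }
}

let recorder = RecordingClient()
let iot = SerializedIoTClient(client: recorder)
iot.disconnect()
iot.connect()
iot.waitUntilIdle()
```

A serial queue guarantees the disconnect finishes before the connect begins on the app side; it does not by itself serialize work the SDK does on its own threads.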
@brendancallahan thanks very much for the update--glad you were able to find that!
Hi, I'm sorry to say, but I think there was a delay in symbolicating the crash reports when I looked at them, although other crashes from newer versions were reported. We are still seeing the original crashes in
AWSIoTMQTTClient.m line 1173 (35 users in last week) -[AWSIoTMQTTClient webSocketDidOpen:]
AWSIoTMQTTClient.m line 1171 (8 users in last week) -[AWSIoTMQTTClient webSocketDidOpen:]
AWSSRWebSocket.m line 476 (6 users in last week) __39-[AWSSRWebSocket _HTTPHeadersDidFinish]_block_invoke
AWSMQTTDecoder.m line 45 (2 users in last week) -[AWSMQTTDecoder open]
AWSMQTTSession.m line 138 (2 users in last week) -[AWSMQTTSession connectToInputStream:outputStream:]
Sorry about that, I guess I should give it a day or so to symbolicate next time to be sure! I'll continue looking into what I could do to help determine the cause of this.
By the way, with these events, we seem to still be seeing these duplicate "connection error" and "connecting" callbacks when the issue is encountered. I'm seeing from analytics events that sometimes the connection error -> connecting loop repeats many times, with duplicate callbacks for both, before a crash is encountered.
Please re-open, this is a significant issue.
We are getting issues with two threads running AWSIoTMQTTClient, likely causing memory access issues across threads.
@palpatim - Please see below multiple threads running a client instance. This could be causing memory access issues.
I wonder if it's an issue that __CFStreamDeallocate is being called on one thread, while in another thread the AWSIoTMQTTClient is running openStream...?
Thread 10:
0 libsystem_kernel.dylib 0x00000001b0b5c784 mach_msg_trap + 8
1 libsystem_kernel.dylib 0x00000001b0b5bba8 mach_msg + 76 (mach_msg.c:103)
2 CoreFoundation 0x00000001b0d13314 __CFRunLoopServiceMachPort + 152 (CFRunLoop.c:2575)
3 CoreFoundation 0x00000001b0d0e0a0 __CFRunLoopRun + 1156 (CFRunLoop.c:2931)
4 CoreFoundation 0x00000001b0d0d8f4 CFRunLoopRunSpecific + 480 (CFRunLoop.c:3192)
5 Foundation 0x00000001b1056b18 -[NSRunLoop(NSRunLoop) runMode:beforeDate:] + 232 (NSRunLoop.m:374)
6 AWSIoT 0x00000001053b2df0 -[AWSIoTMQTTClient openStreams:] + 684
7 Foundation 0x00000001b1190c10 __NSThread__start__ + 864 (NSThread.m:724)
8 libsystem_pthread.dylib 0x00000001b0a9d8fc _pthread_start + 168 (pthread.c:896)
9 libsystem_pthread.dylib 0x00000001b0aa59d4 thread_start + 8
Thread 11 name:
Thread 11 Crashed:
0 libsystem_kernel.dylib 0x00000001b0b7edf0 __pthread_kill + 8
1 libsystem_pthread.dylib 0x00000001b0a9e930 pthread_kill + 228 (pthread.c:1458)
2 libsystem_c.dylib 0x00000001b0a2cc24 __abort + 116 (abort.c:147)
3 libsystem_c.dylib 0x00000001b0a2cbb0 abort + 116 (abort.c:118)
4 libsystem_malloc.dylib 0x00000001b0a8ffdc malloc_vreport + 564 (malloc_printf.c:183)
5 libsystem_malloc.dylib 0x00000001b0a901a4 malloc_report + 64 (malloc_printf.c:192)
6 libsystem_malloc.dylib 0x00000001b0a83d1c free + 436 (malloc.c:1733)
7 CoreFoundation 0x00000001b0dad2f8 cbDestroy + 36 (tsCircularBuffer.c:87)
8 CoreFoundation 0x00000001b0d2cd08 boundPairCommonFinalize + 28 (CFStreamPair.c:130)
9 CoreFoundation 0x00000001b0d28f44 __CFStreamDeallocate + 136 (CFStream.c:345)
10 CoreFoundation 0x00000001b0d15274 _CFRelease + 252 (CFRuntime.c:2086)
11 AWSIoT 0x00000001053b5a8c -[AWSIoTMQTTClient webSocketDidOpen:] + 284
12 AWSIoT 0x00000001053da6e4 __39-[AWSSRWebSocket _HTTPHeadersDidFinish]_block_invoke + 132
13 libdispatch.dylib 0x00000001b0a36ec4 _dispatch_call_block_and_release + 32 (init.c:1408)
14 libdispatch.dylib 0x00000001b0a3833c _dispatch_client_callout + 20 (object.m:495)
15 libdispatch.dylib 0x00000001b0a476e8 _dispatch_root_queue_drain + 644 (inline_internal.h:2484)
16 libdispatch.dylib 0x00000001b0a47d9c _dispatch_worker_thread2 + 116 (queue.c:6628)
17 libsystem_pthread.dylib 0x00000001b0a9f6d8 _pthread_wqthread + 216 (pthread.c:2364)
18 libsystem_pthread.dylib 0x00000001b0aa59c8 start_wqthread + 8
@palpatim this isn't closed.
Piggy backing on this, but it appears to be doing the same for me. It appears that two threads are running in the client and attempting to reconnect. This specifically happens for me when an app is backgrounded for some time (1 min plus) and is then foregrounded.
Crash logs:
Thread 8 name:
Thread 8 Crashed:
0 libobjc.A.dylib 0x00000001bd8deaa0 objc_release + 16 (objc-object.h:551)
1 MQTTManager 0x00000001046f1d54 MQTTManager.setXboIdAndPartnerId() + 280 (<compiler-generated>:0)
2 MQTTManager 0x00000001046f0bb0 MQTTManager.connectToMessageBroker() + 136 (MQTTManager.swift:166)
3 MQTTManager 0x00000001046f330c MQTTManager.reconnectToMessageBroker() + 32 (MQTTManager.swift:191)
4 MQTTManager 0x00000001046f2bfc MQTTManager.mqttEventCallback(_:) + 2780 (MQTTManager.swift:227)
5 MQTTManager 0x00000001046f2d2c thunk for @escaping @callee_guaranteed (@unowned AWSIoTMQTTStatus) -> () + 40 (<compiler-generated>:0)
6 AWSIoT 0x0000000102fddb1c __42-[AWSIoTMQTTClient notifyConnectionStatus]_block_invoke + 116
7 libdispatch.dylib 0x00000001bd8669a8 _dispatch_call_block_and_release + 24 (init.c:1408)
8 libdispatch.dylib 0x00000001bd867524 _dispatch_client_callout + 16 (object.m:495)
9 libdispatch.dylib 0x00000001bd84d65c _dispatch_root_queue_drain + 640 (inline_internal.h:2484)
10 libdispatch.dylib 0x00000001bd84dcd0 _dispatch_worker_thread2 + 112 (queue.c:6628)
11 libsystem_pthread.dylib 0x00000001bd8b8b38 _pthread_wqthread + 212 (pthread.c:2364)
12 libsystem_pthread.dylib 0x00000001bd8bb740 start_wqthread + 8
Thread 9:
0 libsystem_pthread.dylib 0x00000001bd8bb738 start_wqthread + 0
Thread 10 name:
Thread 10:
0 libobjc.A.dylib 0x00000001bd8deaa0 objc_release + 16 (objc-object.h:551)
1 AWSIoT 0x0000000102fdc478 -[AWSIoTMQTTClient connectWithClientId:cleanSession:configuration:customAuthorizerName:tokenKeyName:tokenValue:tokenSignature:keepAlive:willTopic:willMsg:willQoS:willRetainFlag:statusCallback:] + 344
2 AWSIoT 0x0000000102f96d2c -[AWSIoTDataManager connectUsingWebSocketWithClientId:cleanSession:customAuthorizerName:tokenKeyName:tokenValue:tokenSignature:statusCallback:] + 1440
3 MQTTManager 0x00000001046f0dbc MQTTManager.connectToMessageBroker() + 660 (MQTTManager.swift:170)
4 MQTTManager 0x00000001046f330c MQTTManager.reconnectToMessageBroker() + 32 (MQTTManager.swift:191)
5 MQTTManager 0x00000001046f2bfc MQTTManager.mqttEventCallback(_:) + 2780 (MQTTManager.swift:227)
6 MQTTManager 0x00000001046f2d2c thunk for @escaping @callee_guaranteed (@unowned AWSIoTMQTTStatus) -> () + 40 (<compiler-generated>:0)
7 AWSIoT 0x0000000102fddb1c __42-[AWSIoTMQTTClient notifyConnectionStatus]_block_invoke + 116
8 libdispatch.dylib 0x00000001bd8669a8 _dispatch_call_block_and_release + 24 (init.c:1408)
9 libdispatch.dylib 0x00000001bd867524 _dispatch_client_callout + 16 (object.m:495)
10 libdispatch.dylib 0x00000001bd84d65c _dispatch_root_queue_drain + 640 (inline_internal.h:2484)
11 libdispatch.dylib 0x00000001bd84dcd0 _dispatch_worker_thread2 + 112 (queue.c:6628)
12 libsystem_pthread.dylib 0x00000001bd8b8b38 _pthread_wqthread + 212 (pthread.c:2364)
13 libsystem_pthread.dylib 0x00000001bd8bb740 start_wqthread + 8
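In the trace above, two status callbacks enter the app's reconnect path on different threads at the same time. An app-side guard that lets only one reconnect proceed could look like the sketch below. `ReconnectGate` is a hypothetical helper, and this only protects the app's own reconnect code; it does not address why the callback fires twice.

```swift
import Foundation

/// Lets only one caller at a time enter the reconnect path; concurrent callers are turned away.
final class ReconnectGate {
    private let lock = NSLock()
    private var inProgress = false

    /// Returns true if the caller now owns the reconnect; false if one is already running.
    func tryBegin() -> Bool {
        lock.lock()
        defer { lock.unlock() }
        if inProgress { return false }
        inProgress = true
        return true
    }

    /// Call when the reconnect attempt has finished, whether it succeeded or failed.
    func end() {
        lock.lock()
        defer { lock.unlock() }
        inProgress = false
    }
}

let gate = ReconnectGate()
let firstCallback = gate.tryBegin()    // first status callback starts reconnecting
let secondCallback = gate.tryBegin()   // duplicate callback arrives before the first finishes
gate.end()
let afterCompletion = gate.tryBegin()  // a later callback may reconnect again
```

The status callback would call `tryBegin()` before reconnecting and return early when it gets `false`.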
Hi @ariveralee,
Thanks for trying to help us reproduce this bug, but I'm not entirely sure your stack trace is related to the stack traces that were previously reported. In the stack traces that you have included, it seems that there is a call from [AWSIoTMQTTClient notifyConnectionStatus] which is being invoked on two separate threads (8 and 10), and the 8th thread seems to be crashing. This seems to be different than the stack traces above (but again, I'm not sure). Could you do me a favor and please open a separate issue for the issue you are seeing?
That being said, I have been unable to reproduce the crash you are seeing in my sample app by backgrounding the app for 2 minutes and then resuming it.
The sample app I am using is located here: https://github.com/awslabs/aws-sdk-ios-samples/tree/main/IoT-Sample/Swift
It's a simple "hello world" app which allows you to make a websocket connection, and then publish & subscribe on the topic. So specifically, what I'm trying is:
Observed behavior: No crash
I suspect this simple app is not an accurate reflection of the way you are using the library, so if there's some sample code you can provide in helping me reproduce this, it would be greatly appreciated.
Hey @wooj2! I'd like to do that. Given this is an issue for a product I'm working on for Comcast, would we be able to have a formal call about this so I could walk you through the details? I do have a support ticket open that I could send you for reference. I suspect it's making its way up to the IoT team. Could I have your email? Or I can give mine for you to set something up.
Hi @ariveralee, Contacting us through github is the preferred medium. Posting on github helps document issues against our code for current and future users who are experiencing similar issues, as well as encourages building our open source community. To help us move forward, would you be able to attach some sample code that helps us reproduce the issue you are seeing?
Hey @wooj2 Given the code is private, we're going to go with this internally with a TAM. Thanks for your replies on this! I'll withdraw my comments from this issue so that it can be closed accordingly, if needed.
@brendancallahan
Our latest release of AWS SDK iOS (2.16.0) has a couple changes that we believe will increase stability in IoT. You may want to check this out if you are still experiencing issues in your application.
Thanks!
Sounds good, we should have a release going out in the next week or so. I'll let you know what the result is, thanks!
Hi, we've had the new version out for a couple of weeks; I wanted to see how the crashes are playing out. The bad news is that we're still getting many of the above crashes, with the double event. The good news is that these crashes seem much less frequent -- we've only had about 8 crashes like this over the past week, which is definitely an improvement from before. Let me know if there's anything else I can do to help with this!
We're investigating some data race conditions in the IoT SDK that could be causing some of these (and other) issues. Tracking that in https://github.com/aws-amplify/aws-sdk-ios/issues/3303
We haven't been able to reproduce these. We're hoping to get some additional information from folks reporting similar issues on #3303.
Closing in favour of tracking these crashes in #3303
Describe the bug
I've noticed on our bug reporting service, Crashlytics, that there are four crashes that our users have been experiencing. However, these crashes never occur while connected to the debugger. One of our users said the app crashed when she pressed a button to go to Settings from the app, which brings up the iOS settings app and puts our app in the background.
It looks like it may be related to these issues?
https://github.com/aws-amplify/aws-sdk-ios/issues/1257
https://github.com/aws-amplify/aws-sdk-ios/issues/1209
However, it appears that my crash logs (below) are a little different.
Which AWS service(s) are affected?
The issue is occurring in the AWSIoT framework.
Environment:
Device Information:
Additional Info
In order for our app to connect to our wifi IoT device, the app anticipates the Internet connection being lost at this step and calls
self.iotDataMgr.disconnect()
After we have sent configuration information to the IoT device, the app reconnects the same AWSIoTDataManager, using:
self.iotDataMgr.connectUsingWebSocket(withClientId: cognitoClientID, cleanSession: true, statusCallback: self.mqttEventCallback)
This is in order to avoid the connection switching to cellular and back during the configuration process, and to anticipate the user disconnecting from their wifi. I'm unsure if the disconnection and reconnection here could be related to this issue.
Crash 1 - 38 users
AWSSRWebSocket.m – line 476 __39-[AWSSRWebSocket _HTTPHeadersDidFinish]_block_invoke
100% iOS 13, 43% in background
Crash 2 - 15 users
AWSIoTMQTTClient.m – line 1173 -[AWSIoTMQTTClient webSocketDidOpen:]
100% iOS 13, 39% in background
Crash 3 - 11 users
AWSIoTMQTTClient.m – line 1171 -[AWSIoTMQTTClient webSocketDidOpen:]
91% iOS 13, 81% in background
Crash 4 - 7 users
AWSMQTTSession.m – line 136 -[AWSMQTTSession connectToInputStream:outputStream:]
100% iOS 13, 43% in background
Please let me know of any information I can provide or tests I can run that would be helpful!