aws-amplify / aws-sdk-ios

AWS SDK for iOS. For more information, see our web site:
https://aws-amplify.github.io/docs

EXC_BAD_ACCESS AWSIoTMQTTClient.m:1326 #5453

Open AndrKonovalov opened 4 hours ago

AndrKonovalov commented 4 hours ago

Describe the bug

App randomly crashes with error:

EXC_BAD_ACCESS AWSIoTMQTTClient.m:1326
Attempted to dereference garbage pointer 0x8.

To Reproduce

Steps to reproduce the behavior: N/A

Observed Behavior

The app crashes.

Expected Behavior

The app should reconnect to the AWS client instead of crashing.

Stack Trace

Too large to paste here; attached as [threads.txt](https://github.com/user-attachments/files/17513554/threads.txt)

Code Snippet N/A

Unique Configuration N/A

Areas of the SDK you are using (AWSMobileClient, Cognito, Pinpoint, IoT, etc)?

Screenshots N/A

Environment(please complete the following information):

Device Information (please complete the following information):

Logs

EXC_BAD_ACCESS: Attempted to dereference garbage pointer 0x8.

0 CoreFoundation +0xa263c _CFRunLoopAddTimer
1 AWSIoT +0x4ddac -[AWSIoTMQTTClient scheduleReconnection] (AWSIoTMQTTClient.m:1326:9)
2 AWSIoT +0x4a928 __43-[AWSIoTMQTTClient initiateReconnectTimer:]_block_invoke (AWSIoTMQTTClient.m:780:9)
3 libdispatch.dylib +0x2138 dispatch_call_block_and_release
4 libdispatch.dylib +0x3dd0 dispatch_client_callout
5 libdispatch.dylib +0xb3fc dispatch_lane_serial_drain
6 libdispatch.dylib +0xbf2c dispatch_lane_invoke
7 libdispatch.dylib +0x16cb0 dispatch_root_queue_drain_deferred_wlh
8 libdispatch.dylib +0x16524 dispatch_workloop_worker_thread
9 libsystem_pthread.dylib +0x4930 pthread_wqthread

After digging a little deeper, it seems the issue arises because the cleanupReconnectTimer and scheduleReconnection methods run on different threads. cleanupReconnectTimer is specifically designed to run on the reconnectThread, whereas scheduleReconnection runs on a separate timerQueue. Nothing synchronizes the two, so there is a race condition in which the reconnectTimer is being set up by scheduleReconnection while it is being invalidated or cleaned up by cleanupReconnectTimer.

The cleanupReconnectTimer method ensures it queues cleanup operations on the correct thread (reconnectThread). However, if scheduleReconnection is in the middle of setting up the reconnect timer and cleanupReconnectTimer is called at the same time on another thread, the timer could be invalidated before it is fully initialized, leading to unexpected behavior or crashes.

The sequence starts with a call to the disconnect method in AWSIoTDataManager.m:

```
- (void)disconnect{
    if ( !_userDidIssueConnect || _userDidIssueDisconnect ) {
        //Have to be connected to make this call. noop this call by returning
        return ;
    }
    _userDidIssueConnect = NO;
    _userDidIssueDisconnect = YES;
    [self.mqttClient disconnect];
}
```

which calls disconnect in AWSIoTMQTTClient.m:

```
- (void)disconnect {
    if (self.userDidIssueDisconnect ) {
        //Issuing disconnect multiple times. Turn this function into a noop by returning here.
        return;
    }

    //Invalidate the reconnect timer so that there are no reconnect attempts.
    [self cleanupReconnectTimer];

    //Set the userDisconnect flag to true to indicate that the user has initiated the disconnect.
    self.userDidIssueDisconnect = YES;
    self.userDidIssueConnect = NO;

    //call disconnect on the session.
    [self.session disconnect];
    self.connectionAgeInSeconds = 0;

    //Cancel the current streams thread
    [self.streamsThread cancelAndDisconnect:YES];

    __weak AWSIoTMQTTClient *weakSelf = self;
    self.streamsThread.onStop = ^{
        __strong AWSIoTMQTTClient *strongSelf = weakSelf;
        //If the userDidIssueDisconnect has been set to NO, it means a new connection has been requested,
        //so we should disregard these updates
        if (!strongSelf || !strongSelf.userDidIssueDisconnect) {
            return;
        }

        //Invalidate connection age timer and close socket
        if (strongSelf.connectionAgeTimer != nil) {
            [strongSelf.connectionAgeTimer invalidate];
            strongSelf.connectionAgeTimer = nil;
        }
        if (strongSelf.webSocket) {
            [strongSelf.webSocket close];
            strongSelf.webSocket = nil;
        }

        //Notify disconnected status.
        strongSelf.mqttStatus = AWSIoTMQTTStatusDisconnected;
        [strongSelf notifyConnectionStatus];
    };

    AWSDDLogInfo(@"AWSIoTMQTTClient: Disconnect message issued.");
}
```

disconnect has no synchronization. It calls cleanupReconnectTimer, which invalidates and removes the reference to the reconnect timer on the correct thread to avoid creating a memory leak:

```
/**
 @discussion If called on any thread other than the reconnect thread the work is queued up on the reconnect thread
 but the method returns without waiting for the invalidation to be completed. This is called initially on the thread
 the consumer is calling the client's disconnect method on.
 */
- (void)cleanupReconnectTimer {
    if (self.reconnectTimer == nil) {
        return;
    }

    if (self.reconnectThread) {
        if (!self.reconnectThread.isFinished && ![[NSThread currentThread] isEqual:self.reconnectThread]) {
            // Move to reconnect thread to cleanup only if it's still running
            [self performSelector:@selector(cleanupReconnectTimer)
                         onThread:self.reconnectThread
                       withObject:nil
                    waitUntilDone:NO];
            return;
        }
        [self.reconnectTimer invalidate];
        self.reconnectTimer = nil;
    }
}
```

The code that then has issues is:

```
- (void)scheduleReconnection {
    dispatch_assert_queue(self.timerQueue);
    BOOL isConnectingOrConnected = self.mqttStatus == AWSIoTMQTTStatusConnected || self.mqttStatus == AWSIoTMQTTStatusConnecting;
    if (!self.reconnectTimer && !isConnectingOrConnected) {
        self.reconnectTimer = [NSTimer timerWithTimeInterval:self.currentReconnectTime
                                                      target:self
                                                    selector:@selector(reconnectToSession)
                                                    userInfo:nil
                                                     repeats:NO];
        [[NSRunLoop currentRunLoop] addTimer:self.reconnectTimer forMode:NSDefaultRunLoopMode];
        [[NSRunLoop currentRunLoop] runMode:NSDefaultRunLoopMode beforeDate:[NSDate distantFuture]];
    }
}
```
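
One way to rule out the race described above would be to confine every read and write of the timer to a single serial queue. The sketch below is only an illustration of that idea, not the SDK's code or the maintainers' fix. It swaps `NSTimer` for a GCD dispatch source so that no run loop or thread hop is involved. `ReconnectScheduler`, its methods, and the queue label are hypothetical names introduced here.

```objectivec
#import <Foundation/Foundation.h>

// Hypothetical helper, not part of the AWS SDK. It owns one one-shot timer and
// touches it only on a private serial queue, so scheduling and cancelling
// can never interleave.
@interface ReconnectScheduler : NSObject
- (instancetype)initWithHandler:(dispatch_block_t)handler;
- (void)scheduleAfter:(NSTimeInterval)delay;
- (void)cancel;
- (BOOL)isScheduled; // blocks on the queue; do not call from inside the handler
@end

@implementation ReconnectScheduler {
    dispatch_queue_t _queue;   // serial queue that guards _timer
    dispatch_source_t _timer;  // nil when nothing is pending
    dispatch_block_t _handler; // assumed to perform the reconnect
}

- (instancetype)initWithHandler:(dispatch_block_t)handler {
    if ((self = [super init])) {
        _queue = dispatch_queue_create("example.reconnect.timer", DISPATCH_QUEUE_SERIAL);
        _handler = [handler copy];
    }
    return self;
}

- (void)scheduleAfter:(NSTimeInterval)delay {
    dispatch_async(_queue, ^{
        if (self->_timer) { return; } // a reconnect is already pending
        dispatch_source_t timer =
            dispatch_source_create(DISPATCH_SOURCE_TYPE_TIMER, 0, 0, self->_queue);
        dispatch_source_set_timer(timer,
                                  dispatch_time(DISPATCH_TIME_NOW, (int64_t)(delay * NSEC_PER_SEC)),
                                  DISPATCH_TIME_FOREVER, 0);
        __weak typeof(self) weakSelf = self;
        dispatch_source_set_event_handler(timer, ^{
            __strong typeof(weakSelf) strongSelf = weakSelf;
            if (!strongSelf) { return; }
            [strongSelf cancelOnQueue]; // one-shot: clear state before firing
            if (strongSelf->_handler) { strongSelf->_handler(); }
        });
        self->_timer = timer; // fully configured before it becomes visible
        dispatch_resume(timer);
    });
}

- (void)cancel {
    dispatch_async(_queue, ^{ [self cancelOnQueue]; });
}

// Must only run on _queue.
- (void)cancelOnQueue {
    if (_timer) {
        dispatch_source_cancel(_timer);
        _timer = nil;
    }
}

- (BOOL)isScheduled {
    __block BOOL scheduled = NO;
    dispatch_sync(_queue, ^{ scheduled = (self->_timer != nil); });
    return scheduled;
}

@end
```

With this shape, a disconnect path would call `cancel` and a reconnect path would call `scheduleAfter:` from any thread. The serial queue orders the two, so a cancel can never observe a half-configured timer. Whether this fits the SDK's existing `reconnectThread` design is a question for the maintainers.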
edisooon commented 3 hours ago

Thanks for your insight. One of our team members will investigate this as soon as possible.