aws-greengrass / aws-greengrass-shadow-manager

A GreengrassV2 Component that provides offline device shadow documents and optional synchronization to the IoT device shadow service.
Apache License 2.0

fix: synchronize cloud data client operations properly #204

Closed saranyailla closed 1 month ago

saranyailla commented 1 month ago

Issue #, if available:

Description of changes: Run MQTT callbacks in a separate thread to avoid a deadlock that occurs when the Shadow Manager component enters the RUNNING state before the MQTT client connection has been successfully created according to Greengrass.

The MQTT connect future is completed with the client only after the first on-connect callbacks have run. The Shadow Manager onConnect callback needs the client to be fully formed (that is, the connect future completed with the MQTT client) before it can subscribe with it. Hence, subscriptions triggered from the callback time out waiting for the client.

During Shadow Manager startup, startSyncingShadows is called, which calls updateSubscriptions on the cloudDataClient. That spins up a new thread from the executor service pool, which runs the private synchronized updateSubscriptions on the cloudDataClient. This runs indefinitely because the MQTT subscribe operation never succeeds. The MQTT callback thread is then blocked at updateSubscriptions in startSyncingShadows, because that method is also synchronized on the cloudDataClient instance and two synchronized methods cannot interleave on the same instance.
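The monitor contention described above can be reproduced in isolation. The following is a minimal sketch, not the Shadow Manager code: `FakeCloudDataClient`, `retrySubscriptions`, and the latches are hypothetical names introduced for illustration. One thread holds the instance monitor inside a synchronized method that does not return, and a second thread that calls another synchronized method on the same instance ends up in the `BLOCKED` state.

```java
import java.util.concurrent.CountDownLatch;

class MonitorContentionSketch {
    /** Hypothetical stand-in for the cloud data client; not the real class. */
    static final class FakeCloudDataClient {
        final CountDownLatch entered = new CountDownLatch(1);
        final CountDownLatch release = new CountDownLatch(1);

        // Plays the role of the subscription update that never finishes:
        // it keeps the instance monitor until `release` is counted down.
        synchronized void retrySubscriptions() throws InterruptedException {
            entered.countDown();
            release.await(); // CountDownLatch.await() does not give up the monitor
        }

        // Plays the role of the synchronized entry point the callback calls.
        synchronized void updateSubscriptions() {
            // Nothing to do; being able to enter the method is the point.
        }
    }

    /** Returns the state of the "callback" thread while the monitor is held. */
    static Thread.State observeCallbackState() {
        FakeCloudDataClient client = new FakeCloudDataClient();
        Thread worker = new Thread(() -> {
            try {
                client.retrySubscriptions();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread callback = new Thread(client::updateSubscriptions);
        try {
            worker.start();
            client.entered.await(); // the worker now owns the monitor
            callback.start();
            // Poll (with a 2 second cap) until the callback thread is parked
            // on the monitor.
            long deadline = System.nanoTime() + 2_000_000_000L;
            while (callback.getState() != Thread.State.BLOCKED
                    && System.nanoTime() < deadline) {
                Thread.sleep(5);
            }
            Thread.State observed = callback.getState();
            client.release.countDown(); // let both threads finish
            worker.join();
            callback.join();
            return observed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(observeCallbackState());
    }
}
```

In the sketch the monitor is released by a latch so the program terminates; in the scenario described in this PR nothing releases it, because the subscribe operation never succeeds.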

Why is this change necessary: When the MQTT client is created for the first time, the one-time onConnect callbacks run before the connectFuture is completed with the client. The connectFuture is completed only after these callbacks finish.

However, when the Shadow Manager component enters the RUNNING state before the MQTT client connection has been successfully created for the first time, the onConnectionResumed callback is triggered when the MQTT client is first created. This callback subscribes to topics using the MQTT client. But subscribing with the MQTT client requires the connectFuture to be fully completed, which results in a deadlock.

The fix is to run the callback in a separate thread, so the connectFuture is completed without being blocked.
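To illustrate the ordering problem and the effect of offloading, here is a minimal sketch under stated assumptions. It does not use the Greengrass MQTT client: `FakeClient`, `connect`, the topic name, and the 200 ms timeout are all hypothetical. The callback waits on a future that is completed only after the callback has been invoked, so running it inline times out, while handing it to an executor lets the future complete first.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class ConnectCallbackSketch {
    /** Hypothetical stand-in for the MQTT client; not the Greengrass API. */
    static final class FakeClient {
        String subscribe(String topic) {
            return "subscribed:" + topic;
        }
    }

    /**
     * Simulates client creation: the connection callback is invoked first,
     * and only afterwards is connectFuture completed with the client.
     */
    static String connect(boolean offloadCallback) {
        CompletableFuture<FakeClient> connectFuture = new CompletableFuture<>();
        CompletableFuture<String> result = new CompletableFuture<>();
        ExecutorService executor = Executors.newSingleThreadExecutor();

        // The callback needs the fully formed client before it can subscribe.
        Runnable callback = () -> {
            try {
                FakeClient client = connectFuture.get(200, TimeUnit.MILLISECONDS);
                result.complete(client.subscribe("shadow/update"));
            } catch (TimeoutException e) {
                result.complete("timed out waiting for client");
            } catch (Exception e) {
                result.completeExceptionally(e);
            }
        };

        try {
            if (offloadCallback) {
                executor.execute(callback); // returns immediately
            } else {
                callback.run(); // blocks the thread that completes the future
            }
            connectFuture.complete(new FakeClient());
            return result.join();
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("inline:    " + connect(false));
        System.out.println("offloaded: " + connect(true));
    }
}
```

The sketch uses a timeout so the inline case terminates; it shows the same wait-for-self pattern that this PR describes, not the exact call chain in the component.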

How was this change tested:

Any additional information or context required to review the change:

Checklist:

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

jcosentino11 commented 1 month ago

Can we accomplish this without adding more threads? In startSyncingShadows, we can check whether we are connected in a non-blocking way with mqttClient.getMqttOnline().get(). And for stopSyncingShadows, maybe we can find a way to avoid waitForSyncEnd() in this case.
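A rough sketch of the non-blocking check suggested here, assuming the online flag is exposed as an `AtomicBoolean`. `FakeMqttClient` and `FakeSyncHandler` are hypothetical stand-ins rather than the real classes, and the return value is added only to make the behavior observable.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class OnlineCheckSketch {
    /** Hypothetical stand-in exposing an online flag as an AtomicBoolean. */
    static final class FakeMqttClient {
        private final AtomicBoolean mqttOnline = new AtomicBoolean(false);

        AtomicBoolean getMqttOnline() {
            return mqttOnline;
        }
    }

    /** Hypothetical stand-in for the component that starts syncing. */
    static final class FakeSyncHandler {
        private final FakeMqttClient mqttClient;
        private boolean subscribed;

        FakeSyncHandler(FakeMqttClient mqttClient) {
            this.mqttClient = mqttClient;
        }

        /** Returns true if subscriptions were set up, false if skipped. */
        synchronized boolean startSyncingShadows() {
            // Reading the flag never blocks, so the monitor is held briefly.
            if (!mqttClient.getMqttOnline().get()) {
                return false; // offline: skip now, retry on a later callback
            }
            subscribed = true; // placeholder for the real subscribe calls
            return true;
        }

        synchronized boolean isSubscribed() {
            return subscribed;
        }
    }

    public static void main(String[] args) {
        FakeMqttClient client = new FakeMqttClient();
        FakeSyncHandler handler = new FakeSyncHandler(client);
        System.out.println(handler.startSyncingShadows());
        client.getMqttOnline().set(true);
        System.out.println(handler.startSyncingShadows());
    }
}
```

The trade-off is that a skipped start has to be retried once the connection is reported as up, which the sketch leaves to the caller.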

github-actions[bot] commented 1 month ago

Unit Tests Coverage Report

| File | Coverage | Lines | Branches | |
| --- | --- | --- | --- | --- |
| All files | 83% | 88% | 78% | :white_check_mark: |

Minimum allowed coverage is 65%

Generated by :monkey: cobertura-action against e5f94b4a393755713ee76ad100b69d8793af1fe3

github-actions[bot] commented 1 month ago

Integration Tests Coverage Report

| File | Coverage | Lines | Branches | |
| --- | --- | --- | --- | --- |
| All files | 72% | 76% | 69% | :white_check_mark: |

Minimum allowed coverage is 45%

Generated by :monkey: cobertura-action against e5f94b4a393755713ee76ad100b69d8793af1fe3