Closed saranyailla closed 1 month ago
Can we accomplish this without adding more threads? In `startSyncingShadows`, we can check whether we are connected in a non-blocking way with `mqttClient.getMqttOnline().get()`, and for `stopSyncingShadows`, maybe we can find a way to not call `waitForSyncEnd()` in this case.
Unit Tests Coverage Report
File | Coverage | Lines | Branches | |
---|---|---|---|---|
All files | 83% | 88% | 78% | :white_check_mark: |
Minimum allowed coverage is 65%
Generated by :monkey: cobertura-action against e5f94b4a393755713ee76ad100b69d8793af1fe3
Integration Tests Coverage Report
File | Coverage | Lines | Branches | |
---|---|---|---|---|
All files | 72% | 76% | 69% | :white_check_mark: |
Minimum allowed coverage is 45%
Generated by :monkey: cobertura-action against e5f94b4a393755713ee76ad100b69d8793af1fe3
Issue #, if available:
Description of changes: Run MQTT callbacks in a separate thread to avoid a deadlock that occurs when the Shadow manager component enters the RUNNING state before the MQTT client connection is successfully created according to Greengrass (GG).
The MQTT connect future is completed with the client only after the first onConnect callbacks have run. The Shadow manager onConnect callback needs the client to be fully formed (that is, the connect future completed with the MQTT client) before it can subscribe with it. Hence, the subscriptions triggered from the callback time out waiting for the client.
During Shadow manager (SM) startup, startSyncingShadows is called, which calls updateSubscriptions on the cloudDataClient. That spins up a new thread from the executor service pool, which runs the private synchronized updateSubscriptions on the cloudDataClient. This runs indefinitely because the MQTT subscribe operation never succeeds. Meanwhile, the MQTT callback thread is blocked at updateSubscriptions in startSyncingShadows, because that method is also synchronized on the cloudDataClient instance and two synchronized methods cannot run concurrently on the same instance.
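For reviewers who want to see the ordering problem in isolation, here is a minimal sketch of it. It is not the Shadow manager or Greengrass code: `ConnectFutureDemo`, `FakeClient`, the topic name, and the timeouts are all hypothetical stand-ins. It only models a connect future that is completed after the connect callback has been triggered, with a callback that needs the completed future in order to subscribe.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class ConnectFutureDemo {
    /** Hypothetical stand-in for the MQTT client; not the Greengrass API. */
    static final class FakeClient {
        void subscribe(String topic) {
            // No-op: a real client would send a SUBSCRIBE packet here.
        }
    }

    /**
     * Simulates a first-time connect in which the connect callback is
     * triggered before the connect future is completed.
     *
     * @param runCallbackInline true to run the callback on the connecting
     *                          thread (the problematic ordering), false to
     *                          hand it to a separate thread (the fix)
     * @return true if the callback obtained the client and subscribed
     */
    static boolean connect(boolean runCallbackInline) throws Exception {
        CompletableFuture<FakeClient> connectFuture = new CompletableFuture<>();
        CompletableFuture<Boolean> subscribed = new CompletableFuture<>();
        ExecutorService callbackExecutor = Executors.newSingleThreadExecutor();

        // The callback needs the fully formed client, so it waits on the future.
        Runnable onConnect = () -> {
            try {
                FakeClient client = connectFuture.get(500, TimeUnit.MILLISECONDS);
                client.subscribe("hypothetical/shadow/topic");
                subscribed.complete(true);
            } catch (TimeoutException e) {
                // The future was never completed while we were waiting.
                subscribed.complete(false);
            } catch (Exception e) {
                subscribed.completeExceptionally(e);
            }
        };

        try {
            if (runCallbackInline) {
                // Blocks this thread, so the line completing the future
                // below cannot run until the callback gives up.
                onConnect.run();
            } else {
                // The callback waits on its own thread; this thread moves on.
                callbackExecutor.execute(onConnect);
            }
            connectFuture.complete(new FakeClient());
            return subscribed.get(2, TimeUnit.SECONDS);
        } finally {
            callbackExecutor.shutdownNow();
        }
    }
}
```

In this sketch, `connect(true)` returns `false` because the inline callback times out before the future can be completed, while `connect(false)` returns `true` because the connecting thread is free to complete the future while the callback waits on another thread.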
Why is this change necessary: More info: When the MQTT client is created for the first time, the one-time onConnect callbacks are run before the `connectFuture` is completed with the client. Only when these callbacks have completed is the `connectFuture` completed.
But in the case where the Shadow manager component enters the RUNNING state before the MQTT client connection is successfully created for the first time, the onConnectionResumed callback is triggered when the MQTT client is first created. This callback subscribes to topics using the MQTT client. However, in order to subscribe using the MQTT client, the `connectFuture` must be fully completed, resulting in a deadlock.
The fix is to run the callback in a separate thread, so the `connectFuture` is completed without being blocked.
How was this change tested:
Any additional information or context required to review the change:
Checklist:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.