grdSTM closed this issue 7 months ago
Hi, we're looking into this. Thanks for bringing this up!
Hello @grdSTM,
Thanks a lot for taking the time to report the bug to us. We appreciate it :)
I am trying to follow along, so let me know if I missed something or misunderstood.
In the clean-session case (after the 1st broker disconnect-reconnect event), the mutex is relinquished here, after which the agent does its job in the command loop here.
After this, the broker is disconnected from the client and we reacquire the mutex here.
Following this, we restart from the top of the loop, where MQTT_Connect is called. This time, however, the clean session flag is set to false and we call prvHandleResubscribe here, which does NOT try to take the mutex and just relinquishes it in case of a failure.
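To make the sequence described above easier to check, here is a minimal C sketch that replays it against a toy lock. It is not the project's code: `ModelLock`, `model_take`, `model_give`, and `replay_described_sequence` are hypothetical names, the lock is only a flag, and the assumption that the lock is held before the first connect is mine.

```c
#include <stdbool.h>

/* Hypothetical model of the subscription-context mutex: a single flag.
 * A real mutex take would block; here a second take simply fails. */
typedef struct {
    bool held;
} ModelLock;

bool model_take(ModelLock *lock)
{
    if (lock->held) {
        return false;   /* a blocking take would wait forever here */
    }
    lock->held = true;
    return true;
}

void model_give(ModelLock *lock)
{
    lock->held = false;
}

/* Replays the steps from the comment above. Returns true if every take
 * succeeded, i.e. no step in this model would have blocked forever. */
bool replay_described_sequence(bool resubscribe_fails)
{
    ModelLock lock = { .held = true };  /* assumption: held before first connect */

    model_give(&lock);                  /* 1. clean-session connect: lock relinquished */

    /* 2. agent runs its command loop (no lock activity in this model) */

    if (!model_take(&lock)) {           /* 3. broker disconnects: lock reacquired */
        return false;
    }

    /* 4. reconnect with clean session = false: the resubscribe step
     *    never takes the lock and only gives it on failure. */
    if (resubscribe_fails) {
        model_give(&lock);
    }

    return true;
}
```

Under this reading, no step takes a lock that is already held, which matches the observation that no deadlock is visible on this path. The sketch does not cover the case where the broker reports an existing session and the resubscribe step is skipped.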
I am not sure that I am seeing a deadlock here. Can you help me find my mistake? Did I miss something or overlook some case?
Thank you for reporting the bug to us.
Thanks, Aniruddha
Hello @AniruddhaKanhere,
sorry to jump abruptly into the discussion. I stumbled on the deadlock issue some time ago and reported it to @grdSTM. Referring to your comment, what I have seen is that after the 2nd disconnect-reconnect, the session present flag is set to true because the session is persistent. That means that prvHandleResubscribe() is not called, so nothing on that path relinquishes the mutex.
The temporary solution we adopted is to explicitly handle the case (xSessionPresent == true) by unlocking the mutex (see the patch made over the v202205.00 version).
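The following is not the patch itself, only a rough C model of the idea under stated assumptions. `SubLock`, `sub_lock_try_take`, `sub_lock_give`, and `later_take_succeeds` are hypothetical names; the lock is a plain flag rather than a FreeRTOS mutex, and the sketch assumes that the resubscribe path always ends with the lock released.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the subscription-context mutex. */
typedef struct {
    bool held;
} SubLock;

/* Non-blocking take: returns false where a blocking take would wait forever. */
bool sub_lock_try_take(SubLock *lock)
{
    if (lock->held) {
        return false;
    }
    lock->held = true;
    return true;
}

void sub_lock_give(SubLock *lock)
{
    lock->held = false;
}

/* Models one reconnect that starts with the lock held by the agent task,
 * then checks whether a later lock attempt could succeed. */
bool later_take_succeeds(bool session_present, bool apply_workaround)
{
    SubLock lock = { .held = true };

    if (session_present) {
        /* Persistent session resumed: the resubscribe step is skipped. */
        if (apply_workaround) {
            sub_lock_give(&lock);   /* workaround: unlock explicitly */
        }
    } else {
        /* Assumption: the resubscribe path ends with the lock released. */
        sub_lock_give(&lock);
    }

    return sub_lock_try_take(&lock);
}
```

In this model, `later_take_succeeds(true, false)` returns false, which corresponds to the reported hang in the next lock attempt, while `later_take_succeeds(true, true)` returns true once the session-present branch releases the lock.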
The above PR was merged. Thank you for the contribution @grdSTM and @gmarcolinosr!
I shall be closing this issue now.
A user investigating the vMQTTAgentTask behavior upon MQTT broker disconnection faced a deadlock in xLockSubCtx().
See the patch they made over the v202205.00 version: https://github.com/FreeRTOS/iot-reference-stm32u5/compare/main...grdSTM:lab-iot-reference-stm32u5:resubDlWa?expand=1
As far as I understand, the house-keeping of subscription locks is intended to be handled by MQTTAgent_CancelAll() callbacks.
Edit: The issue has been detected in a port relying on an alternative implementation of the MQTT transport interface. Compared with the Wi-Fi driver of the iot-reference-stm32u5 project, it is possible that pending TCP segments are handled differently when the TCP connection gets closed.
Is the above patch the proper fix, or are there preferred alternatives?
Thanks!