BroadbandForum / obuspa

OB-USP-AGENT is a system daemon providing a User Services Platform (USP) Agent. https://github.com/BroadbandForum/obuspa/wiki
BSD 3-Clause "New" or "Revised" License
93 stars 62 forks source link

main thread of obuspa will be blocked about 126 seconds #31

Closed dandan626 closed 2 years ago

dandan626 commented 3 years ago

Hello everyone,

When the main thread of obuspa calls the HandleUspMessage()->MTP_EXEC_ActivateScheduledActions(), it will be stuck almost 126 seconds. See the log below - image

The main thread can't get the mqtt_access_mutex, while the MQTT thread hold the lock in the call stack: MTP_EXEC_MqttMain->MQTT_ProcessAllSocketActivity

The MQTT_ProcessAllSocketActivity holds the mqtt_access_mutex for so much time that the main thread can get the lock mqtt_access_mutex.

holme-r commented 3 years ago

MQTT_ProcessAllSocketActivity() isn't supposed to hold the access mutex for that long. What does the callstack show that MQTT_ProcessAllSocketActivity is doing during the 126 seconds ? I am unable to reproduce this issue with libmosquitto 1.6.15 when the MQTT server is off line. Are there any other conditions necessary to trigger the issue ?

dandan626 commented 3 years ago

When the MQTT broker is not available, and the USP agent MQTT thread is try to connect it. This issue will happen.

holme-r commented 3 years ago

That's what I tried, but was unable to reproduce. Which version of libmosquitto are you using ? And what options did you use when building it ? Also please supply a callstack trace showing what MQTT_ProcessAllSocketActivity is doing during the 126 seconds.

dandan626 commented 3 years ago

My libmosquitto version is "2.0.2". I will reproduce it and catch the call stack for you tomorrow.

dandan626 commented 3 years ago

I think in the MQTT lib 2.0.2, the internal of mosquitto_connect_bind_v5 function has a 120 seconds timeout. So in the call stack below MTP_EXEC_MqttMain()->MQTT_ProcessAllSocketActivity()->MQTT_ProcessAllSocketActivity()->Connect()->ConnectV5()->mosquitto_connect_bind_v5() the mutex mqtt_access_mutex will be hold all through the function MQTT_ProcessAllSocketActivity for 120s more.

See my test steps: In obuspa release version "4.0.0" 1, Add debug log code image 2, reproduce and get the log: image

The network topo: ONT directly connects to a CDRouter device. The CDRouter disables the MQTT service. So the 1883 port is not in service of CDRouter. ONT(with obuspa running insde) <----->switch<------> CDRoute()

The tcp connection between ONT and CDRouter. The 1883 port of CDRouter is not responding. image

holme-r commented 3 years ago

Thank you for the investigation. I can see what the problem is now.

holme-r commented 2 years ago

Closing, as fixed in release 5.0.0.