BroadbandForum / obuspa

OB-USP-AGENT is a system daemon providing a User Services Platform (USP) Agent. https://github.com/BroadbandForum/obuspa/wiki
BSD 3-Clause "New" or "Revised" License
93 stars 62 forks source link

Add MQTT Client 2 with unreachable BrokerAddress disconnects MQTT Client Instance1 #100

Closed arunsivasankar closed 9 months ago

arunsivasankar commented 11 months ago

Hello Everyone,

Please feel to clarify and close the issue, if it is identified as not a bug, but an environmental setting issue.

Version : obuspa 7.0.0 Device.LocalAgent.MQTT.Client.1. configured with right Broker Address and connection is established Device.MQTT.Client.1.KeepAliveTime = 60 and PINGREQ/PINGRESP is being set. root@Desktop:/home/superadmin# netstat -nlap | grep obusp tcp 4 0 192.168.1.70:57978 135.xx.xx.xx:1883 ESTABLISHED 5632/obuspa

Configure second MQTT Client from controller with unreachable IP Address 1.2.3.4 and Set Device.MQTT.Client.2.Enable to true

header { msg_id: "4-1702917674484:12732250" msg_type: SET } body { request { set { allow_partial: true update_objs { obj_path: "Device.MQTT.Client.2." param_settings { param: "Enable" value: "true" required: true } param_settings { param: "BrokerAddress" value: "1.2.4.5" required: true } param_settings { param: "BrokerPort" value: "10" required: true } } } } }

MQTT client instance tries to establish connection to 1.2.3.4 and wait for SYN/ACK root@Desktop:/home/superadmin# netstat -nlap | grep obusp tcp 0 1 192.168.1.70:45748 1.2.4.5:10 SYN_SENT 5632/obuspa

During this wait period, mqtt client1 has not sent PINGREG to controller, resulting in controller disconnecting the MQTT Client1.

now both MQTT Client1 and Client2 retries to establish connection and this cycle repeats PerformMqttClientConnect: Failed to connect v5 with Operation timed out (14) HandleMqttError: Got error: 1, reason: Failed to connect to client, retry_count: 1 HandleMqttError (MoveState_Private): SendingConnect --> [[ Failed to connect to client ]] --> Error/Retrying MQTT_UpdateAllSockSet: Waiting for time to retry: remaining time: 9s retry_time: 3217471 time: 3217462 LogCallback: MQTT Debug: Client MQTTUSER received PUBACK (Mid: 6, RC:0) PublishCallback: Sent MID 6 LogCallback: MQTT Debug: Client MQTTUSER sending PINGREQ MQTT_UpdateAllSockSet: Waiting for time to retry: remaining time: 9s retry_time: 3217471 time: 3217462 DisconnectCallback: Disconnected (rc=7, is_reconnect=0)

MQTT_ProcessAllSocketActivity: Failed to write to socket HandleMqttError: Got error: 4, reason: Force disconnected from broker, retry_count: 0 HandleMqttError (MoveState_Private): Running --> [[ Force disconnected from broker ]] --> Error/Retrying MQTT_UpdateAllSockSet: Waiting for time to retry: remaining time: 6s retry_time: 3217468 time: 3217462 MQTT_UpdateAllSockSet: Waiting for time to retry: remaining time: 9s retry_time: 3217471 time: 3217462

MQTT_UpdateAllSockSet: Waiting for time to retry: remaining time: 5s retry_time: 3217468 time: 3217463 MQTT_UpdateAllSockSet: Waiting for time to retry: remaining time: 8s retry_time: 3217471 time: 3217463 MQTT_UpdateAllSockSet: Waiting for time to retry: remaining time: 4s retry_time: 3217468 time: 3217464 MQTT_UpdateAllSockSet: Waiting for time to retry: remaining time: 7s retry_time: 3217471 time: 3217464 MQTT_UpdateAllSockSet: Waiting for time to retry: remaining time: 3s retry_time: 3217468 time: 3217465 MQTT_UpdateAllSockSet: Waiting for time to retry: remaining time: 6s retry_time: 3217471 time: 3217465 MQTT_UpdateAllSockSet: Waiting for time to retry: remaining time: 2s retry_time: 3217468 time: 3217466 MQTT_UpdateAllSockSet: Waiting for time to retry: remaining time: 5s retry_time: 3217471 time: 3217466 MQTT_UpdateAllSockSet: Waiting for time to retry: remaining time: 1s retry_time: 3217468 time: 3217467 MQTT_UpdateAllSockSet: Waiting for time to retry: remaining time: 4s retry_time: 3217471 time: 3217467 MQTT_UpdateAllSockSet: Retrying connection EnableMqttClient (MoveState_Private): Error/Retrying --> [[ Starting Connection ]] --> SendingConnect MQTT_UpdateAllSockSet: Waiting for time to retry: remaining time: 3s retry_time: 3217471 time: 3217468 LogCallback: MQTT Debug: Client MQTTUSER sending CONNECT Connect (MoveState_Private): SendingConnect --> [[ Connect sent ]] --> AwaitingConnect MQTT_UpdateAllSockSet: Waiting for time to retry: remaining time: 2s retry_time: 3217471 time: 3217469 MQTT_UpdateAllSockSet: Waiting for time to retry: remaining time: 2s retry_time: 3217471 time: 3217469 LogCallback: MQTT Debug: Client MQTTUSER received CONNACK (0) ConnectV5Callback: No cert chain, so cannot get controller trust ConnectV5Callback: Received client id "MQTTUSER" ConnectV5Callback (MoveState_Private): AwaitingConnect --> [[ Connect Callback Received ]] --> Running Subscribe: change subscribe qos to 1 Subscribe: Sending subscribe to hdm/usp/endpoint/agent/proto::MQTTUSER 0 1 LogCallback: MQTT Debug: Client MQTTUSER sending SUBSCRIBE (Mid: 1, Topic: hdm/usp/endpoint/agent/proto::MQTTUSER, QoS: 1, Options: 0x00) MQTT_UpdateAllSockSet: Waiting for time to retry: remaining time: 2s retry_time: 3217471 time: 3217469 MQTT_UpdateAllSockSet: Waiting for time to retry: remaining time: 2s retry_time: 3217471 time: 3217469

USP_CONNECT_RECORD sending at time 1970-02-07T05:44:29Z, to host 135.x.x.x over MQTT version: "1.2" to_id: "self:hdmept:controller2" from_id: "proto::MQTTUSER" payload_security: PLAINTEXT mac_signature[0] sender_cert[0] mqtt_connect { version: V5 subscribed_topic: "hdm/usp/endpoint/agent/proto::MQTTUSER" }

LogCallback: MQTT Debug: Client MQTTUSER sending PUBLISH (d0, q1, r0, m2, 'hdm/usp/endpoint/controller/notifications', ... (99 bytes)) MQTT_UpdateAllSockSet: Waiting for time to retry: remaining time: 2s retry_time: 3217471 time: 3217469 MQTT_UpdateAllSockSet: Waiting for time to retry: remaining time: 2s retry_time: 3217471 time: 3217469 LogCallback: MQTT Debug: Client MQTTUSER received SUBACK MQTT_UpdateAllSockSet: Waiting for time to retry: remaining time: 2s retry_time: 3217471 time: 3217469 LogCallback: MQTT Debug: Client MQTTUSER received PUBACK (Mid: 2, RC:0) PublishCallback: Sent MID 2 MQTT_UpdateAllSockSet: Waiting for time to retry: remaining time: 2s retry_time: 3217471 time: 3217469 MQTT_UpdateAllSockSet: Waiting for time to retry: remaining time: 1s retry_time: 3217471 time: 3217470 MQTT_UpdateAllSockSet: Retrying connection EnableMqttClient (MoveState_Private): Error/Retrying --> [[ Starting Connection ]] --> SendingConnect PerformMqttClientConnect: No username found

the source port of proper mqtt client keeps changing in netstat output indicating, new connection are made

Please provide your comments, if this is expected and any method to mitigate this problem by adjusting Device.MQTT.Client.1.KeepAliveTime obuspa_logs.txt

holme-r commented 10 months ago

OBUSPA calls libmosquitto's mosquitto_connect_bind_v5() function, which can block for up to 2 minutes if the server does not respond. This seems to be the cause of the issue you are seeing - the 2 minutes is longer than the keep alive on the other MQTT connection and the code uses a single thread for servicing all of the MQTT connections (to change OBUSPA to use multiple threads for servicing MQTT connections would be a very large change).

Unfortunately libmosquitto does not have a 'connect-with-timeout' function, which is what we really need here. The latest versions of libmosquitto do have non blocking connect functions (e.g. mosquitto_connect_bind_async), however the documentation says that these are only usable if you're not using your own event loop (i.e. not using mosquitto_loop_XXX set of functions). Since OBUSPA is using its own event loop, these functions cannot be used.

Whilst it would be possible to increase the keep alive timeout for the other MQTT connection, more than 2 minutes is likely to need any load balancer in the network path to also have their timeout increased.

I think the best solution would be to add a new function to libmosquitto (mosquitto_connect_set_timeout) which sets the timeout for subsequent calls to the existing connect API functions. This would be less intrusive from a libmosquitto API perspective than adding a new 'connect-with-timeout' function, and easily achieved, as connect timeout is a socket option. OBUSPA could then call mosquitto_connect_set_timoue() before calling mosquitto_connect_bind_v5().

I'm sorry that this doesn't solve your issue, but it does at least provide a way forward, albeit dependant on changes to libmosquitto being accepted.

arunsivasankar commented 10 months ago

Thank you Richard Holme for taking your time to investigate and provide a detailed explanation. The environment we have for now is for trials and supporting a second mqtt connection is only for trial run, for which we can mitigate the current problem by increasing the keepalivetime interval. Further, we will explore the feasibility to add a new function to libmosquitto library.

You may please close the issue.

Thank you once again.

holme-r commented 9 months ago

Closing at the request of Arun. We have a BBF ticket that captures this issue.