Closed NightSkySK closed 3 months ago
I've just revert changes introduced by commit https://github.com/FreeRTOS/iot-reference-esp32c3/commit/e2d407f72c20202730e7ac641506559fc36eee43
and as long as it doesn't solve MQTTKeepAliveTimeout
software is again capable of recovering MQTT connection without necessary rebooting the device
2024-07-24 12:08:17 I (343095) gate_control: Task "GateReport" sending subscribe request to coreMQTT-Agent for topic filter: /gates/GA97DF0005 with id 110
2024-07-24 12:08:24 E (350095) coreMQTT: Handling of keep alive failed. Status=MQTTKeepAliveTimeout
E (350095) coreMQTT: Call to receiveSingleIteration failed. Status=MQTTKeepAliveTimeout
E (350095) coreMQTT: MQTT operation failed with status MQTTKeepAliveTimeout
I (350105) report: coreMQTT-Agent disconnected. Preventing coreMQTT-Agent commands from being enqueued.
W (350105) core_mqtt_agent_manager: Stack size uxMqttAgentTask: 1896
I (350115) gate_control: coreMQTT-Agent disconnected. Preventing coreMQTT-Agent commands from being enqueued.
I (350105) core_mqtt_agent_manager: TLS connection was disconnected.
I (350135) supervisor: coreMQTT-Agent disconnected. Preventing coreMQTT-Agent commands from being enqueued.
I (350145) ota_over_mqtt: coreMQTT-Agent disconnected. Suspending OTA agent.
I (350155) core_mqtt_agent_manager: coreMQTT-Agent disconnected.
2024-07-24 12:08:27 I (352415) coreMQTT: MQTT connection established with the broker.
I (352415) core_mqtt_agent_manager: Session present: 0
I (352415) core_mqtt_agent_manager: Resubscribe to the topic /gates/GA97DF0005/update will be attempted.
W (352415) gate_control: Error or timed out waiting for ack to subscribe message 110. Re-attempting subscribe.
I (352435) report: coreMQTT-Agent connected.
I (352445) gate_control: coreMQTT-Agent connected.
I (352455) supervisor: coreMQTT-Agent connected.
I (352455) gate_control: Task "GateReport" sending subscribe request to coreMQTT-Agent for topic filter: /gates/GA97DF0005 with id 110
I (352455) ota_over_mqtt: coreMQTT-Agent connected. Resuming OTA agent.
I (352475) core_mqtt_agent_manager: coreMQTT-Agent connected.
2024-07-24 12:08:29 I (355055) gate_control: Subscribe 110 for topic filter /gates/GA97DF0005 succeeded for task "GateReport".
Unfortunately without those changes also E (38628625) esp-tls-mbedtls: mbedtls_ssl_setup returned -0x7F00
returns to be an issue:
I (38625455) gate_control: Task "GateReport" sending unsubscribe request to coreMQTT-Agent for topic filter: /gates/GA97DF0004 with id 12502
2024-07-24 22:46:00 E (38627905) coreMQTT: sendMessageVector: Unable to send packet: Network Error.
W (38627905) gate_control: Error or timed out waiting for ack to unsubscribe message 12502. Re-attempting subscribe.
E (38627905) coreMQTT: MQTT operation failed with status MQTTSendFailed
I (38627925) gate_control: Task "GateReport" sending unsubscribe request to coreMQTT-Agent for topic filter: /gates/GA97DF0004 with id 12502
I (38627925) report: coreMQTT-Agent disconnected. Preventing coreMQTT-Agent commands from being enqueued.
W (38627925) core_mqtt_agent_manager: Stack size uxMqttAgentTask: 1708
I (38627945) gate_control: coreMQTT-Agent disconnected. Preventing coreMQTT-Agent commands from being enqueued.
I (38627925) core_mqtt_agent_manager: TLS connection was disconnected.
I (38627965) supervisor: coreMQTT-Agent disconnected. Preventing coreMQTT-Agent commands from being enqueued.
I (38627985) ota_over_mqtt: coreMQTT-Agent disconnected. Suspending OTA agent.
I (38627995) core_mqtt_agent_manager: coreMQTT-Agent disconnected.
2024-07-24 22:46:02 E (38628625) esp-tls-mbedtls: mbedtls_ssl_setup returned -0x7F00
E (38628635) esp-tls: create_ssl_handle failed
E (38628635) esp-tls: Failed to open new connection
W (38628695) AWS_OTA: OTA Timer handle NULL for Timerid=1, can't stop.
I (38628695) AWS_OTA: OTA Agent is suspended.
I (38628695) AWS_OTA: Current State=[Suspended], Event=[Suspend], New state=[Suspended]
I (38628945) core_mqtt_agent_manager: Retry attempt 1.
2024-07-24 22:46:02 E (38629155) esp-tls-mbedtls: mbedtls_ssl_setup returned -0x7F00
E (38629155) esp-tls: create_ssl_handle failed
E (38629155) esp-tls: Failed to open new connection
2024-07-24 22:46:02 I (38629615) core_mqtt_agent_manager: Retry attempt 2.
2024-07-24 22:46:03 E (38629715) esp-tls-mbedtls: mbedtls_ssl_setup returned -0x7F00
E (38629715) esp-tls: create_ssl_handle failed
E (38629715) esp-tls: Failed to open new connection
2024-07-24 22:46:04 I (38631185) core_mqtt_agent_manager: Retry attempt 3.
E (38631405) esp-tls-mbedtls: mbedtls_ssl_setup returned -0x7F00
E (38631405) esp-tls: create_ssl_handle failed
E (38631405) esp-tls: Failed to open new connection
2024-07-24 22:46:06 I (38633155) core_mqtt_agent_manager: Retry attempt 4.
Hi @NightSkySK, Thanks for your feedback!
1. Why MQTTKeepAliveTimeout even if few seconds earlier there was successfully published msg to
AWS IoT MQTT server and from AWS IoT Documantation page I can read:
My personal guess is that the ping request is sent from device side and it's waiting for ping response. In this case, even though we've sent the publish message successfully, this scenario will also happen (refer to this line). Let me trigger an internal discussion to see if we can have better solution!
2. The application lost the capability to recover from coreMQTT-Agent disconnected
my short-term solution is to trigger the device reboot by my supervisor task, it isn't
an elegant solution and I would prefer to fix it properly.
I'm looking into this, will back later once I found something.
Thank you!
Hi @ActoryOu thank you for looking into this issue.
My personal guess is that the ping request is sent from device side and it's waiting for ping response. In this case, even though we've sent the publish message successfully, this scenario will also happen (refer to this line). Let me trigger an internal discussion to see if we can have better solution!
Maybe the MQTT_PINGRESP_TIMEOUT_MS
is too short for weaker network conditions? To investigate this issue I'm running simultaneously 6 devices, of which 3 are connected to IoT WiFI network where the router is supplied with 300Mbit/s fibre internet and there I don't see that MQTTKeepAliveTimeout was so frequently or even at all, much worse situation is with three other devices which are connected to TP-LINK Archer MR200 4G LTE where internet is taken from cellular LTE network. I haven't compared the pings in those networks, but I can imagine that LTE one has higher pings than the fibre one.
2. The application lost the capability to recover from coreMQTT-Agent disconnected
my short-term solution is to trigger the device reboot by my supervisor task, it isn't
an elegant solution and I would prefer to fix it properly.
Looks like the CORE_MQTT_AGENT_DISCONNECTED_BIT
in event group is cleared somehow, which is not expected.
Would you mind adding some logs to track that? Please help provide device log in error period. Here are some places to add logs.
Maybe the MQTT_PINGRESP_TIMEOUT_MS is too short for weaker network conditions? To investigate this issue I'm running simultaneously 6 devices, of which 3 are connected to IoT WiFI network where the router is supplied with 300Mbit/s fibre internet and there I don't see that MQTTKeepAliveTimeout was so frequently or even at all, much worse situation is with three other devices which are connected to TP-LINK Archer MR200 4G LTE where internet is taken from cellular LTE network. I haven't compared the pings in those networks, but I can imagine that LTE one has higher pings than the fibre one.
This is definitely possible. You can try enlarge it as workaround.
@ActoryOu
I've implemented all logging requested:
2. The application lost the capability to recover from coreMQTT-Agent disconnected my short-term solution is to trigger the device reboot by my supervisor task, it isn't an elegant solution and I would prefer to fix it properly.
Looks like the
CORE_MQTT_AGENT_DISCONNECTED_BIT
in event group is cleared somehow, which is not expected. Would you mind adding some logs to track that? Please help provide device log in error period. Here are some places to add logs.
- Return value of xEventGroupWaitBits
- Set disconnect bit. https://github.com/FreeRTOS/iot-reference-esp32c3/blob/main/main/networking/mqtt/core_mqtt_agent_manager.c#L533
- Return value of xEventGroupWaitBits
- Clear disconnect bit. https://github.com/FreeRTOS/iot-reference-esp32c3/blob/main/main/networking/mqtt/core_mqtt_agent_manager.c#L772
- Return value of xEventGroupWaitBits
As in normal condition each while
loop iteration on L777 it was printing:
W (167796) TAG: L777 xEventGroupWaitBits: 5
I've made such condition in code:
if( eMqttRet == MQTTSuccess )
{
while( 1 )
{
xReturnedEventBits_2 = xEventGroupWaitBits(
xNetworkEventGroup,
CORE_MQTT_AGENT_DISCONNECTED_BIT,
pdFALSE,
pdFALSE,
0 );
if( xReturnedEventBits_2 != 5 )
{
ESP_LOGW( "TAG",
"L777 xEventGroupWaitBits: %lx",
xReturnedEventBits_2 );
}
if( xReturnedEventBits_2 == CORE_MQTT_AGENT_DISCONNECTED_BIT )
{
break;
}
fd_set readSet;
fd_set errorSet;
And here are the logs when issue appear:
2024-07-25 12:57:06 I (3959356) gate_control: Unsubscribe 1777 for topic filter /gates/GA97DF0006 succeeded for task "GateReport".
I (3959356) gate_control: Task "GateReport" iteration 591 completed.
2024-07-25 12:57:10 I (3962626) gate_control: Task "GateReport" sending subscribe request to coreMQTT-Agent for topic filter: /gates/GA97DF0006 with id 1778
2024-07-25 12:57:15 E (3967626) coreMQTT: Handling of keep alive failed. Status=MQTTKeepAliveTimeout
E (3967626) coreMQTT: Call to receiveSingleIteration failed. Status=MQTTKeepAliveTimeout
E (3967626) coreMQTT: MQTT operation failed with status MQTTKeepAliveTimeout
I (3967636) report: coreMQTT-Agent disconnected. Preventing coreMQTT-Agent commands from being enqueued.
W (3967636) core_mqtt_agent_manager: L533 Set disconnect bit.
I (3967646) gate_control: coreMQTT-Agent disconnected. Preventing coreMQTT-Agent commands from being enqueued.
I (3967666) supervisor: coreMQTT-Agent disconnected. Preventing coreMQTT-Agent commands from being enqueued.
W (3967656) core_mqtt_agent_manager: Stack size uxMqttAgentTask: 1740
I (3967676) ota_over_mqtt: coreMQTT-Agent disconnected. Suspending OTA agent.
W (3967646) TAG: L777 xEventGroupWaitBits: 9
I (3967686) core_mqtt_agent_manager: coreMQTT-Agent disconnected.
2024-07-25 12:57:15 W (3967766) TAG: L777 xEventGroupWaitBits: 9
W (3967836) TAG: L777 xEventGroupWaitBits: 9
2024-07-25 12:57:15 W (3967906) TAG: L777 xEventGroupWaitBits: 9
W (3967976) TAG: L777 xEventGroupWaitBits: 9
2024-07-25 12:57:15 W (3968046) TAG: L777 xEventGroupWaitBits: 9
2024-07-25 12:57:15 W (3968116) TAG: L777 xEventGroupWaitBits: 9
W (3968146) AWS_OTA: OTA Timer handle NULL for Timerid=1, can't stop.
I (3968146) AWS_OTA: OTA Agent is suspended.
I (3968146) AWS_OTA: Current State=[Suspended], Event=[Suspend], New state=[Suspended]
W (3968186) TAG: L777 xEventGroupWaitBits: 9
2024-07-25 12:57:15 W (3968256) TAG: L777 xEventGroupWaitBits: 9
2024-07-25 12:57:15 W (3968326) TAG: L777 xEventGroupWaitBits: 9
W (3968396) TAG: L777 xEventGroupWaitBits: 9
2024-07-25 12:57:20 I (3973536) gate_control: Task "GateControl" iteration 65 completed.
2024-07-25 12:57:25 W (3978446) TAG: L777 xEventGroupWaitBits: 9
2024-07-25 12:57:35 W (3988496) TAG: L777 xEventGroupWaitBits: 9
2024-07-25 12:57:45 W (3998546) TAG: L777 xEventGroupWaitBits: 9
2024-07-25 12:57:56 W (4008596) TAG: L777 xEventGroupWaitBits: 9
2024-07-25 12:58:06 W (4018646) TAG: L777 xEventGroupWaitBits: 9
2024-07-25 12:58:16 W (4028696) TAG: L777 xEventGroupWaitBits: 9
2024-07-25 12:58:17 E (4030026) supervisor: GateControl task i
2024-07-25 12:58:17 s not running
So even if xEventGroupWaitBits: 5 changed to xEventGroupWaitBits: 9 the condition
if( xReturnedEventBits_2 == CORE_MQTT_AGENT_DISCONNECTED_BIT )
{
break;
}
didn't break the loop which seems to be reasonable as we have defined:
#define WIFI_CONNECTED_BIT ( 1 << 0 )
#define WIFI_DISCONNECTED_BIT ( 1 << 1 )
#define CORE_MQTT_AGENT_CONNECTED_BIT ( 1 << 2 )
#define CORE_MQTT_AGENT_DISCONNECTED_BIT ( 1 << 3 )
So 5
mean WIFI_CONNECTED (1) and CORE_MQTT_AGENT_CONNECTED (4), isn't it?
Than flipping bits 2 & 3:
xEventGroupClearBits( xNetworkEventGroup,
CORE_MQTT_AGENT_CONNECTED_BIT );
xEventGroupSetBits( xNetworkEventGroup,
CORE_MQTT_AGENT_DISCONNECTED_BIT );
we get 9 as CORE_MQTT_AGENT_DISCONNECTED_BIT (8) and WIFI_CONNECTED_BIT(1) is still enabled. This mean that the if condition to beak the loop should be something like that:
if( ( xReturnedEventBits_2 &
CORE_MQTT_AGENT_DISCONNECTED_BIT ) == 8 )
{
break;
}
or
if( ( xReturnedEventBits_2 &
CORE_MQTT_AGENT_DISCONNECTED_BIT ) != 0 )
{
break;
}
than L777 should look something more like this:
while( (xEventGroupWaitBits( xNetworkEventGroup, CORE_MQTT_AGENT_DISCONNECTED_BIT, pdFALSE, pdFALSE, 0 )& CORE_MQTT_AGENT_DISCONNECTED_BIT) != 0 )
I'm not very experienced with bitwise operations, please correct me if I'm wrong.
Once I implemented this change:
if( ( xReturnedEventBits_2 & CORE_MQTT_AGENT_DISCONNECTED_BIT ) != 0 ) { break; }
Now it looks much better:
2024-07-25 17:54:04 I (13997266) gate_control: Task "GateControl" iteration 232 completed.
2024-07-25 17:54:04 I (13997786) coreMQTT: Publishing message to /gates/GA97DF0005.
2024-07-25 17:54:11 E (14004786) coreMQTT: Handling of keep alive failed. Status=MQTTKeepAliveTimeout
E (14004786) coreMQTT: Call to receiveSingleIteration failed. Status=MQTTKeepAliveTimeout
E (14004786) coreMQTT: MQTT operation failed with status MQTTKeepAliveTimeout
I (14004796) report: coreMQTT-Agent disconnected. Preventing coreMQTT-Agent commands from being enqueued.
W (14004796) core_mqtt_agent_manager: L533 Set disconnect bit.
I (14004806) gate_control: coreMQTT-Agent disconnected. Preventing coreMQTT-Agent commands from being enqueued.
W (14004816) core_mqtt_agent_manager: Stack size uxMqttAgentTask: 1880
I (14004826) supervisor: coreMQTT-Agent disconnected. Preventing coreMQTT-Agent commands from being enqueued.
W (14004816) core_mqtt_agent_manager: L777 xEventGroupWaitBits: 9
I (14004846) ota_over_mqtt: coreMQTT-Agent disconnected. Suspending OTA agent.
W (14004846) core_mqtt_agent_manager: L712 xEventGroupWaitBits: 9
I (14004856) core_mqtt_agent_manager: coreMQTT-Agent disconnected.
I (14004866) core_mqtt_agent_manager: TLS connection was disconnected.
2024-07-25 17:54:11 W (14004976) AWS_OTA: OTA Timer handle NULL for Timerid=1, can't stop.
I (14004976) AWS_OTA: OTA Agent is suspended.
I (14004976) AWS_OTA: Current State=[Suspended], Event=[Suspend], New state=[Suspended]
2024-07-25 17:54:14 I (14007516) coreMQTT: MQTT connection established with the broker.
I (14007526) core_mqtt_agent_manager: Session present: 0
I (14007526) core_mqtt_agent_manager: Resubscribe to the topic /gates/GA97DF0005/update will be attempted.
W (14007526) gate_control: Error or timed out waiting for ack for publish message 6271. Re-attempting publish.
I (14007536) core_mqtt_agent_manager: Resubscribe to the topic /gates/GA97DF0005 will be attempted.
W (14007556) core_mqtt_agent_manager: L772 Clear disconnect bit
I (14007556) report: coreMQTT-Agent connected.
W (14007556) core_mqtt_agent_manager: L509 xEventGroupWaitBits: 5
I (14007566) gate_control: coreMQTT-Agent connected.
I (14007576) supervisor: coreMQTT-Agent connected.
I (14007586) ota_over_mqtt: coreMQTT-Agent connected. Resuming OTA agent.
I (14007576) gate_control: Task "GateReport" sending publish request to coreMQTT-Agent with message "{"openReceived": 0,"gateSensors":{ "IN1": 0, "IN2": 0},"keyboard":{ "new_key": 0, "key": ""}, "iteration": 2088}" on topic "/gates/GA97DF0005" with ID 6271.
I (14007596) core_mqtt_agent_manager: coreMQTT-Agent connected.
I (14007616) gate_control: Task "GateReport" waiting for publish 6271 to complete.
2024-07-25 17:54:14 I (14007976) AWS_OTA: otaPal_GetPlatformImageState
I (14007976) esp_ota_ops: aws_esp_ota_get_boot_flags: 1
I (14007976) esp_ota_ops: [0] aflags/seq:0x2/0x1, pflags/seq:0xffffffff/0x0
I (14007976) AWS_OTA: Current State=[RequestingJob], Event=[Resume], New state=[RequestingJob]
2024-07-25 17:54:16 I (14009036) coreMQTT: Publishing message to /gates/GA97DF0005.
Maybe the MQTT_PINGRESP_TIMEOUT_MS is too short for weaker network conditions? To investigate this issue I'm running simultaneously 6 devices, of which 3 are connected to IoT WiFI network where the router is supplied with 300Mbit/s fibre internet and there I don't see that MQTTKeepAliveTimeout was so frequently or even at all, much worse situation is with three other devices which are connected to TP-LINK Archer MR200 4G LTE where internet is taken from cellular LTE network. I haven't compared the pings in those networks, but I can imagine that LTE one has higher pings than the fibre one.
This is definitely possible. You can try enlarge it as workaround.
Can I ask MQTT cloud reconnect itself when wifi reconnect?In my test I foud that when wifi disconnect and reconnect itself,MQTT client can not reconnect itself. here are some log.
I (36611157) wifi:bcn_timeout,ap_probe_send_start [0;32mI (36611317) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0[0m [0;31mE (36612037) coreMQTT: Handling of keep alive failed. Status=MQTTKeepAliveTimeout[0m [0;31mE (36612037) coreMQTT: Call to receiveSingleIteration failed. Status=MQTTKeepAliveTimeout[0m [0;31mE (36612037) coreMQTT: MQTT operation failed with status MQTTKeepAliveTimeout [0m [0;32mI (36612037) core_mqtt_agent_manager: coreMQTT-Agent disconnected.[0m [0;32mI (36612037) sub_pub_unsub_demo: coreMQTT-Agent disconnected. Preventing coreMQTT-Agent commands from being enqueued.[0m [0;32mI (36612037) ota_over_mqtt_demo: coreMQTT-Agent disconnected. Suspending OTA agent.[0m [0;32mI (36612037) temp_sub_pub_and_led_control_demo: coreMQTT-Agent disconnected. Preventing coreMQTT-Agent commands from being enqueued.[0m [0;33mW (36612317) AWS_OTA: OTA Timer handle NULL for Timerid=0, can't stop.[0m [0;33mW (36612317) AWS_OTA: OTA Timer handle NULL for Timerid=1, can't stop.[0m [0;32mI (36612317) AWS_OTA: OTA Agent is suspended.[0m [0;32mI (36612317) AWS_OTA: Current State=[Suspended], Event=[Suspend], New state=[Suspended][0m [0;32mI (36612317) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0[0m [0;32mI (36613317) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0[0m I (36613657) wifi:ap_probe_send over, resett wifi status to disassoc I (36613657) wifi:state: run -> init (c800) I (36613657) wifi:pm stop, total sleep time: 31072080977 us / 36610841828 us I (36613657) wifi:idx:1, tid:0 I (36613657) wifi:idx:0, tid:4 I (36613657) wifi:new:<6,0>, old:<6,0>, ap:<255,255>, sta:<6,0>, prof:1 [0;32mI (36613667) core_mqtt_agent_manager: WiFi disconnected.[0m [0;32mI (36613667) app_wifi: Disconnected. Connecting to the AP again...[0m [0;32mI (36614317) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0[0m [0;32mI (36615317) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0[0m I (36616077) wifi:new:<1,0>, old:<6,0>, ap:<255,255>, sta:<1,0>, prof:1 I (36616077) wifi:state: init -> auth (b0) I (36616077) wifi:state: auth -> assoc (0) I (36616107) wifi:state: assoc -> run (10) I (36616117) wifi:connected with test123, aid = 17, channel 1, BW20, bssid = f6:84:8d:1b:33:62 I (36616117) wifi:security: WPA2-PSK, phy: bgn, rssi: -67 I (36616137) wifi:pm start, type: 1 I (36616137) wifi:set rx beacon pti, rx_bcn_pti: 14, bcn_timeout: 25000, mt_pti: 14, mt_time: 10000 I (36616167) wifi:AP's beacon interval = 102400 us, DTIM period = 1 I (36616177) wifi:idx:0 (ifx:0, f6:84:8d:1b:33:62), tid:4, ssn:2, winSize:64 [0;32mI (36616317) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0[0m [0;32mI (36617137) core_mqtt_agent_manager: WiFi connected.[0m [0;32mI (36617137) app_wifi: Connected with IP Address:192.168.2.158[0m [0;32mI (36617137) esp_netif_handlers: sta ip: 192.168.2.158, mask: 255.255.255.0, gw: 192.168.2.1[0m [0;32mI (36617317) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0[0m [0;31mE (36713767) coreMQTT: No command structure available.[0m [0;32mI (36714317) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0[0m
@JasonYan324 it all depends which version of main/networking/mqtt/core_mqtt_agent_manager.c
file you are using, If your version is up to date with main branch than it's a case which we investigate here.
My proposal for fix is make change in L777 from xEventGroupWaitBits to following code:
while( (xEventGroupWaitBits( xNetworkEventGroup, CORE_MQTT_AGENT_DISCONNECTED_BIT, pdFALSE, pdFALSE, 0 )& CORE_MQTT_AGENT_DISCONNECTED_BIT) != 0 )
There is quite big chance that it will help with your case as well. Please let us know if that works for you as I'm about to prepare the Pull Request with that solution
it all depends which version of
main/networking/mqtt/core_mqtt_agent_manager.c
file you are using, If your version is up to date with main branch than it's a case which we investigate here. My proposal for fix is make change in L777 from xEventGroupWaitBits to following code:
my SDK version is 5.2.2, not the new. Today, I saw the SDK is updtate to 5.3.
one more thing, even if wifi does not diconnect, MQTT keep alive failed and disconnect, but can not be reconnect itselt.
here are some log:
[0;32mI (5357267) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0[0m [0;32mI (5357297) sub_pub_unsub_demo: Task "SubPub0" sending subscribe request to coreMQTT-Agent for topic filter: /Meshed/Jason/404CCA89C184/Filter/SubPub0 with id 247[0m [0;32mI (5358267) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0[0m [0;32mI (5358837) coreMQTT: Ack packet deserialized with result: MQTTSuccess.[0m [0;32mI (5358847) coreMQTT: State record updated. New state=MQTTPublishDone.[0m [0;32mI (5359267) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0[0m [0;32mI (5360267) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0[0m [0;32mI (5361267) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0[0m [0;32mI (5362267) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0[0m [0;32mI (5363267) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0[0m [0;31mE (5363847) coreMQTT: Handling of keep alive failed. Status=MQTTKeepAliveTimeout[0m [0;31mE (5363847) coreMQTT: Call to receiveSingleIteration failed. Status=MQTTKeepAliveTimeout[0m [0;31mE (5363847) coreMQTT: MQTT operation failed with status MQTTKeepAliveTimeout [0m [0;32mI (5363847) core_mqtt_agent_manager: coreMQTT-Agent disconnected.[0m [0;32mI (5363847) sub_pub_unsub_demo: coreMQTT-Agent disconnected. Preventing coreMQTT-Agent commands from being enqueued.[0m [0;32mI (5363847) ota_over_mqtt_demo: coreMQTT-Agent disconnected. Suspending OTA agent.[0m [0;32mI (5363847) temp_sub_pub_and_led_control_demo: coreMQTT-Agent disconnected. Preventing coreMQTT-Agent commands from being enqueued.[0m [0;33mW (5364267) AWS_OTA: OTA Timer handle NULL for Timerid=0, can't stop.[0m [0;33mW (5364267) AWS_OTA: OTA Timer handle NULL for Timerid=1, can't stop.[0m [0;32mI (5364267) AWS_OTA: OTA Agent is suspended.[0m [0;32mI (5364267) AWS_OTA: Current State=[Suspended], Event=[Suspend], New state=[Suspended][0m [0;32mI (5452267) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0[0m [0;32mI (5453267) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0[0m [0;32mI (5454267) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0[0m [0;32mI (5455267) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0[0m [0;32mI (5456267) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0[0m [0;32mI (5457267) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0[0m [0;32mI (5458267) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0[0m [0;32mI (5459267) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0[0m [0;31mE (5460097) coreMQTT: No command structure available.[0m [0;32mI (5478267) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0[0m [0;32mI (5479267) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0[0m [0;31mE (5480117) coreMQTT: No command structure available.[0m [0;32mI (5487267) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0[0m [0;32mI (5488267) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0[0m [0;32mI (5489267) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0[0m [0;31mE (5490127) coreMQTT: No command structure available.[0m [0;32mI (5498267) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0[0m [0;32mI (5499267) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0[0m [0;31mE (5500137) coreMQTT: No command structure available.[0m [0;32mI (5509267) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0[0m [0;31mE (5510147) coreMQTT: No command structure available.[0m [0;32mI (5510267) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0[0m [0;32mI (5511267) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0[0m
I see the solution is merged with this commit https://github.com/FreeRTOS/iot-reference-esp32c3/commit/36492fc017089ff3f03f9513374f842948fdfdf1
I see the solution is merged with this commit 36492fc
It seems, the commit 36492fc have a lot of problems
as I mentioned in first e-mail I've applied changes from following commits:
https://github.com/FreeRTOS/iot-reference-esp32c3/commit/a0fe220e48d274e4b724d50109f45552fa98a58d referring to sdkconfig.default
and https://github.com/FreeRTOS/iot-reference-esp32c3/commit/e2d407f72c20202730e7ac641506559fc36eee43 referring to main/networking/mqtt/core_mqtt_agent_manager.c
mainly to fix esp-tls-mbedtls: mbedtls_ssl_setup returned -0x7F00
which was clear indication mbedtls memory leak
.
In the mean time we found that coreMQTT can't reestablish a connection after error (e.g. due MQTTKeepAliveTimeout)
which was solved with 36492fc .
How ever after all those changes I find out that esp-tls-mbedtls: mbedtls_ssl_setup returned -0x7F00
still appear again after about 11 hours running the program.
2024-07-29 08:24:04 E (39396617) coreMQTT: Handling of keep alive failed. Status=MQTTKeepAliveTimeout
E (39396617) coreMQTT: Call to receiveSingleIteration failed. Status=MQTTKeepAliveTimeout
E (39396617) coreMQTT: MQTT operation failed with status MQTTKeepAliveTimeout
I (39396627) report: coreMQTT-Agent disconnected. Preventing coreMQTT-Agent commands from being enqueued.
W (39396627) core_mqtt_agent_manager: Stack size uxMqttAgentTask: 1816
I (39396637) gate_control: coreMQTT-Agent disconnected. Preventing coreMQTT-Agent commands from being enqueued.
W (39396627) core_mqtt_agent_manager: TLS connection was disconnected.
I (39396657) supervisor: coreMQTT-Agent disconnected. Preventing coreMQTT-Agent commands from being enqueued.
W (39396667) core_mqtt_agent_manager: Internal heap size: 57260bytes
I (39396677) ota_over_mqtt: coreMQTT-Agent disconnected. Suspending OTA agent.
W (39396677) core_mqtt_agent_manager: Minimum free heap size: 11924 bytes
I (39396687) core_mqtt_agent_manager: coreMQTT-Agent disconnected.
W (39396737) AWS_OTA: OTA Timer handle NULL for Timerid=1, can't stop.
I (39396737) AWS_OTA: OTA Agent is suspended.
I (39396737) AWS_OTA: Current State=[Suspended], Event=[Suspend], New state=[Suspended]
2024-07-29 08:24:04 E (39397197) esp-tls-mbedtls: mbedtls_ssl_setup returned -0x7f00
E (39397197) esp-tls: create_ssl_handle failed
E (39397197) esp-tls: Failed to open new connection
2024-07-29 08:24:04 I (39397517) core_mqtt_agent_manager: Retry attempt 1.
2024-07-29 08:24:05 E (39397707) esp-tls-mbedtls: mbedtls_ssl_setup returned -0x7f00
E (39397717) esp-tls: create_ssl_handle failed
E (39397717) esp-tls: Failed to open new connection
I (39398167) core_mqtt_agent_manager: Retry attempt 2.
2024-07-29 08:24:05 E (39398427) esp-tls-mbedtls: mbedtls_ssl_setup returned -0x7f00
E (39398427) esp-tls: create_ssl_handle failed
E (39398427) esp-tls: Failed to open new connection
2024-07-29 08:24:07 I (39399907) core_mqtt_agent_manager: Retry attempt 3.
2024-07-29 08:24:07 E (39400077) esp-tls-mbedtls: mbedtls_ssl_setup returned -0x7f00
E (39400077) esp-tls: create_ssl_handle failed
E (39400077) esp-tls: Failed to open new connection
2024-07-29 08:24:09 I (39401837) core_mqtt_agent_manager: Retry attempt 4.
E (39402017) esp-tls-mbedtls: mbedtls_ssl_setup returned -0x7f00
E (39402017) esp-tls: create_ssl_handle failed
E (39402017) esp-tls: Failed to open new connection
2024-07-29 08:24:11 I (39404217) core_mqtt_agent_manager: Retry attempt 5.
2024-07-29 08:24:11 E (39404387) esp-tls-mbedtls: mbedtls_ssl_setup returned -0x7f00
E (39404387) esp-tls: create_ssl_handle failed
E (39404387) esp-tls: Failed to open new connection
2024-07-29 08:24:16 I (39409227) core_mqtt_agent_manager: Retry attempt 6.
2024-07-29 08:24:16 E (39409417) esp-tls-mbedtls: mbedtls_ssl_setup returned -0x7f00
E (39409427) esp-tls: create_ssl_handle failed
E (39409427) esp-tls: Failed to open new connection
2024-07-29 08:24:17 I (39410127) core_mqtt_agent_manager: Retry attempt 7.
2024-07-29 08:24:17 E (39410307) esp-tls-mbedtls: mbedtls_ssl_setup returned -0x7f00
E (39410307) esp-tls: create_ssl_handle failed
E (39410307) esp-tls: Failed to open new connection
2024-07-29 08:24:17 E (39410527) supervisor: Connection can't be established
Could you please advice any more steps to debug reasons of memory leak causing finally to esp-tls-mbedtls: mbedtls_ssl_setup returned -0x7f00
?
Hello @NightSkySK,
From your log, you've added extra info like core_mqtt_agent_manager: Internal heap size: 57260bytes
and Minimum free heap size: 11924 bytes
. Could you help attach the entire log? I'd like to narrow down the scenario.
Thank you.
In the meantime, I will try to perform a code review to see if there are any clues. BTW, what change did you do? Could you help share your testing branch?
Hi @NightSkySK, I'm still having trouble identifying the potential cause of the heap memory leakage based on the information provided. Could you please share the demo configuration you're using? Have you made any modifications to the demo, or are you using the demo with only configuration changes? That additional context would be helpful in assisting you with this issue.
Thank you for your cooperation.
@ActoryOu This week I'm travelling and not able to provide more evidence. I need to include more logs like Minimum free heap size:
into the code to trace it in the time of code running, so far I have it only in this one place. I can invite you to my repo however it's already quite heavily modified to adjust to my needs.
Hello I face the same issues with my ESP32-S3
System information Hardware board: ESP32-S3 (ESP32 S3 DevKitC 1 N16R8 ESP32 S3 WROOM1 N16R8) ESP-IDF: 5.2.2 ESP-IDE 3.0.0 Operating System: Windows
I (232435) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0
I (233435) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0
I (234435) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0
I (235435) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0
I (236115) main: Send TOPIC: ESP32S3_TEST1/out/memcheck MESSAGE: {"xxxxx"}
I (236135) MQTT_UTILS: Sending publish request Topic: "ESP32S3_TEST1/out/memcheck": "{"xxxx"}" with ID 24.
I (236155) coreMQTT: Publishing message to ESP32S3_TEST1/out/memcheck.
I (236165) MQTT_UTILS: Task "APPTASK" waiting for publish 24 to complete.
I (236175) MQTT_UTILS: Publish 24 succeeded for task "APPTASK".
I (236435) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0
E (237165) coreMQTT: Handling of keep alive failed. Status=MQTTKeepAliveTimeout
E (237165) coreMQTT: Call to receiveSingleIteration failed. Status=MQTTKeepAliveTimeout
E (237165) coreMQTT: MQTT operation failed with status MQTTKeepAliveTimeout
I (237175) ota_over_mqtt_demo: coreMQTT-Agent disconnected. Suspending OTA agent.
I (237185) core_mqtt_agent_manager: coreMQTT-Agent disconnected.
I (237185) MQTT_UTILS: coreMQTT-Agent disconnected. Preventing coreMQTT-Agent commands from being enqueued.
I (237435) ota_over_mqtt_demo: Call OTA_Suspend...
W (237435) AWS_OTA: OTA Timer handle NULL for Timerid=0, can't stop.
W (237435) AWS_OTA: OTA Timer handle NULL for Timerid=1, can't stop.
I (237435) AWS_OTA: OTA Agent is suspended.
I (237445) AWS_OTA: Current State=[Suspended], Event=[Suspend], New state=[Suspended]
I (237445) ota_over_mqtt_demo: Call OTA_Suspend...OK
I (237455) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0
I (238465) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0
I (239465) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0
Hello I face the same issues with my ESP32-S3
System information Hardware board: ESP32-S3 (ESP32 S3 DevKitC 1 N16R8 ESP32 S3 WROOM1 N16R8) ESP-IDF: 5.2.2 ESP-IDE 3.0.0 Operating System: Windows
I (232435) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0 I (233435) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0 I (234435) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0 I (235435) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0 I (236115) main: Send TOPIC: ESP32S3_TEST1/out/memcheck MESSAGE: {"xxxxx"} I (236135) MQTT_UTILS: Sending publish request Topic: "ESP32S3_TEST1/out/memcheck": "{"xxxx"}" with ID 24. I (236155) coreMQTT: Publishing message to ESP32S3_TEST1/out/memcheck. I (236165) MQTT_UTILS: Task "APPTASK" waiting for publish 24 to complete. I (236175) MQTT_UTILS: Publish 24 succeeded for task "APPTASK". I (236435) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0 E (237165) coreMQTT: Handling of keep alive failed. Status=MQTTKeepAliveTimeout E (237165) coreMQTT: Call to receiveSingleIteration failed. Status=MQTTKeepAliveTimeout E (237165) coreMQTT: MQTT operation failed with status MQTTKeepAliveTimeout I (237175) ota_over_mqtt_demo: coreMQTT-Agent disconnected. Suspending OTA agent. I (237185) core_mqtt_agent_manager: coreMQTT-Agent disconnected. I (237185) MQTT_UTILS: coreMQTT-Agent disconnected. Preventing coreMQTT-Agent commands from being enqueued. I (237435) ota_over_mqtt_demo: Call OTA_Suspend... W (237435) AWS_OTA: OTA Timer handle NULL for Timerid=0, can't stop. W (237435) AWS_OTA: OTA Timer handle NULL for Timerid=1, can't stop. I (237435) AWS_OTA: OTA Agent is suspended. I (237445) AWS_OTA: Current State=[Suspended], Event=[Suspend], New state=[Suspended] I (237445) ota_over_mqtt_demo: Call OTA_Suspend...OK I (237455) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0 I (238465) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0 I (239465) ota_over_mqtt_demo: Received: 0 Queued: 0 Processed: 0 Dropped: 0
you can update the git source code,them fix this issue, but “keep alive failed” will still happens,what is new is that coreMQTT-Agent can be reconnect itself. But, even worse: in my test ota_over_mqtt_demo can not run, it will cause reboot, I have to close it for test up to now. I still waiting for them to feedback
Hello, update source code and as you already said
"Handling of keep alive failed. Status=MQTTKeepAliveTimeout" still happen but now seem that ESP32-s3 reset as you can see in the firts lines of logs here below:
I (1099585) main: Send TOPIC: ESP32S3_TEST1/out/memcheck MESSAGE: {"xxxxxx"}
I (1099605) MQTT_UTILS: Sending publish request Topic: "ESP32S3_TEST1/out/memcheck": "xxxxxx" with ID 139.
I (1099635) MQTT_UTILS: Task "APPTASK" waiting for publish 139 to complete.
I (1101275) MQTT_UTILS: Publish 139 succeeded for task "APPTASK".
E (1107325) coreMQTT: Handling of keep alive failed. Status=MQTTKeepAliveTimeout
E (1107325) coreMQTT: Call to receiveSingleIteration failed. Status=MQTTKeepAliveTimeout
ESP-ROM:esp32s3-20210327
Build:Mar 27 2021
rst:0x8 (TG1WDT_SYS_RST),boot:0x8 (SPI_FAST_FLASH_BOOT)
Saved PC:0x42004d7b
SPIWP:0xee
mode:DIO, clock div:1
load:0x3fce3810,len:0x11b4
load:0x403c9700,len:0x4
load:0x403c9704,len:0xab4
load:0x403cc700,len:0x2c14
entry 0x403c9890
I (261) cpu_start: Multicore app
I (271) cpu_start: Pro cpu start user code
I (271) cpu_start: cpu freq: 160000000 Hz
I (271) cpu_start: Application information:
I (274) cpu_start: Project name: smartmeter
I (279) cpu_start: App version: v202212.00-64-g06f5ce1-dirty
I (286) cpu_start: Compile time: Aug 1 2024 08:04:32
I (292) cpu_start: ELF file SHA256: 57fe881d4...
I (297) cpu_start: ESP-IDF: v5.2.2-dirty
I (303) cpu_start: Min chip rev: v0.0
I (307) cpu_start: Max chip rev: v0.99
I (312) cpu_start: Chip rev: v0.2
I (317) heap_init: Initializing. RAM available for dynamic allocation:
I (324) heap_init: At 3FCABFE8 len 0003D728 (245 KiB): RAM
I (330) heap_init: At 3FCE9710 len 00005724 (21 KiB): RAM
I (336) heap_init: At 3FCF0000 len 00008000 (32 KiB): DRAM
I (343) heap_init: At 600FE014 len 00001FD4 (7 KiB): RTCRAM
I (350) spi_flash: detected chip: generic
I (354) spi_flash: flash io: dio
W (357) spi_flash: Detected size(16384k) larger than the size in the binary image header(4096k). Using the size in the binary image header.
W (371) mcpwm(legacy): legacy driver is deprecated, please migrate to `driver/mcpwm_prelude.h`
I (380) sleep: Configure to isolate all GPIO pins in sleep state
I (387) sleep: Enable automatic switching of GPIO sleep configuration
I (394) coexist: coex firmware version: d96c1e51f
I (399) coexist: coexist rom version e7ae62f
I (405) main_task: Started on CPU0
I (415) main_task: Calling app_main()
I (415) main: ------ APPLICATION START ----------
I (415) main:
EWC_S3 WFR ESP-IDF Version = v5.2.2-dirty
I (425) main: Get Device Credentil From FLASH Initial partition. OK!
I (435) main: xDeviceInitialSettings:
I (435) main: cDeviceSerial[sizeDeviceSerial]=ESP32S3_TEST1
I (445) main: cUdpPin[sizeUdpPin]=1234
I (445) main: cXorKey[sizeXorKey]=DEMO_KEY
I (455) main: cThingName[sizeThingName]=ESP32S3_TEST1
I (455) main: cCertificate[sizeCertificate]=-----BEGIN CERTIFICATE-----
...
and secondly, already as you said: ota_over_mqtt_demo can not run, : with "with error code 3."
I (2195) ota_over_mqtt_demo: OTA over MQTT demo, Application version 0.0.0
I (2205) main: START APPLICATION EWC
I (2205) main: ------ APPLICATION START ----------
...
I (9485) core_mqtt_agent_manager: TLS connection established.
I (10095) coreMQTT: MQTT connection established with the broker.
I (10095) core_mqtt_agent_manager: Session present: 0
I (10095) ota_over_mqtt_demo: coreMQTT-Agent connected. Resuming OTA agent.
I (10105) core_mqtt_agent_manager: coreMQTT-Agent connected.
I (10105) MQTT_UTILS: coreMQTT-Agent connected.
I (10185) ota_over_mqtt_demo: Request Job Document event Received
E (10185) ota_over_mqtt_demo: Failed to write job start next topic to buffer with error code 3.
Hello, just un update, after I have excluded the ota_over_mqtt_demo from my code and all seem work without any error from the 16 hours, without any event of MQTTKeepAliveTimeout.
Hi @WilliamFrasson, We recently fixed an issue on OTA demo via #103. Could you help pull latest version and test if same issue exists?
Thank you.
Hi @ActoryOu I enabled the ota demo test with the ota fix, and now running .. after 4 hours i don't get MQTTKeepAliveTimeout events.
Hi @JasonYan324 Since you were also facing a similar issue , can you confirm if the https://github.com/FreeRTOS/iot-reference-esp32c3/pull/103 change works for you as well?
Hi @JasonYan324 Since you were also facing a similar issue , can you confirm if the #103 change works for you as well?
Thanks,I have already update for test, and gonging well
I'm closing this because the patch works for multiple people. @NightSkySK, please feel free to create a new issue to describe your current status if the patch doesn't work.
Thank you.
Describe the bug
At first glance, the subject can suggest that it is the same issue as described in https://github.com/FreeRTOS/iot-reference-esp32c3/issues/47 and https://github.com/FreeRTOS/iot-reference-esp32c3/issues/48 However, as the symptoms are the same the solution from those issues haven't solve my problem. Let me explain step by step what happened and my observations and logs (It will be quite a long thread, I'm sorry...)
The story begins when I found that my code is suffering from mbedtls memory leak: In logs I found
esp-tls-mbedtls: mbedtls_ssl_setup returned -0x7F00
which was clear indicationmbedtls memory leak
In the same software version I noticed that
MQTTKeepAliveTimeout
appear from time to time but the software can handle reconnection without issue:So to fix mbedtls memory issue I've applied following commits: https://github.com/FreeRTOS/iot-reference-esp32c3/commit/a0fe220e48d274e4b724d50109f45552fa98a58d referring to
sdkconfig.default
and https://github.com/FreeRTOS/iot-reference-esp32c3/commit/e2d407f72c20202730e7ac641506559fc36eee43 referring tomain/networking/mqtt/core_mqtt_agent_manager.c
And it worked well for mbedtls memory issue, I can't find any evidence
esp-tls-mbedtls: mbedtls_ssl_setup returned -0x7F00
anymore, however the frequency ofMQTTKeepAliveTimeout
significantly increased and software is no longer capable to recover MQTT connection.And on this stage we have two issues:
MQTTKeepAliveTimeout
even if few seconds earlier there was successfully published msg to AWS IoT MQTT server and from AWS IoT Documantation page I can read:where key information is
This timer is reset whenever AWS IoT receives a PUBLISH, SUBSCRIBE, PING, or PUBACK message from the client.
I've also tried to increase in
sdkconfig.default
fromCONFIG_GRI_MQTT_AGENT_KEEP_ALIVE_INTERVAL_SECONDS=60
toCONFIG_GRI_MQTT_AGENT_KEEP_ALIVE_INTERVAL_SECONDS=600
without any major differencewhich I found confirmation at the server side as Livecycle Connect/Disconnect events don't show
MQTT_KEEP_ALIVE_TIMEOUT
butDUPLICATE_CLIENTID
caused by rebooting ESP32 without proper disconnection with MQTT server.coreMQTT-Agent disconnected
my short-term solution is to trigger the device reboot by my supervisor task, it isn't an elegant solution and I would prefer to fix it properly.System information