FreeRTOS / coreMQTT-Agent

Implements an MQTT agent (or daemon) task for simple MQTT connection sharing among different threads of execution.
MIT License
29 stars 32 forks source link

Core 1 panic'ed (LoadProhibited) in MQTTAgentCommand_Publish #104

Closed NightSkySK closed 8 months ago

NightSkySK commented 9 months ago

The issue was created in (https://github.com/FreeRTOS/iot-reference-esp32c3/issues/63), but I think this in fact should land here. My code is based on that repo with slight modifications to make loops indefinitely.

I found that after number of loop iteration ESP32 crash with following logs:

I (458943) coreMQTT: State record updated. New state=MQTTPubAckSend.
W (458943) sub_pub_unsub_demo: Error or timed out waiting for ack for publish message 119. Re-attempting publish.
I (458953) coreMQTT: Ack packet deserialized with result: MQTTSuccess.
I (458963) sub_pub_unsub_demo: Task "SubPub0" sending publish request to coreMQTT-Agent with message "SubPub0" on topic "/filter/SubPub0" with ID 119.   
I (458963) coreMQTT: State record updated. New state=MQTTPublishDone.
I (458983) sub_pub_unsub_demo: Task "SubPub0" waiting for publish 119 to complete.
I (458983) coreMQTT: De-serialized incoming PUBLISH packet: DeserializerResult=MQTTSuccess.
I (459003) sub_pub_unsub_demo: Publish 119 succeeded for task "SubPub0".
I (459003) coreMQTT: State record updated. New state=MQTTPubAckSend.
I (459023) sub_pub_unsub_demo: Task "SubPub0" received: SubPub0
I (459023) sub_pub_unsub_demo: Task "SubPub0" sending unsubscribe request to coreMQTT-Agent for topic filter: /filter/SubPub0 with id 120
I (459313) ota_over_mqtt: Subscribed to topic $aws/things/GA913F0001/jobs/notify-next.

I (459313) AWS_OTA: Subscribed to MQTT topic: $aws/things/GA913F0001/jobs/notify-next
W (459923) ota_over_mqtt: Stack size uxOTATask: 888

Guru Meditation Error: Core  0 panic'ed (LoadProhibited). Exception was unhandled.

Core  0 register dump:
PC      : 0x4000c250  PS      : 0x00060730  A0      : 0x8018ae5c  A1      : 0x3ffd75e0
A2      : 0x00000001  A3      : 0x00000000  A4      : 0x00007869  A5      : 0x3f42b34c  
A6      : 0x00000008  A7      : 0xff000000  A8      : 0x00000000  A9      : 0x3ffd75b0
A10     : 0x00000000  A11     : 0x3f408be0  A12     : 0x3ffd7834  A13     : 0x3ffb6d1c
A14     : 0x00000000  A15     : 0x00000001  SAR     : 0x00000004  EXCCAUSE: 0x0000001c
EXCVADDR: 0x00000001  LBEG    : 0x400014fd  LEND    : 0x4000150d  LCOUNT  : 0xfffffffd

Backtrace:0x4000c24d:0x3ffd75e0 0x4018ae59:0x3ffd7600 0x4018c4f5:0x3ffd7910 0x4019bc85:0x3ffd7940 0x40096801:0x3ffd7970 0x401062e8:0x3ffd79c0 0x40105e20:0x3ffd79f0 0x40106068:0x3ffd7a40 0x400d94f2:0x3ffd7a70 0x400945b1:0x3ffd7a90
0x4018ae59: _vfprintf_r at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/src/newlib/newlib/libc/stdio/vfprintf.c:1521

0x4018c4f5: vprintf at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/src/newlib/newlib/libc/stdio/vprintf.c:34 (discriminator 
5)

0x4019bc85: esp_log_writev at C:/Espressif/frameworks/esp-idf-v4.3.4/components/log/log.c:189

0x40096801: esp_log_write at C:/Espressif/frameworks/esp-idf-v4.3.4/components/log/log.c:199

0x401062e8: MQTTAgentCommand_Publish at C:\mSpaceLock\mSpaceGate-fmw\build/../components/esp-aws-iot/libraries/coreMQTT-Agent/coreMQTT-Agent/source/core_mqtt_agent_command_functions.c:79 (discriminator 13)

0x40105e20: processCommand at C:\mSpaceLock\mSpaceGate-fmw\build/../components/esp-aws-iot/libraries/coreMQTT-Agent/coreMQTT-Agent/source/core_mqtt_agent.c:584

0x40106068: MQTTAgent_CommandLoop at C:\mSpaceLock\mSpaceGate-fmw\build/../components/esp-aws-iot/libraries/coreMQTT-Agent/coreMQTT-Agent/source/core_mqtt_agent.c:1059

0x400d94f2: prvMQTTAgentTask at C:\mSpaceLock\mSpaceGate-fmw\build/../main/networking/mqtt/core_mqtt_agent_manager.c:519

0x400945b1: vPortTaskWrapper at C:/Espressif/frameworks/esp-idf-v4.3.4/components/freertos/port/xtensa/port.c:168

ELF file SHA256: ece89f91691638ee

Rebooting...

I was asked to set CONFIG_ESP_SYSTEM_PANIC_GDBSTUB option to get more logs about reasons behind it.

I finally was able to make my JTAG debugging to gbd work. (sorry this is my first time when I use it and I have no clue which information is useful and how to take more useful one, any advice is more than welcome)

ESP-IDF Monitor:

I (8529184) coreMQTT: Ack packet deserialized with result: MQTTSuccess.
I (8529184) coreMQTT: State record updated. New state=MQTTPublishDone.
I (8529184) sub_pub_unsub: Publish 2591 succeeded for task "SubPub0".
I (8529184) coreMQTT: De-serialized incoming PUBLISH packet: DeserializerResult=MQTTSuccess.
I (8529204) coreMQTT: State record updated. New state=MQTTPubAckSend.
I (8529204) sub_pub_unsub: Task "SubPub0" received: SubPub0
I (8529214) sub_pub_unsub: Task "SubPub0" sending unsubscribe request to coreMQTT-Agent for topic filter: /filter/SubPub0 with id 2
I (8531444) sub_pub_unsub: Unsubscribe 2592 for topic filter /filter/SubPub0 succeeded for task "SubPub0".
I (8531444) sub_pub_unsub: Task "SubPub0" completed a loop. Delaying before next loop.
I (8536454) sub_pub_unsub: Task "SubPub0" sending subscribe request to coreMQTT-Agent for topic filter: /filter/SubPub0 with id 259
I (8536894) sub_pub_unsub: Subscribe 2593 for topic filter /filter/SubPub0 succeeded for task "SubPub0".
I (8536894) sub_pub_unsub: Task "SubPub0" sending publish request to coreMQTT-Agent with message "SubPub0" on topic "/filter/SubPub
I (8536904) sub_pub_unsub: Task "SubPub0" waiting for publish 2594 to complete.
I (8537894) coreMQTT: Publishing message to /filter/SubPub0.

I (8539134) coreMQTT: Ack packet deserialized with result: MQTTSuccess.
I (8539134) coreMQTT: State record updated. New state=MQTTPublishDone.
I (8539134) sub_pub_unsub: Publish 2594 succeeded for task "SubPub0".
I (8539144) coreMQTT: De-serialized incoming PUBLISH packet: DeserializerResult=MQTTSuccess.
I (8539154) coreMQTT: State record updated. New state=MQTTPubAckSend.
I (8539154) sub_pub_unsub: Task "SubPub0" received: SubPub0
I (8539164) sub_pub_unsub: Task "SubPub0" sending unsubscribe request to coreMQTT-Agent for topic filter: /filter/SubPub0 with id 2
I (8541374) sub_pub_unsub: Unsubscribe 2595 for topic filter /filter/SubPub0 succeeded for task "SubPub0".
I (8541374) sub_pub_unsub: Task "SubPub0" completed a loop. Delaying before next loop.
I (8544384) ota_over_mqtt: OTA agent suspend by timer.
W (8545384) AWS_OTA: OTA Timer handle NULL for Timerid=1, can't stop.
I (8545384) AWS_OTA: OTA Agent is suspended.
acket: DeserializerResult=MQTTSuccess.
I (8548844) coreMQTT: State record updated. New state=MQTTPubAckSend.
W (8548844) sub_pub_unsub: Error or timed out waiting for ack for publish message 2597. Re-attempting publish.
I (8548854) coreMQTT: Ack packet deserialized with result: MQTTSuccess.
I (8548864) sub_pub_unsub: Task "SubPub0" sending publish request to coreMQTT-Agent with message "SubPub0" on topic "/filter/SubPub0" with ID 2597. with ID 2597.
I (8548864) coreMQTT: State record updated. New state=MQTTPublishDone.
I (8548884) sub_pub_unsub: Task "SubPub0" waiting for publish 2597 to complete.
I (8548894) sub_pub_unsub: Publish 2597 succeeded for task "SubPub0".
Guru Meditation Error: Core  1 panic'ed (LoadProhibited). Exception was unhandled.

Setting breakpoint at 0x4000c250 and returning...

The debug console output:

GNU gdb (crosstool-NG esp-2021r2-patch3) 9.2.90.20200913-git
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "--host=x86_64-host_w64-mingw32 --target=xtensa-esp32-elf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word".
Warning: Debuggee TargetArchitecture not detected, assuming x86_64.
=cmd-param-changed,param="pagination",value="off"
keep_alive() was not invoked in the 1000 ms timelimit. GDB alive packet not sent! (1115 ms). Workaround: increase "set remotetimeout" in GDB
[esp32.cpu0] Target halted, PC=0x40197352, debug_reason=00000000
Set GDB target to 'esp32.cpu0'
[esp32.cpu1] Target halted, PC=0x4000C250, debug_reason=00000001
[New Thread 1073467380]
[New Thread 1073547452]
[New Thread 1073573652]
[New Thread 1073557628]
[New Thread 1073596624]
[New Thread 1073446436]
[New Thread 1073569576]
[New Thread 1073549852]
[New Thread 1073587112]
[New Thread 1073444376]
[New Thread 1073446016]
[New Thread 1073561688]
[New Thread 1073581788]

Thread 1 "coreMQTT-Agent" received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread 1073578364]
0x4000c250 in ?? ()
Execute debugger commands using "-exec <command>", for example "-exec info registers" will list registers in use (when GDB is the debugger)

The debug console is only part of what you would like to see as there is much more information in the VSCode debugger...

The call stack is looking like this: image

I'm not sure how to provide all useful information about variables and registers from the debugger (I should probably be able to list them out to the gdb console if I know the right commands 😉 For the time being I paste a few screenshots which in my gut filling might be useful to understand the root cause.

image

Am I right that issue might be caused by pTopicName where it is blank (and should be /filter/SubPub0) but topicNameLength is 40006 ? So in next step in log.c format contain some garbage value.

image

pPublishinfo --> pTopicName is coming from pArgs where in all there is 0x3ffd5d34 core_mqtt_agent_commad_functions.c line:72 pPublishInfo = ( const MQTTPublishInfo_t * ) ( pPublishArg );

image

core_mqtt_agent.c line:570 pCommandArgs = pCommand->pArgs;

I've lost track in this point as I couldn't figure out how the pPayload and pTopicName should be correctly passed into processCommand or MQTTAgentCommand_Publish functions

BTW After more deep dive into this problem I feel that I should move this topic in to coreMQTT-Agent repo, isn't it?

Originally posted by @NightSkySK in https://github.com/FreeRTOS/iot-reference-esp32c3/issues/63#issuecomment-1899963315

Skptak commented 8 months ago

Hey @NightSkySK, thanks for raising this issue Just wanted to echo the message I left in https://github.com/FreeRTOS/iot-reference-esp32c3/issues/45 that the team is looking into this and other MQTT based issues that are being raised in the iot-reference-esp32c3 repo

NightSkySK commented 8 months ago

As we followed that issue at https://github.com/FreeRTOS/iot-reference-esp32c3/issues/63 the root cause was found in the sub_pub_unsub_demo task. Error was casued by handling PUBACK. I suppose we can close it here as it's not related with coreMQTT-Agent library