espressif / esp-mqtt

ESP32 mqtt component
Apache License 2.0

MQTT_API_LOCK in publish causes core panic (LoadProhibited) (IDFGH-4724) #185

Closed tmdesigned closed 3 years ago

tmdesigned commented 3 years ago

In a simple application, the MQTT_API_LOCK(client); call in the esp_mqtt_client_publish and esp_mqtt_client_enqueue functions causes the following core panic for my ESP32 device:

I (2637) esp_netif_handlers: example_connect: sta ip: 192.168.0.188, mask: 255.255.255.0, gw: 192.168.0.1
I (2637) example_connect: Got IPv4 event: Interface "example_connect: sta" address: 192.168.0.188
I (2647) example_connect: Connected to example_connect: sta
I (2647) example_connect: - IPv4 address: 192.168.0.188
I (2657) ESP32_GETTING_STARTED: Other event id:7
W (2677) wifi:<ba-add>idx:0 (ifx:0, b0:95:75:46:97:44), tid:0, ssn:5, winSize:64
W (2737) wifi:<ba-add>idx:1 (ifx:0, b0:95:75:46:97:44), tid:3, ssn:1, winSize:64
I (2777) ESP32_GETTING_STARTED: sent subscribe successful, msg_id=48830
I (2827) ESP32_GETTING_STARTED: MQTT_EVENT_SUBSCRIBED
Guru Meditation Error: Core  0 panic'ed (LoadProhibited). Exception was unhandled.

Core  0 register dump:
PC      : 0x400da056  PS      : 0x00060030  A0      : 0x800d7fdb  A1      : 0x3ffca000  
0x400da056: esp_mqtt_client_enqueue at ~/esp/esp-idf/components/mqtt/esp-mqtt/mqtt_client.c:1773

A2      : 0x00000001  A3      : 0x3ffca060  A4      : 0x3f404278  A5      : 0x00000000  
A6      : 0x00000001  A7      : 0x00000001  A8      : 0x00000000  A9      : 0x3ffc9c80  
A10     : 0x00000025  A11     : 0xffffffff  A12     : 0x00000025  A13     : 0x3ffc9e90  
A14     : 0x00000025  A15     : 0x3ffc9e90  SAR     : 0x00000000  EXCCAUSE: 0x0000001c  
EXCVADDR: 0x000000e5  LBEG    : 0x400014fd  LEND    : 0x4000150d  LCOUNT  : 0xfffffff9  

Backtrace:0x400da053:0x3ffca000 0x400d7fd8:0x3ffca050 0x4008bcb1:0x3ffca100
0x400da053: esp_mqtt_client_enqueue at ~/esp/esp-idf/components/mqtt/esp-mqtt/mqtt_client.c:1772

0x400d7fd8: sendMessage at ~/Documents/Projects/scratch/build/../main/app_main.c:58 (discriminator 1)

This is from a simple application, the relevant lines of which are:

    esp_mqtt_client_config_t mqtt_cfg = {
        .uri = MQTT_URI,
        .client_id = DEVICE_ID,
        .username = ACCESS_KEY,
        .password = ACCESS_SECRET,
    };

    esp_mqtt_client_handle_t client = esp_mqtt_client_init(&mqtt_cfg);
    esp_mqtt_client_register_event(client, ESP_EVENT_ANY_ID, mqtt_event_handler, client);
    esp_mqtt_client_start(client);
    esp_mqtt_client_enqueue(client, topic, "{\"data\": {\"message\": \"hello from ESP32\"}}", 0, 1, 0, true);

This occurred in my testing with both the stable release of esp-idf and the master branch. Because the panic reboots the ESP32, in some cases the cycle of boot -> connect -> send -> crash repeats indefinitely. If I wait to send a message until the MQTT_EVENT_SUBSCRIBED event has been received, it crashes about 3 times before settling and not crashing again (the non-simplified code uses a task to re-send the test message every 5 seconds).

david-cermak commented 3 years ago

Hi @tmdesigned

Could you please make sure you're not passing a NULL pointer to the API? (One oversight on the mqtt client side is that it doesn't check for nullptr in these public APIs; will fix!) This is a very common use case, and it seems unlikely that something got broken here.

From the above coredump

Guru Meditation Error: Core  0 panic'ed (LoadProhibited). Exception was unhandled.
...
...
EXCVADDR: 0x000000e5 ...

The LoadProhibited error indicates a failed read from an invalid address, which in this case equals offsetof(struct esp_mqtt_client, api_lock) == 0xe5. So the client is trying to lock the API using xSemaphoreTakeRecursive(), accessing 0xe5, which means that, very likely, the API was called with

esp_mqtt_client_enqueue(client = NULL, ...);
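
The address arithmetic behind this diagnosis can be reproduced on a host machine. The sketch below is illustrative only: `struct demo_client`, its members, and `member_fault_address` are hypothetical stand-ins, and the layout is not the real `struct esp_mqtt_client`. It shows that reading a member through a base pointer touches `base + offsetof(...)`, so with a NULL base the faulting address is the member's offset itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a client struct; NOT the real esp-mqtt layout. */
struct demo_client {
    char other_state[64];  /* placeholder for the fields that precede the lock */
    void *api_lock;        /* placeholder for the lock handle */
};

/* Address the CPU would load from when evaluating base->api_lock. */
uintptr_t member_fault_address(uintptr_t base)
{
    return base + offsetof(struct demo_client, api_lock);
}
```

Calling `member_fault_address(0)` returns exactly `offsetof(struct demo_client, api_lock)`, which is why a small EXCVADDR value is a common sign of a member access through a NULL struct pointer.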

tmdesigned commented 3 years ago

Thanks @david-cermak. You are correct, the mistake is on my end. The client was indeed not being passed in correctly (not visible in my simplified example above) and was therefore not accessible.
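
A defensive wrapper along the following lines could turn this kind of mistake into a logged error instead of a panic. This is a host-compilable sketch, not esp-mqtt code: `demo_client_handle_t`, `demo_enqueue`, and `safe_enqueue` are hypothetical names, `demo_enqueue` merely stands in for a call such as `esp_mqtt_client_enqueue`, and the -1 failure value is a convention of this sketch.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins so the sketch compiles off-target. */
struct demo_client { int next_msg_id; };
typedef struct demo_client *demo_client_handle_t;

/* Pretend enqueue: dereferences the handle and returns a message id. */
int demo_enqueue(demo_client_handle_t client, const char *topic, const char *data)
{
    (void)topic;
    (void)data;
    return client->next_msg_id++;
}

/* Validate the arguments before handing them to the client API. */
int safe_enqueue(demo_client_handle_t client, const char *topic, const char *data)
{
    if (client == NULL || topic == NULL || data == NULL) {
        fprintf(stderr, "safe_enqueue: NULL argument, message dropped\n");
        return -1;
    }
    return demo_enqueue(client, topic, data);
}
```

Note that a check like this only catches NULL; a handle that is non-NULL but otherwise invalid (for example, an uninitialized or stale pointer) would still crash.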