espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0
13.71k stars 7.3k forks source link

MQTT seems to be blocking API (IDFGH-7543) #9106

Closed diplfranzhoepfinger closed 2 years ago

diplfranzhoepfinger commented 2 years ago

Environment

Problem Description

It seems that the MQTT API is blocking.

Expected Behavior

Actual Behavior

we have duration_min/max 1000/1000 microseconds before MQTT run

we have duration_min/max 34/364679 microseconds while connecting, this might be worse if using ESP-Modem or slower Internet connections. This Means that the Task-Scheduler is totally blocked at least once for about 400ms. in Best conditions.

we have duration_min/max 649/1351 microseconds after the run of the sample, which is acceptable.

Steps to reproduce

run the Project

Code to reproduce this issue

https://github.com/DiplomInformatikerFranzHoepfinger/mqtt_and_watchdog

Debug Logs

Executing action: monitor
Running idf_monitor in directory /home/franz/git/franz/mqtt_and_watchdog/mqtt_ssl
Executing "/home/franz/.espressif/python_env/idf4.4_py3.8_env/bin/python /home/franz/esp-idf-v4.4/tools/idf_monitor.py -p /dev/ttyUSB0 -b 115200 --toolchain-prefix xtensa-esp32-elf- --target esp32 --revision 3 --print_filter  /home/franz/git/franz/mqtt_and_watchdog/mqtt_ssl/build/mqtt_ssl.elf -m '/home/franz/.espressif/python_env/idf4.4_py3.8_env/bin/python' '/home/franz/esp-idf-v4.4/tools/idf.py'"...
--- idf_monitor on /dev/ttyUSB0 115200 ---
--- Quit: Ctrl+] | Menu: Ctrl+T | Help: Ctrl+T followed by Ctrl+H ---
ets Jul 29 2019 12:21:46

rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:1
load:0x3fff0030,len:6932
load:0x40078000,len:15516
load:0x40080400,len:4284
0x40080400: _init at ??:?

entry 0x40080680
I (27) boot: ESP-IDF v4.4-dirty 2nd stage bootloader
I (27) boot: compile time 17:31:46
I (27) boot: chip revision: 3
I (30) qio_mode: Enabling default flash chip QIO
I (36) boot.esp32: SPI Speed      : 80MHz
I (40) boot.esp32: SPI Mode       : QIO
I (45) boot.esp32: SPI Flash Size : 8MB
I (50) boot: Enabling RNG early entropy source...
I (55) boot: Partition Table:
I (58) boot: ## Label            Usage          Type ST Offset   Length
I (66) boot:  0 nvs              WiFi data        01 02 00009000 00006000
I (73) boot:  1 phy_init         RF data          01 01 0000f000 00001000
I (81) boot:  2 factory          factory app      00 00 00010000 00177000
I (88) boot: End of partition table
I (92) esp_image: segment 0: paddr=00010020 vaddr=3f400020 size=1b2a4h (111268) map
I (132) esp_image: segment 1: paddr=0002b2cc vaddr=3ffb0000 size=02f2ch ( 12076) load
I (135) esp_image: segment 2: paddr=0002e200 vaddr=40080000 size=01e18h (  7704) load
I (140) esp_image: segment 3: paddr=00030020 vaddr=400d0020 size=8b864h (571492) map
I (303) esp_image: segment 4: paddr=000bb88c vaddr=40081e18 size=118e4h ( 71908) load
I (327) esp_image: segment 5: paddr=000cd178 vaddr=50000000 size=00010h (    16) load
I (336) boot: Loaded app from partition at offset 0x10000
I (337) boot: Disabling RNG early entropy source...
I (349) cpu_start: Pro cpu up.
I (349) cpu_start: Single core mode
I (357) cpu_start: Pro cpu start user code
I (357) cpu_start: cpu freq: 240000000
I (357) cpu_start: Application information:
I (362) cpu_start: Project name:     mqtt_ssl
I (367) cpu_start: App version:      v0.1-19-g06a2a91-dirty
I (373) cpu_start: Compile time:     Jun  5 2022 17:43:18
I (379) cpu_start: ELF file SHA256:  52871ca846916717...
I (385) cpu_start: ESP-IDF:          v4.4-dirty
I (391) heap_init: Initializing. RAM available for dynamic allocation:
I (398) heap_init: At 3FF80000 len 00002000 (8 KiB): RTCRAM
I (404) heap_init: At 3FFAE6E0 len 00001920 (6 KiB): DRAM
I (410) heap_init: At 3FFB6C08 len 000293F8 (164 KiB): DRAM
I (416) heap_init: At 3FFE0440 len 0001FBC0 (126 KiB): D/IRAM
I (423) heap_init: At 40078000 len 00008000 (32 KiB): IRAM
I (429) heap_init: At 400936FC len 0000C904 (50 KiB): IRAM
I (436) spi_flash: detected chip: generic
I (440) spi_flash: flash io: qio
I (444) cpu_start: Starting scheduler on PRO CPU.
I (449) example: Example configured to blink GPIO LED!
I (454) gpio: GPIO[5]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
I (5464) example: duration_min/max 957/1000 microseconds
I (10464) example: duration_min/max 1000/1000 microseconds
I (15464) example: duration_min/max 1000/1000 microseconds
I (20464) example: duration_min/max 1000/1000 microseconds
I (20464) MQTTS_EXAMPLE: [APP] Startup..
I (20464) MQTTS_EXAMPLE: [APP] Free memory: 292072 bytes
I (20469) MQTTS_EXAMPLE: [APP] IDF version: v4.4-dirty
I (20489) wifi:wifi driver task: 3ffbcf68, prio:23, stack:6656, core=0
I (20489) system_api: Base MAC address is not set
I (20489) system_api: read default base MAC address from EFUSE
I (20500) wifi:wifi firmware version: 7679c42
I (20500) wifi:wifi certification version: v7.0
I (20504) wifi:config NVS flash: enabled
I (20508) wifi:config nano formating: disabled
I (20512) wifi:Init data frame dynamic rx buffer num: 32
I (20517) wifi:Init management frame dynamic rx buffer num: 32
I (20523) wifi:Init management short buffer num: 32
I (20527) wifi:Init dynamic tx buffer num: 32
I (20531) wifi:Init static rx buffer size: 1600
I (20536) wifi:Init static rx buffer num: 10
I (20539) wifi:Init dynamic rx buffer num: 32
I (20544) wifi_init: rx ba win: 6
I (20548) wifi_init: tcpip mbox: 32
I (20552) wifi_init: udp mbox: 6
I (20556) wifi_init: tcp mbox: 6
I (20560) wifi_init: tcp tx win: 5744
I (20564) wifi_init: tcp rx win: 5744
I (20568) wifi_init: tcp mss: 1440
I (20572) wifi_init: WiFi IRAM OP enabled
I (20577) wifi_init: WiFi RX IRAM OP enabled
I (20582) example_connect: Connecting to devolo-guest-644...
I (20589) phy_init: phy_version 4670,719f9f6,Feb 18 2021,17:07:07
I (20671) wifi:mode : sta (58:bf:25:e0:71:30)
I (20672) wifi:enable tsf
I (20673) example_connect: Waiting for IP(s)
I (22717) wifi:new:<11,0>, old:<1,0>, ap:<255,255>, sta:<11,0>, prof:1
I (23080) wifi:state: init -> auth (b0)
I (23093) wifi:state: auth -> assoc (0)
I (23101) wifi:state: assoc -> run (10)
I (23124) wifi:connected with devolo-guest-644, aid = 2, channel 11, BW20, bssid = be:be:f4:a6:be:f4
I (23124) wifi:security: WPA2-PSK, phy: bgn, rssi: -43
I (23127) wifi:pm start, type: 1

W (23135) wifi:<ba-add>idx:0 (ifx:0, be:be:f4:a6:be:f4), tid:6, ssn:2, winSize:64
I (23168) wifi:AP's beacon interval = 102400 us, DTIM period = 1
I (23987) esp_netif_handlers: example_connect: sta ip: 192.168.178.98, mask: 255.255.255.0, gw: 192.168.178.1
I (23988) example_connect: Got IPv4 event: Interface "example_connect: sta" address: 192.168.178.98
I (24487) example_connect: Got IPv6 event: Interface "example_connect: sta" address: fe80:0000:0000:0000:5abf:25ff:fee0:7130, type: ESP_IP6_ADDR_IS_LINK_LOCAL
I (24492) example_connect: Connected to example_connect: sta
I (24498) example_connect: - IPv4 address: 192.168.178.98
I (24504) example_connect: - IPv6 address: fe80:0000:0000:0000:5abf:25ff:fee0:7130, type: ESP_IP6_ADDR_IS_LINK_LOCAL
I (24515) MQTTS_EXAMPLE: [APP] Free memory: 249020 bytes
I (24522) MQTTS_EXAMPLE: Other event id:7
W (24535) wifi:<ba-add>idx:1 (ifx:0, be:be:f4:a6:be:f4), tid:0, ssn:0, winSize:64
I (25732) MQTTS_EXAMPLE: MQTT_EVENT_CONNECTED
I (25733) MQTTS_EXAMPLE: sent subscribe successful, msg_id=37197
I (25735) MQTTS_EXAMPLE: sent subscribe successful, msg_id=48076
I (25740) MQTTS_EXAMPLE: sent unsubscribe successful, msg_id=5989
I (25936) MQTTS_EXAMPLE: MQTT_EVENT_SUBSCRIBED, msg_id=37197
I (25938) MQTTS_EXAMPLE: sent publish successful, msg_id=0
I (25990) example: duration_min/max 34/364679 microseconds
I (26142) MQTTS_EXAMPLE: MQTT_EVENT_SUBSCRIBED, msg_id=48076
I (26143) MQTTS_EXAMPLE: sent publish successful, msg_id=0
I (26346) MQTTS_EXAMPLE: MQTT_EVENT_UNSUBSCRIBED, msg_id=5989
I (26490) MQTTS_EXAMPLE: MQTT_EVENT_DATA
TOPIC=/topic/qos0
DATA=data
I (26755) MQTTS_EXAMPLE: MQTT_EVENT_DATA
TOPIC=/topic/qos0
DATA=data
I (30990) example: duration_min/max 31/1970 microseconds
I (35990) example: duration_min/max 576/1424 microseconds
I (40990) example: duration_min/max 623/1377 microseconds
I (45990) example: duration_min/max 567/1433 microseconds
I (50990) example: duration_min/max 726/1274 microseconds
I (55990) example: duration_min/max 649/1351 microseconds

Other items if possible

diplfranzhoepfinger commented 2 years ago

### Actual Behavior

enabling Dual-Core helps hugely, 

but the issue that MQTT Lib is blocking is not solved this way. 

### Steps to reproduce

run the Project

### Code to reproduce this issue

https://github.com/DiplomInformatikerFranzHoepfinger/mqtt_and_watchdog/tree/feature/DualCore

## Debug Logs

```
Executing action: monitor
Running idf_monitor in directory /home/franz/git/franz/mqtt_and_watchdog/mqtt_ssl
Executing "/home/franz/.espressif/python_env/idf4.4_py3.8_env/bin/python /home/franz/esp-idf-v4.4/tools/idf_monitor.py -p /dev/ttyUSB0 -b 115200 --toolchain-prefix xtensa-esp32-elf- --target esp32 --revision 3 --print_filter  /home/franz/git/franz/mqtt_and_watchdog/mqtt_ssl/build/mqtt_ssl.elf -m '/home/franz/.espressif/python_env/idf4.4_py3.8_env/bin/python' '/home/franz/esp-idf-v4.4/tools/idf.py'"...
--- idf_monitor on /dev/ttyUSB0 115200 ---
--- Quit: Ctrl+] | Menu: Ctrl+T | Help: Ctrl+T followed by Ctrl+H ---
ets Jul 29 2019 12:21:46

rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:1
load:0x3fff0030,len:6936
load:0x40078000,len:15512
load:0x40080400,len:4492
0x40080400: _init at ??:?

entry 0x400806b4
I (27) boot: ESP-IDF v4.4-dirty 2nd stage bootloader
I (27) boot: compile time 18:15:17
I (27) boot: chip revision: 3
I (30) qio_mode: Enabling default flash chip QIO
I (36) boot.esp32: SPI Speed      : 80MHz
I (40) boot.esp32: SPI Mode       : QIO
I (45) boot.esp32: SPI Flash Size : 8MB
I (50) boot: Enabling RNG early entropy source...
I (55) boot: Partition Table:
I (58) boot: ## Label            Usage          Type ST Offset   Length
I (66) boot:  0 nvs              WiFi data        01 02 00009000 00006000
I (73) boot:  1 phy_init         RF data          01 01 0000f000 00001000
I (81) boot:  2 factory          factory app      00 00 00010000 00177000
I (88) boot: End of partition table
I (92) esp_image: segment 0: paddr=00010020 vaddr=3f400020 size=1b848h (112712) map
I (132) esp_image: segment 1: paddr=0002b870 vaddr=3ffb0000 size=03758h ( 14168) load
I (137) esp_image: segment 2: paddr=0002efd0 vaddr=40080000 size=01048h (  4168) load
I (139) esp_image: segment 3: paddr=00030020 vaddr=400d0020 size=8bf64h (573284) map
I (304) esp_image: segment 4: paddr=000bbf8c vaddr=40081048 size=13024h ( 77860) load
I (330) esp_image: segment 5: paddr=000cefb8 vaddr=50000000 size=00010h (    16) load
I (340) boot: Loaded app from partition at offset 0x10000
I (340) boot: Disabling RNG early entropy source...
I (352) cpu_start: Pro cpu up.
I (352) cpu_start: Starting app cpu, entry point is 0x40081134
0x40081134: call_start_cpu1 at /home/franz/esp-idf-v4.4/components/esp_system/port/cpu_start.c:156

I (0) cpu_start: App cpu up.
I (366) cpu_start: Pro cpu start user code
I (366) cpu_start: cpu freq: 240000000
I (366) cpu_start: Application information:
I (371) cpu_start: Project name:     mqtt_ssl
I (376) cpu_start: App version:      v0.2-1-g84042cd
I (382) cpu_start: Compile time:     Jun  5 2022 18:15:28
I (388) cpu_start: ELF file SHA256:  11b54b8a7b517646...
I (394) cpu_start: ESP-IDF:          v4.4-dirty
I (399) heap_init: Initializing. RAM available for dynamic allocation:
I (406) heap_init: At 3FFAE6E0 len 00001920 (6 KiB): DRAM
I (412) heap_init: At 3FFB7518 len 00028AE8 (162 KiB): DRAM
I (418) heap_init: At 3FFE0440 len 00003AE0 (14 KiB): D/IRAM
I (425) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
I (431) heap_init: At 4009406C len 0000BF94 (47 KiB): IRAM
I (438) spi_flash: detected chip: generic
I (442) spi_flash: flash io: qio
I (447) cpu_start: Starting scheduler on PRO CPU.
I (0) cpu_start: Starting scheduler on APP CPU.
I (5) example: Example configured to blink GPIO LED!
I (10) gpio: GPIO[5]| InputEn: 0| OutputEn: 0| OpenDrain: 0| Pullup: 1| Pulldown: 0| Intr:0 
I (10489) example: duration_min/max 42/1000 microseconds
I (15489) example: duration_min/max 1000/1000 microseconds
I (20489) example: duration_min/max 1000/1000 microseconds
I (25475) MQTTS_EXAMPLE: [APP] Startup..
I (25475) MQTTS_EXAMPLE: [APP] Free memory: 274276 bytes
I (25475) MQTTS_EXAMPLE: [APP] IDF version: v4.4-dirty
I (25489) example: duration_min/max 985/1016 microseconds
I (25546) wifi:wifi driver task: 3ffc0aa0, prio:23, stack:6656, core=0
I (25546) system_api: Base MAC address is not set
I (25546) system_api: read default base MAC address from EFUSE
I (25563) wifi:wifi firmware version: 7679c42
I (25563) wifi:wifi certification version: v7.0
I (25563) wifi:config NVS flash: enabled
I (25565) wifi:config nano formating: disabled
I (25569) wifi:Init data frame dynamic rx buffer num: 32
I (25574) wifi:Init management frame dynamic rx buffer num: 32
I (25579) wifi:Init management short buffer num: 32
I (25584) wifi:Init dynamic tx buffer num: 32
I (25588) wifi:Init static rx buffer size: 1600
I (25592) wifi:Init static rx buffer num: 10
I (25596) wifi:Init dynamic rx buffer num: 32
I (25601) wifi_init: rx ba win: 6
I (25604) wifi_init: tcpip mbox: 32
I (25609) wifi_init: udp mbox: 6
I (25613) wifi_init: tcp mbox: 6
I (25616) wifi_init: tcp tx win: 5744
I (25621) wifi_init: tcp rx win: 5744
I (25625) wifi_init: tcp mss: 1440
I (25629) wifi_init: WiFi IRAM OP enabled
I (25634) wifi_init: WiFi RX IRAM OP enabled
I (25639) example_connect: Connecting to devolo-guest-644...
I (25646) phy_init: phy_version 4670,719f9f6,Feb 18 2021,17:07:07
I (25732) wifi:mode : sta (58:bf:25:e0:71:30)
I (25732) wifi:enable tsf
I (25733) example_connect: Waiting for IP(s)
I (27778) wifi:new:\<11,0>, old:\<1,0>, ap:\<255,255>, sta:\<11,0>, prof:1
I (28223) wifi:state: init -> auth (b0)
I (28234) wifi:state: auth -> assoc (0)
I (28242) wifi:state: assoc -> run (10)
I (28265) wifi:connected with devolo-guest-644, aid = 2, channel 11, BW20, bssid = be:be:f4:a6:be:f4
I (28265) wifi:security: WPA2-PSK, phy: bgn, rssi: -45
I (28268) wifi:pm start, type: 1

W (28278) wifi:\idx:0 (ifx:0, be:be:f4:a6:be:f4), tid:6, ssn:2, winSize:64
I (28325) wifi:AP's beacon interval = 102400 us, DTIM period = 1
I (29044) esp_netif_handlers: example_connect: sta ip: 192.168.178.98, mask: 255.255.255.0, gw: 192.168.178.1
I (29045) example_connect: Got IPv4 event: Interface "example_connect: sta" address: 192.168.178.98
I (29545) example_connect: Got IPv6 event: Interface "example_connect: sta" address: fe80:0000:0000:0000:5abf:25ff:fee0:7130, type: ESP_IP6_ADDR_IS_LINK_LOCAL
I (29549) example_connect: Connected to example_connect: sta
I (29555) example_connect: - IPv4 address: 192.168.178.98
I (29561) example_connect: - IPv6 address: fe80:0000:0000:0000:5abf:25ff:fee0:7130, type: ESP_IP6_ADDR_IS_LINK_LOCAL
I (29572) MQTTS_EXAMPLE: [APP] Free memory: 231212 bytes
I (29579) MQTTS_EXAMPLE: Other event id:7
W (29598) wifi:\idx:1 (ifx:0, be:be:f4:a6:be:f4), tid:0, ssn:0, winSize:64
I (30490) example: duration_min/max 625/2065 microseconds
I (31196) MQTTS_EXAMPLE: MQTT_EVENT_CONNECTED
I (31198) MQTTS_EXAMPLE: sent subscribe successful, msg_id=40414
I (31199) MQTTS_EXAMPLE: sent subscribe successful, msg_id=31913
I (31204) MQTTS_EXAMPLE: sent unsubscribe successful, msg_id=14353
I (31400) MQTTS_EXAMPLE: MQTT_EVENT_SUBSCRIBED, msg_id=40414
I (31402) MQTTS_EXAMPLE: sent publish successful, msg_id=0
I (31547) MQTTS_EXAMPLE: MQTT_EVENT_SUBSCRIBED, msg_id=31913
I (31548) MQTTS_EXAMPLE: sent publish successful, msg_id=0
I (31709) MQTTS_EXAMPLE: MQTT_EVENT_UNSUBSCRIBED, msg_id=14353
I (31912) MQTTS_EXAMPLE: MQTT_EVENT_DATA
TOPIC=/topic/qos0
DATA=data
I (32322) MQTTS_EXAMPLE: MQTT_EVENT_DATA
TOPIC=/topic/qos0
DATA=data
I (35490) example: duration_min/max 623/1376 microseconds
I (40490) example: duration_min/max 815/1184 microseconds
I (45490) example: duration_min/max 804/1195 microseconds
I (50490) example: duration_min/max 797/1202 microseconds

```

## Other items if possible

- [X ] sdkconfig file see github linked above.

david-cermak commented 2 years ago

@diplfranzhoepfinger Yes most of the mqtt API is blocking, but for publishing you can use esp_mqtt_client_enqueue()

https://github.com/espressif/esp-mqtt/blob/684843a3096fea77346fd7f6994dcf984d65308b/include/mqtt_client.h#L354-L373

Which would just enqueue the message to be sent later in the mqtt thread context.

About connection: It's also happening in the mqtt task, so shouldn't block user threads. The default timeout is 10s, but it's configurable:

https://github.com/espressif/esp-mqtt/blob/master/include/mqtt_client.h#L221

diplfranzhoepfinger commented 2 years ago

Hello Franz,

I agree with the comments below and I can add that we are already using esp_mqtt_client_enqueue().

The main issue is that if we are running the mqtt task on the same core where the watchdog is activated we are reset.

For me the biggest points are:

We need for the 2 use cases above certain state machines that will work in background without overloading a certain task.

In my opinion the change is not really complicated but of course some rework is necessary.

Shall we organize a meeting regarding those topics?

Best Regards,

Vasil Nikolov

CEO and CO-Founder

Email: @.***

Phone: +359 888 200510

Banat 1

1407 Sofia

Bulgaria

www.rilabs.io

Am Di., 7. Juni 2022 um 19:44 Uhr schrieb david-cermak < @.***>:

@diplfranzhoepfinger https://github.com/diplfranzhoepfinger Yes most of the mqtt API is blocking, but for publishing you can use esp_mqtt_client_enqueue()

https://github.com/espressif/esp-mqtt/blob/684843a3096fea77346fd7f6994dcf984d65308b/include/mqtt_client.h#L354-L373

Which would just enqueue the message to be sent later in the mqtt thread context.

About connection: It's also happening in the mqtt task, so shouldn't block user threads. The default timeout is 10s, but it's configurable:

https://github.com/espressif/esp-mqtt/blob/master/include/mqtt_client.h#L221

— Reply to this email directly, view it on GitHub https://github.com/espressif/esp-idf/issues/9106#issuecomment-1148982035, or unsubscribe https://github.com/notifications/unsubscribe-auth/AJAHMXKZE3ZCESNS56HQK5LVN6C77ANCNFSM5X5HNRNA . You are receiving this because you were mentioned.Message ID: @.***>

-- Diplom-Informatiker Franz Höpfinger e.K. Reibersdorf 16a 84419 Schwindegg

franz-ms-muc commented 2 years ago

Hello,

any news about this ? Vasil from www.rilabs.io need non-Blocking API.

david-cermak commented 2 years ago

Hi all,

If I understand correctly, there are two problems from the threading point of view: 1) mqtt task taking too much computing resources, that you're getting hit by the task WDT 2) other APIs, e.g. publish where you're not only waiting for the actual data to be sent out to the network, but also wait for the confirmation.

While, I cannot help much with the 1), as the TLS handshake is happening in the context of mqtt task (Here I'd suggest just adjusting the task watchdog timeout based on your needs, usually key/cert size), the second one could be easily handled by our mqtt cxx wrapper, that helps creating your own actions after an asynchronous event happened, for example:

https://github.com/espressif/esp-idf/blob/c2ccc383dae2a47c2c2dc8c7ad78175a3fd11361/examples/cxx/experimental/esp_mqtt_cxx/ssl/main/mqtt_ssl_example.cpp#L26-L44