HASwitchPlate / openHASP

HomeAutomation Switchplate based on lvgl for ESP32
https://www.openhasp.com
MIT License

LV_MEM_CUSTOM provokes network loss and hanging #810

Open hb020 opened 2 days ago

hb020 commented 2 days ago

Describe the bug

With #define LV_MEM_CUSTOM 1, large HTTP network requests make the device hang for about 10 minutes.

To Reproduce

SC01 Plus on 0.7.0-rc14 (wt32-sc01-plus_ota_v0.7.0-rc14_80a9ddb.bin, wt32-sc01-plus_16MB), with the following in user_config_override.h (among other settings):

#define LV_MEM_SIZE (64 * 1024U)
#define LV_MEM_CUSTOM 1
#define HASP_LOG_LEVEL LOG_LEVEL_VERBOSE

Large pages.jsonl, with many large images and many (large) fonts. So large, that without LV_MEM_CUSTOM, it would crash at every page change with an "out of memory" message.

With LV_MEM_CUSTOM it allows page changes and appears stable, EXCEPT:

when starting the http server and requesting edit of /pages.jsonl

See console log (real IP address and SSID are obfuscated, 0.0.0.0 was left as is):

Prompt > [09:18:48.132][ 8436/20572 58] HTTP: Sending 200 /edit.htm to client connected from: 192.168.X.X
Prompt > [09:18:48.344][11508/23984 52] HTTP: Sent /api/files/ page to 192.168.X.X
Prompt > [09:18:48.346][ 9460/21920 56] FILE: Listing directory: /

Prompt > [09:18:59.053][11508/23728 51] HTTP: Sent /pages.jsonl page to 192.168.X.X  <--- this triggers it
Prompt > [09:19:19.292][ 4596/11032 58] MQTT: Disconnected
Prompt > [09:19:34.294][ 4596/11032 58] MQTT: Transport error
Prompt > [09:19:34.296][ 4596/11032 58] MQTT: Disconnected
Prompt > [09:19:49.298][ 4596/11032 58] MQTT: Transport error
Prompt > [09:19:49.301][ 4596/11032 58] MQTT: Disconnected
Prompt > [09:20:04.302][ 4596/11032 58] MQTT: Transport error
.....
Prompt > [09:23:19.365][ 4596/11704 60] MQTT: Transport error
Prompt > [09:23:19.367][ 4596/11704 60] MQTT: Disconnected
Prompt > [09:23:29.682][ 7412/19720 62] HTTP: Sending 200 /pages.jsonl to client connected from: 0.0.0.0  <--- this unblocks screen and console
Prompt > [09:23:29.686][ 7412/19768 62] MQTT: Not connected ??? idle => long
Prompt > [09:23:29.698][ 7412/19768 62] MSGR: Loading L:/idle_long.cmd
Prompt > [09:23:29.713][ 4596/13656 66] MSGR: json=['page 12', {"page":9, "id":2, "src":"L:/camerablank.bin"}]
Prompt > [09:23:29.715][ 4596/13520 66] MSGR: page=12
Prompt > [09:23:29.717][ 4596/13544 66] HASP: Changing page to 12
Prompt > [09:23:29.718][ 4596/13316 65] MQTT: Not connected ??? page => 12
Prompt > [09:23:29.732][ 7412/18492 59] MSGR: Loaded L:/idle_long.cmd
Prompt > [09:23:29.736][ 7412/19532 62] MSGR: File not found: L:/mqtt_off.cmd
Prompt > [09:23:34.370][ 7412/20440 63] MQTT: Transport error
Prompt > [09:23:34.373][ 7412/20440 63] MQTT: Disconnected
Prompt > [09:23:49.375][ 7412/20440 63] MQTT: Transport error
Prompt > [09:23:49.378][ 7412/20440 63] MQTT: Disconnected
....
Prompt > [09:29:34.492][ 7412/20288 63] MQTT: Disconnected
Prompt > [09:29:49.494][ 7412/20288 63] MQTT: Transport error
Prompt > [09:29:49.497][ 7412/20288 63] MQTT: Disconnected
Prompt > [09:29:53.528][13044/26336 50] WIFI: key update timeout  <--- this unblocks network
Prompt > [09:29:54.499][13044/27244 52] MQTT: Transport error
Prompt > [09:29:54.501][13044/27244 52] MQTT: Disconnected
Prompt > [09:29:55.959][13044/26800 51] WIFI: Connected to MY_SSID, requesting IP...
Prompt > [09:29:56.969][13044/26116 50] WIFI: Received IP address 192.168.X.X
Prompt > [09:29:56.971][13044/26140 50] ----: Connected = online
Prompt > [09:29:59.524][13044/25972 49] MQTT: Started
Prompt > [09:29:59.527][13044/25972 49] MQTT: Connected
Prompt > [09:29:59.528][13044/25972 49] MQTT: Connected to broker 192.168.X.X as clientID plate02_7279e0
....

The only way to get to that pages.jsonl is to reset the device and not touch it before requesting the page via the browser.

Once the pages.jsonl is loaded, updating/saving it is no problem, no matter the screen interactions.

Note that the "WIFI: key update timeout" can take much longer than 10 minutes; I was just "lucky" in this published sample.

Expected behavior

no hanging

Screenshots or video

tell me what you want me to do to get more detailed logs

fvanroie commented 2 days ago

This is typical behavior for memory exhaustion. Your ESP32 is running on fumes and its behavior becomes erratic or it crashes...

Large pages.jsonl, with many large images and many (large) fonts.

large and many are rather vague... Can you quantify how many and how big?

So large, that without LV_MEM_CUSTOM, it would crash at every page change with an "out of memory" message.

It seems you've hit some limits in LVGL with the number of objects and data that fits in the reserved LVGL memory (LV_MEM_SIZE) and decided to extend the LVGL memory by sharing it with OS memory. LV_MEM_CUSTOM pools both LVGL and OS memory together; this allows more flexibility but can/will increase memory fragmentation.

You'll need to keep a close eye on memory consumption and see where/why the memory is consumed. Instead of enabling LV_MEM_CUSTOM, it would be better to (slightly) increase LV_MEM_SIZE to where it fits your objects.

hb020 commented 2 days ago

Can you quantify how many and how big?

pages.jsonl: 448 lines, 52kB
font sizes used: 16, 24, 32, 40, 50, 66, 175
files: 235 files, 5.2MB; apart from some small cmd files and the pages.jsonl, all .bin files

I already tried making a bin file of the 175 size font, no change seen.

Instead of enabling LV_MEM_CUSTOM , it should be better to (slightly) increase LV_MEM_SIZE to where it fits your objects.

Can I go over 64k?

Situation now:

Device Memory
--
Free Heap | 20.47 KiB
Free Block | 8.73 KiB
Fragmentation | 57%
PSRam Free | 1.82 MiB
PSRam Size | 1.99 MiB

Module
--
Model | ESP32-S3 rev0
Frequency | 240MHz
Core Version | 4.4.6
Reset Reason | CPU0: POWERON_RESET / CPU1: POWERON_RESET
Flash Size | 16.00 MiB
Program Size Used | 1.60 MiB
Program Size Free | 1.93 MiB
Filesystem Size | 11.93 MiB
Filesystem Used | 5.06 MiB
Filesystem Free | 6.87 MiB

fvanroie commented 2 days ago

Yes, you can go over 64kB if you have a lot of pages/objects/parameters set. That will prevent LVGL out of memory errors. You'll take a fixed amount of free Heap and convert that into available LVGL memory. The amount depends... don't go too high if LVGL doesn't really need it. This will prevent LVGL from hogging system memory and reduce fragmentation.

Check the logs to see the available system memory: e.g. [13044/27244 52] means the largest contiguous free block is 13kB, out of 27kB free Heap, with 52% fragmentation. Disabling LV_MEM_CUSTOM will add a second memory meter for LVGL memory too.
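
For readers who want to check the numbers in their own logs, here is a minimal sketch of how the three values in the prefix relate. The formula is an assumption inferred from the sample log lines in this thread (fragmentation = 100 minus the largest block as a percentage of the free heap, truncated), not taken from the openHASP source. The function names heap_fragmentation_pct and parse_mem_prefix are hypothetical.

```c
#include <stdio.h>

/* Fragmentation in percent, truncated toward zero.
 * ASSUMPTION: formula inferred from the log samples in this thread,
 * not copied from the openHASP source code. */
int heap_fragmentation_pct(unsigned largest_block, unsigned free_heap)
{
    /* Guard against division by zero and nonsensical input. */
    if (free_heap == 0 || largest_block > free_heap) {
        return 0;
    }
    return (int)(100.0 - (100.0 * largest_block) / free_heap);
}

/* Parse a log prefix such as "[13044/27244 52]" (hypothetical helper).
 * Returns 1 when all three fields were read, 0 otherwise. */
int parse_mem_prefix(const char *s, unsigned *largest_block,
                     unsigned *free_heap, unsigned *frag_pct)
{
    return sscanf(s, "[%u/%u %u]", largest_block, free_heap, frag_pct) == 3;
}
```

Feeding the sample prefix [13044/27244 52] through parse_mem_prefix and then heap_fragmentation_pct gives 52, which matches the value printed in the log.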

Try adding 8kB LVGL memory, if it's too big reduce by 4kB, if too small add 4kB, etc... But try to keep 25~30kB of system memory free.
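
As a rough sketch of that first step, the user_config_override.h from the original report could be changed as follows. The value 72 is simply the reported 64kB plus the suggested 8kB, and setting LV_MEM_CUSTOM to 0 is assumed to be how the option is disabled; the right size for a given configuration has to be found by trial.

```c
/* user_config_override.h (sketch, values are a starting point only) */

/* Use LVGL's own fixed-size memory pool instead of the shared system heap. */
#define LV_MEM_CUSTOM 0

/* 64 kB from the original report plus the suggested 8 kB step.
 * Adjust in 4 kB steps while watching the free heap in the logs. */
#define LV_MEM_SIZE (72 * 1024U)
```

After each change, the memory prefix in the console log shows whether roughly 25 to 30kB of system memory is still free.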

hb020 commented 1 day ago

Will try the memory settings. (72kB seems to work so far)

Still, if the serving of the 52kB pages.jsonl file poses problems, is it served in one block or is it streamed out? Because uploading the file to the device poses no problem, the downloading does.

fvanroie commented 1 day ago

Files are streamed directly from the filesystem.

hb020 commented 1 day ago

Yes, it seems to stream in blocks of 1360 bytes.

The issue can be closed, but I was initially under the impression that LV_MEM_CUSTOM would make use of the full PSRam, which is not the case at all. This, plus the explanation of what the memory info in the console logs means, might benefit from explicit documentation somewhere.

That documentation maybe exists somewhere, but I haven't found it. I am even willing to write it, provided you tell me where.

fvanroie commented 1 day ago

These are inner workings of LVGL and described in include\lv_conf_v7.h, not even in the LVGL docs... LV_MEM_CUSTOM uses these allocators:

#define LV_MEM_CUSTOM_ALLOC   malloc       /*Wrapper to malloc*/
#define LV_MEM_CUSTOM_FREE    free         /*Wrapper to free*/

If you want to use PSram instead, change the define to ps_malloc if the device actually has PSram. Or use hasp_malloc, which will first check whether PSram is available and use it if so. I haven't used LVGL memory in PSram, so please test and report if it works fine.
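
A sketch of what that change might look like, based only on the two defines quoted above. This is untested, as noted. It assumes the device has PSram, that the header declaring ps_malloc is visible where LVGL pulls in its memory functions, and that memory obtained from ps_malloc can be released with the regular free.

```c
/* Sketch only: route LVGL allocations to PSram when LV_MEM_CUSTOM is 1.
 * ASSUMPTION: ps_malloc is declared where LVGL includes its allocator header. */
#define LV_MEM_CUSTOM_ALLOC   ps_malloc    /*Allocate from PSram*/
#define LV_MEM_CUSTOM_FREE    free         /*Wrapper to free*/
```

Substituting hasp_malloc for ps_malloc would follow the same pattern, with the fallback to regular memory when no PSram is present.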

The openHASP documentation repo is here: https://github.com/HASwitchPlate/openHASP-docs or click the Edit icon at the top of each page to edit it.