Open hollingerc opened 1 year ago
It's probably a bug in your LVGL code. From my most recent experience, and I'm not proud of it, it may be memory corruption caused by passing the address of a local variable out of a function and then using it later through a pointer. In my case I missed that bug in someone else's code, but it was lame on my side.
Please check your code.
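For readers unfamiliar with this class of bug, here is a minimal sketch in plain C. It is not taken from anyone's code in this thread: `job_params_t`, `make_params_broken()` and `make_params()` are hypothetical names used only for illustration. The classic embedded form of the mistake is handing the address of a local variable to something that runs later (for example as a task parameter), after the function that owned the variable has already returned.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical settings struct handed to code that runs later. */
typedef struct {
    int  channel;
    char ssid[33];
} job_params_t;

/* BUG (disabled on purpose): returns the address of a stack variable.
 * The storage is reclaimed when the function returns, so any later use
 * through the pointer reads or writes whatever now lives at that address. */
#if 0
job_params_t *make_params_broken(int channel, const char *ssid)
{
    job_params_t params;
    params.channel = channel;
    strncpy(params.ssid, ssid, sizeof params.ssid - 1);
    return &params;   /* dangling as soon as we return */
}
#endif

/* Safer: give the data a lifetime that outlasts the caller by placing it
 * on the heap. The consumer owns the block and must free() it. */
job_params_t *make_params(int channel, const char *ssid)
{
    assert(ssid != NULL);

    job_params_t *params = calloc(1, sizeof *params);
    if (params == NULL) {
        return NULL;
    }
    params->channel = channel;
    /* calloc() zeroed the struct, so copying at most size-1 bytes
     * leaves the string NUL-terminated. */
    strncpy(params->ssid, ssid, sizeof params->ssid - 1);
    return params;
}
```

A `static` variable or a member of a long-lived object would work equally well; the point is only that the storage must still exist when the pointer is dereferenced.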
Thanks for the response.
Please elaborate - How was that variable passed? From what function? How was the variable used by the pointer?
I've been through my LVGL code and can't see a problem with it (not that there isn't a problem, I just can't see one if there is one). This is how the LVGL task is registered to the RTOS by xTaskCreatePinnedToCore():
    static void lvgl_task(void *pvParam)
    {
        (void) pvParam;

        while (true) {
            if (pdTRUE == xSemaphoreTake(gui_semaphore, portMAX_DELAY)) {
                lv_task_handler();
                xSemaphoreGive(gui_semaphore);
            }
            vTaskDelay(pdMS_TO_TICKS(10));
        }

        vTaskDelete(NULL);
    }
gui_semaphore is declared as a static file scope variable and initialized before the task is created.
If I comment out the line with lv_task_handler(), re-compile, flash and run, the WiFi authenticates with no problems, no crashing. If I comment out the code that creates the screens with buttons, text, etc., (so LVGL basically has nothing to do) and leave lv_task_handler() intact, the program crashes as before. There appears to be a conflict between the LVGL task and the WiFi task.
Still hard to say. What is this task's stack size? Maybe try to increase it.
I did try that. Currently the stack is at 6 * 1024. When I raised it to 8 * 1024, the program still crashed. I could try it with an even bigger stack.
Another thing I tried is I setup and ran a heap trace dump. I had to be careful where I put the final statements so that they were printed to the monitor before the program crashed. The WiFi example calls esp_wifi_connect() in an event handler, I put the dump right after the call. This is the result:
    172689 bytes 'leaked' in trace (171 allocations)
    total allocations 185 total frees 14

    I (3140) btd_wifi: total_free_bytes (internal): 92091
    I (3140) btd_wifi: total_free_bytes (SPIRAM): 7956076
    I (3150) btd_wifi: total_free_bytes (DMA): 84295
Of course there are a lot of leaked bytes since the program halted because of the exception before any more bytes could be freed.
Seems to me there is plenty of RAM available for the program to run. I don't think the exception is caused by running out of RAM. That being said, it's not clear what happened between sending the log to monitor and the exception. Could there have been more leakage that would cause the program to run out of memory?
These memory-related problems are difficult to troubleshoot.
The next step I would take is to run the heap trace with only WiFi, and then with only LVGL.
Further investigation hasn't convinced me that there is a memory leak, although I could be wrong as I don't have time to pursue this issue more thoroughly. I needed to get my firmware working, and I finally succeeded. Here's what I did.
I simplified my testing firmware by creating a new project consisting only of these two IDF examples: /esp-idf/examples/peripherals/lcd/rgb_panel, and /esp-idf/examples/wifi/getting_started/station
The LCD example uses a chart example from LVGL.
I modified these slightly so that they would work as one project: I took app_main() out of the WiFi code and added a call to start the WiFi process in app_main() of the LCD code. I can provide the source files if needed.
I ran my tests again and got the same StoreProhibited exception. Further experimenting revealed that if I started the LVGL task after starting the WiFi process, the exception would not occur and the program ran correctly. This was not a solution for me though, as I need the user, through the GUI, to be able to start the WiFi process at any time. Thus, LVGL needs to be running first.
I spent some time studying the IDF LCD documentation and saw that there were a couple of optional ways of using RAM. I tried implementing two frame buffers, then tried implementing a bounce buffer as described in the documentation, but the same exception still occurred. I settled on the LCD using only one frame buffer in PSRAM (the buffer is way too big to put into internal RAM) and no bounce buffer.
LVGL needs one or two small (less than display sized) draw buffers. In the IDF LCD example, these were allocated in PSRAM using heap_caps_malloc(), so I tried allocating them in internal RAM (also using heap_caps_malloc()). The same exception occurred.
I noticed in the LVGL documentation, the examples given simply declared these buffers as arrays of type lv_color_t. I tried this and the StoreProhibited exception no longer occurred and the program ran with LVGL running before the WiFi process started. I ported this change over to my code and it ran with no issues and was stable. The user can start WiFi by pressing buttons on the LCD. My code is now working as I expect it to.
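As a rough illustration of the change described above, the sketch below declares the draw buffers as static arrays instead of requesting them from the heap. Everything in it is an assumption rather than the poster's code: `pixel_t` stands in for `lv_color_t` at 16-bit colour depth so the sketch compiles on its own, and the resolution, the line count, the 64 KiB budget and the `get_draw_buffers()` helper are hypothetical. In an LVGL 8 project the arrays would be of type `lv_color_t` and would be passed to `lv_disp_draw_buf_init()` together with the pixel count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for lv_color_t at 16-bit colour depth (assumption). */
typedef uint16_t pixel_t;

#define LCD_H_RES        480   /* hypothetical panel width in pixels */
#define DRAW_BUF_LINES    20   /* hypothetical: buffer holds 20 lines */
#define DRAW_BUF_PIXELS  (LCD_H_RES * DRAW_BUF_LINES)

/* Static storage: sized at link time, never touched by malloc()/free(). */
static pixel_t draw_buf1[DRAW_BUF_PIXELS];
static pixel_t draw_buf2[DRAW_BUF_PIXELS];

/* Fail the build, not the device, if the buffers outgrow the budget
 * assumed for this sketch (64 KiB for both together). */
static_assert(sizeof draw_buf1 + sizeof draw_buf2 <= 64 * 1024,
              "draw buffers exceed the RAM budget assumed for this sketch");

typedef struct {
    pixel_t *buf1;
    pixel_t *buf2;
    size_t   pixels;   /* size of EACH buffer, in pixels, not bytes */
} draw_buffers_t;

/* Hands out the two buffers; display init code would forward these
 * to the graphics library. */
draw_buffers_t get_draw_buffers(void)
{
    draw_buffers_t bufs = { draw_buf1, draw_buf2, DRAW_BUF_PIXELS };
    return bufs;
}
```

Static arrays take the buffers out of the heap entirely, which may be why the symptom disappeared, but that alone does not show where the corruption came from.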
My firmware is working now and I do not have the time to pursue this issue any further. I would say there is a bug in LVGL, and/or the LCD code, and/or the WiFi code. There seems to be an interaction between all three through the heap memory.
I'd like to keep the issue open to see if anyone can look into the problem and perhaps find a solution.
Hi @hollingerc, are you still facing this issue?
I'm seeing an identical issue here too, with LVGL and WiFi. @hollingerc, how did you alter the buffer allocations so WiFi worked?
Mine is essentially:
lv_disp_buf1 = (lv_color_t *)heap_caps_malloc(LVGL_LCD_BUF_SIZE * sizeof(lv_color_t), MALLOC_CAP_DMA | MALLOC_CAP_INTERNAL);
@lduncan @hollingerc do let us know if you still think it's a WiFi code issue and you are able to reproduce it.
@kapilkedawat I traced my issue to user error 🤦♂️ I was using an ESP32-S3-WROOM-1U-N16R8 module, which has Octal PSRAM, which means pins IO35, IO36, and IO37 are not available for other uses.
I was attempting to use these pins for other purposes. Disabling the PSRAM completely solved the issue.
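A mistake like this can be caught early with a small check over the board's pin table. The sketch below is only illustrative: the function names and the example pin lists are hypothetical, and the reserved set is simply the three pins named in the comment above for modules with Octal PSRAM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* GPIOs taken by Octal PSRAM on the module discussed above. */
static const int octal_psram_gpios[] = { 35, 36, 37 };

/* True if the GPIO is one of the pins occupied by Octal PSRAM. */
bool gpio_reserved_for_octal_psram(int gpio)
{
    size_t n = sizeof octal_psram_gpios / sizeof octal_psram_gpios[0];
    for (size_t i = 0; i < n; i++) {
        if (octal_psram_gpios[i] == gpio) {
            return true;
        }
    }
    return false;
}

/* Returns the first GPIO in the board's pin list that collides with the
 * PSRAM pins, or -1 if the list is clean. */
int first_psram_pin_conflict(const int *pins, size_t count)
{
    assert(pins != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        if (gpio_reserved_for_octal_psram(pins[i])) {
            return pins[i];
        }
    }
    return -1;
}
```

Calling `first_psram_pin_conflict()` once at start-up with every GPIO the application assigns, and refusing to continue on any result other than -1, turns a hard-to-trace memory fault into an immediate, readable error.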
Answers checklist.
IDF version.
v5.1-dev-1908-g439a709c42
Operating System used.
Linux
How did you build your project?
Command line with idf.py
If you are using Windows, please specify command line type.
None
Development Kit.
ESP32-S3-WROOM-1 on an ESP-DevKit-C
Power Supply used.
USB
What is the expected behavior?
I would expect the ESP32 to successfully log into my WiFi AP without crashing. My code previously ran successfully on a WT32-SC01 board from Waveshare. This board uses an ESP32-WROVER-B, and has an SPI LCD with capacitive touch screen (I2C touch controller).
My code is running LVGL in a task, running a 24-bit RGB LCD with an 8-bit data bus and capacitive touch screen (I2C touch controller), and running the sample WiFi code from esp-idf/examples/wifi/getting_started/station.
What is the actual behavior?
Instead of logging into my WiFi AP, a StoreProhibited exception is caused when the ESP32-S3 is authenticating the WiFi credentials. As can be seen from the debug log, the WiFi process was started, but only got so far. This is repeatable, happens every time I run the code.
My code runs an example from esp-idf/examples/wifi/getting_started/station and worked without crashing on another board that used an ESP32-WROVER-B.
Steps to reproduce.
    void app_main(void)
    {
        BaseType_t error = pdPASS;
        esp_err_t nvs_err = ESP_OK;

        ESP_LOGI(TAG, "app_main()");

        nvs_err = nvs_flash_init();
        if (nvs_err == ESP_ERR_NVS_NO_FREE_PAGES || nvs_err == ESP_ERR_NVS_NEW_VERSION_FOUND) {
            ESP_ERROR_CHECK(nvs_flash_erase());
            nvs_err = nvs_flash_init();
        }
        ESP_ERROR_CHECK(nvs_err);

        /* The LCD and touch screen are initialized here. LVGL is initialized
           and a task is created running on Core 1. */
        (void)btd_lvgl_port_init();

        /* Turn on the LCD backlight. */
        (void)btd_lcd_backlight(BTD_LCD_BACKLIGHT_ON);

        /* Create three LVGL screens with buttons and text on each. Load one
           of the screens. */
        (void)btd_gui_main_start();

        /* This starts the WiFi process. The code here is just a copy of
           esp-idf/examples/wifi/getting_started/station. */
        btd_wifi_start();
    }

I can provide software listings for the functions called in app_main if required.
Debug Logs.
More Information.
With a JTAG debugger running in Eclipse, I can trace through the last three functions before the exception occurs and follow the arguments passed from one function to the next. It appears that in the function tlsf_malloc(), before or during the call to block_locate_free(), a struct pointed to by an argument gets trashed. The struct contains pointers that get changed to point to invalid memory. They start out pointing to valid memory. When one of these pointers is used in remove_free_block() to access memory, the exception is raised. If I disable the LVGL task, this problem doesn't occur, WiFi authentication completes successfully. It would seem there is an interaction between the LVGL and WiFi tasks.
I've done some searching on the internet and have found posts of others with the same exception in the same functions, although doing something other than WiFi authentication (and no mention of LVGL). These were older posts and referred to older versions of the IDF (4.3 if I remember correctly), but the conclusion I saw was that the problem was fixed in newer versions of the IDF. I'm using version 5.1.
It could be that the problem lies with these functions. The comments in the tlsf.c file indicate that these functions are not thread safe and that the developer is expected to provide the protection. The failure seems to be in the ESP32 WiFi code (or LVGL?), neither of which I have written.
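To make concrete what "the developer has to provide the protection" means, here is a minimal host-side sketch of a non-thread-safe allocator wrapped in a lock. It is not TLSF and not ESP-IDF code: the bump allocator, the pool size and all names are hypothetical, and POSIX mutexes are used only so the sketch can be built on a PC. As far as I know, ESP-IDF's heap layer applies its own locking around the TLSF calls, so application code would not normally need to add this itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE 1024u   /* hypothetical pool size in bytes */

static _Alignas(max_align_t) uint8_t pool[POOL_SIZE];
static size_t pool_used;
static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;

/* NOT thread safe: two callers racing here could both read the same
 * pool_used value and be handed overlapping memory. */
static void *pool_alloc_unlocked(size_t size)
{
    assert(pool_used <= POOL_SIZE);

    if (size > POOL_SIZE) {
        return NULL;
    }
    size = (size + 7u) & ~(size_t)7u;       /* round up to 8 bytes */
    if (size > POOL_SIZE - pool_used) {
        return NULL;                        /* pool exhausted */
    }
    void *block = &pool[pool_used];
    pool_used += size;
    return block;
}

/* Thread-safe entry point: the bookkeeping is only ever touched while
 * the lock is held. */
void *pool_alloc(size_t size)
{
    pthread_mutex_lock(&pool_lock);
    void *block = pool_alloc_unlocked(size);
    pthread_mutex_unlock(&pool_lock);
    return block;
}

/* Reads the usage counter under the same lock. */
size_t pool_bytes_used(void)
{
    pthread_mutex_lock(&pool_lock);
    size_t used = pool_used;
    pthread_mutex_unlock(&pool_lock);
    return used;
}
```

A lock of this kind only prevents concurrent callers from corrupting the allocator's bookkeeping. It does nothing against a stray write from elsewhere in the program landing on that bookkeeping, which would produce the same trashed-pointer symptom.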
I'm happy to provide more of my source code if requested.