espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0

Improving RGB LCD performance (IDFGH-13663) #14541

Open kdschlosser opened 1 month ago

kdschlosser commented 1 month ago

Is your feature request related to a problem?

I am not sure exactly how to go about implementing something like this, but I am sure that someone will be able to provide some ideas and suggestions. The current design of the RGB LCD panel driver completely negates the purpose of using DMA memory and double buffering.

This code here is where the problem is.

https://github.com/espressif/esp-idf/blob/cc3203dc4f087ab41b434afff1ed7520c6d90993/examples/peripherals/lcd/rgb_panel/main/rgb_lcd_example_main.c#L97

The entire purpose of using DMA memory for the frame buffers is to not block the CPU while data is transferring. That code does exactly that: it causes a stall in the program until the buffer has finished transferring.

Describe the solution you'd like.

If there were some kind of public indication that the frame buffer that was just written has been sent for the first time since the buffers were swapped, then we could use that marker to make the call to LVGL to let it know that the frame buffer has finished its transfer and the buffers are able to be switched out. This would remove the need to stall the program to make sure the buffer has finished transmitting. The indication could be passed as an argument to the vsync callback, for the purpose of letting a GUI framework know that the buffers are able to be swapped out.

With LVGL, if the call that lets LVGL know the buffer has been sent is made in the vsync callback every single time the callback is called, the synchronization will be off. I am not able to access the index number of the active frame buffer because the structure that holds that index is declared and defined in a C source file and not in a header file. From the user's end, it takes really complicated code to reach the region of memory that stores the current index number in order to identify which buffer the vsync callback has been called for, and I am not even 100% sure that the number could be used in that manner.
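The once-per-swap notification described above can be modelled independently of the driver. The following is a minimal sketch, not ESP-IDF API: `swap_tracker_t` and its functions are hypothetical names introduced for illustration. It assumes the flush path records a swap request and the vsync callback consumes it, so that only the first vsync after a swap is forwarded to the GUI. Whether that first vsync is the right moment depends on when the driver actually switches buffers, which is an assumption here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of ESP-IDF): remembers whether a buffer
 * swap has been requested and not yet acknowledged by a vsync. */
typedef struct {
    atomic_bool swap_pending;
} swap_tracker_t;

static inline void swap_tracker_init(swap_tracker_t *t)
{
    assert(t != NULL);
    atomic_init(&t->swap_pending, false);
}

/* Call from the flush path right after the new frame buffer has been
 * handed to the driver. */
static inline void swap_tracker_request(swap_tracker_t *t)
{
    assert(t != NULL);
    atomic_store(&t->swap_pending, true);
}

/* Call from the vsync callback. Returns true only for the first vsync
 * after a swap request; repeated vsyncs for the same buffer return false. */
static inline bool swap_tracker_on_vsync(swap_tracker_t *t)
{
    assert(t != NULL);
    /* Atomically read the old value and clear the flag. */
    return atomic_exchange(&t->swap_pending, false);
}
```

In a real vsync callback, a `true` return would be the single point at which the application tells LVGL that the flush has completed; every other vsync would be ignored.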

Describe alternatives you've considered.

No response

Additional context.

No response

suda-morris commented 1 month ago

Hi @kdschlosser, thanks for sharing your idea.

On the ESP32-S3, DMA is used to transfer the internal frame buffer to the LCD controller. There is no additional DMA to copy the LVGL draw buffer(s) into the internal frame buffer (BTW, even though we have free GDMA channels, GDMA is not 2D aware, so it is still not usable).

On the ESP32-P4, there is a 2D-DMA peripheral that we can use to copy the draw buffers into the frame buffer asynchronously, so that we can utilize the LVGL double-buffer mechanism, i.e. return from the flush_cb function early and prepare the content for the second draw buffer while the first draw buffer is still being copied.

kdschlosser commented 1 month ago

I am going to need some clarification on how the RGB driver is used. I have been working on an LVGL binding to MicroPython that creates an API that is easier to use and has more flexibility than the current binding offers. Everything is working, except that I am running into a synchronization issue with the RGB driver. This issue stems from my refusal to put in place any code that blocks until the data transmit has finished.

What I am doing is using the frame buffers that the RGB driver creates and passing them to LVGL as its frame buffers, so no copying of data is needed in this design. But I need a way to tell LVGL when a frame buffer has been transferred for the first time since being swapped. The vsync callback gets made each and every time the frame buffer gets written, and the buffer can be written more than once; it is an endless loop until the buffers get swapped. I can call the function that lets LVGL know the buffer data has been written only once for each time the buffers are swapped. If it is called more than once there is an overlap, and LVGL can end up writing data to the buffer that is transmitting. I have no way of identifying the buffer that has just finished transmitting from inside the callback.

I have an idea in an attempt to kick up performance. Instead of LVGL handling the copying of buffer data, which it is not going to do in the best way when dealing with DMA memory and the ESP32, what I want to do is have the 2 full frame buffers from the RGB driver plus a smaller partial buffer, which is what LVGL writes data to. When the flush function gets called, the data from the partial buffer gets rotated and copied to the fb that is not transmitting. If one full transmit of the currently transmitting buffer has occurred, the fbs are swapped. If not, the flush function exits and LVGL is allowed to fill the partial buffer again. When the fbs get swapped, a task running on the second core is woken up and copies the data from the transmitting buffer to the idle buffer. A mutex or semaphore locks the idle buffer against being written to in the flush function while that copy runs. This would be the only place where a program stall can occur, and it should not be a long wait, if the wait occurs at all.
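The rotate-and-copy step of this idea can be sketched in plain C. This is only an illustration under stated assumptions: the pixel format is taken to be RGB565, the rotation is taken to be 90 degrees clockwise (the post does not name an angle), and `rotate90_copy_rgb565` is a hypothetical name. The frame buffer swap, the second-core copy task, and the locking are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy a partial buffer into the idle frame buffer
 * while rotating it 90 degrees clockwise.
 *
 * src        : partial buffer, w * h pixels, row-major, RGB565 (assumed)
 * x, y, w, h : area covered by src, in logical (unrotated) coordinates
 * fb         : idle frame buffer in physical panel orientation
 * fb_w, fb_h : physical width and height of the panel in pixels
 *
 * The logical display is fb_h wide and fb_w high, so a logical pixel
 * (lx, ly) lands at physical (fb_w - 1 - ly, lx). */
static inline void rotate90_copy_rgb565(const uint16_t *src,
                                        int x, int y, int w, int h,
                                        uint16_t *fb, int fb_w, int fb_h)
{
    assert(src != NULL && fb != NULL);
    assert(x >= 0 && y >= 0 && w > 0 && h > 0);
    assert(x + w <= fb_h && y + h <= fb_w);

    for (int row = 0; row < h; row++) {
        int px = fb_w - 1 - (y + row);      /* physical column */
        for (int col = 0; col < w; col++) {
            int py = x + col;               /* physical row */
            fb[(size_t)py * fb_w + px] = src[(size_t)row * w + col];
        }
    }
}
```

Writing walks down a physical column, so on real hardware the access pattern into the frame buffer is strided; a production version would likely process small tiles to be kinder to the cache.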

Both of those should work, and work well, if and only if I have a way of knowing which frame buffer has been transmitted from inside the vsync callback. To get that information I would need to be able to access the structure that holds the index and also the frame buffers. The problem is that it is defined in a C source file and not in a header file, so there is no access to it unless I duplicate the structure in my code and use __containerof to access the data through the copied structure. That is very hackish to do.

The problem with the current way it is being done is that the program sits there and does nothing until the vsync callback gets called. That is dead time. The purpose of using DMA memory is to free the CPU to go and do other work while the data is transmitting, and that block completely negates the reason for using DMA. When writing data to, say, an 800 x 480 x 16 display, the amount of time spent sitting there waiting is measured in tens of milliseconds. In a perfect world, with a 16-lane connection running at 12 MHz, the stall would be 32 milliseconds. That is a long time to wait, sitting there doing zero work.
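The 32 millisecond figure follows from 800 x 480 pixels of 16 bits each, clocked out 16 bits at a time at 12 MHz: 384,000 clocks / 12,000,000 Hz = 0.032 s. A small sketch of the same arithmetic, with `frame_time_us` as a hypothetical helper name and with blanking intervals ignored (so a real panel takes somewhat longer):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: microseconds needed to clock out the active pixels
 * of one frame. Porch and sync (blanking) time is not included. */
static inline uint64_t frame_time_us(uint32_t width, uint32_t height,
                                     uint32_t bits_per_pixel,
                                     uint32_t data_lanes,
                                     uint32_t pclk_hz)
{
    assert(data_lanes > 0 && pclk_hz > 0);

    uint64_t total_bits = (uint64_t)width * height * bits_per_pixel;
    /* Each pixel clock moves data_lanes bits; round up to whole clocks. */
    uint64_t clocks = (total_bits + data_lanes - 1) / data_lanes;
    return clocks * 1000000ULL / pclk_hz;
}
```

For the display in the example, `frame_time_us(800, 480, 16, 16, 12000000)` evaluates to 32000 microseconds.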