Write bitmap regions of screen from buffer (LVGL)

garageeks commented 2 years ago

Dear Rudolph, I was able to get the Arduino PlatformIO example working on my test bed, which consists of an ESP32 MCU and a Riverdi EVE3 7" display. I would like to use LVGL (either v7 or v8) with your library. I'm aware of the LVGL ESP32 project; however, it is for ESP-IDF and, despite being a derivative of your work, has significant differences which I tried to overcome but failed.

I understand that LVGL just needs a callback function to draw regions of a bitmap from an SRAM buffer.

The LVGL ESP32 project achieves this through the functions FT81x_flush and TFT_WriteBitmap below (https://github.com/lvgl/lvgl_esp32_drivers/blob/master/lvgl_tft/FT81x.c). The latter uses a function, EVE_memWrite_buffer, for which the closest function in your library is EVE_memWrite_sram_buffer. However it misses the flag that sends DISP_SPI_SIGNAL_FLUSH to the SPI transaction. The low level function to write to SPI bus (disp_spi_transaction) is very different from spi_transfer, I guess due to being ESP-IDF based.

Could you please point me to some clues to get it working? I would be happy to test a prototype function. I guess I can get the screen initialization working.

Thank you, Nick

void TFT_WriteBitmap(uint8_t* Bitmap, uint16_t X, uint16_t Y, uint16_t Width, uint16_t Height)
{
    // calc base address
    uint32_t addr = SCREEN_BITMAP_ADDR + (Y * BYTES_PER_LINE) + (X * BYTES_PER_PIXEL);

    // can we do a fast full width block transfer?
    if(X == 0 && Width == EVE_HSIZE)
    {
        EVE_memWrite_buffer(addr, Bitmap, (Height * BYTES_PER_LINE), true);
    }
    else
    {
        // line by line mode
        uint32_t bpl = Width * BYTES_PER_PIXEL;
        for (uint16_t i = 0; i < Height; i++)
        {
            EVE_memWrite_buffer(addr, Bitmap + (i * bpl), bpl, (i == Height - 1));
            addr += BYTES_PER_LINE;
        }
    }
}

// LittlevGL flush callback
void FT81x_flush(lv_disp_drv_t * drv, const lv_area_t * area, lv_color_t * color_map)
{
    TFT_WriteBitmap((uint8_t*)color_map, area->x1, area->y1, lv_area_get_width(area), lv_area_get_height(area));
}

RudolphRiedel commented 2 years ago

I briefly went over it but I am not exactly sure what to make of this. The concept of mis-using EVE as a frame-buffer is all wrong to begin with and I highly doubt that this can give good results. The result is very likely low framerates and tearing due to no synchronisation and no way to double-buffer the frames. And that particular port of my library mostly broke it and is as a result rather slow, mostly since every single command ended up pushing out a few bytes with its own DMA request.

That out of the way, my EVE_memWrite_sram_buffer() is not using DMA but direct transfers since it is not meant to be used repeatedly but rather during init or to update assets. So what you actually need is more a function like EVE_start_dma_transfer() (ESP32, line 32) in EVE_cpp_target.cpp.

void EVE_start_dma_transfer(void)
{
    spi_transaction_t EVE_spi_transaction = {0};
    digitalWrite(EVE_CS, LOW); /* make EVE listen */
    EVE_spi_transaction.tx_buffer = (uint8_t *) &EVE_dma_buffer[1];
    EVE_spi_transaction.length = (EVE_dma_buffer_index-1) * 4 * 8;
    EVE_spi_transaction.addr = 0x00b02578; // WRITE + REG_CMDB_WRITE;
    spi_device_queue_trans(EVE_spi_device, &EVE_spi_transaction, portMAX_DELAY);
    EVE_dma_busy = 42;
}

And there is this to end the transfer:

static void eve_spi_post_transfer_callback(void)
{
    digitalWrite(EVE_CS, HIGH); /* tell EVE to stop listen */
    EVE_dma_busy = 0;
}

This is the ESP-IDF code I implemented for ESP32-Arduino.

Something like this might work:

void EVE_push_dma_buffer(uint32_t ftAddress, const uint8_t *EVE_dma_buffer, uint32_t length)
{
    spi_transaction_t EVE_spi_transaction = {0};
    digitalWrite(EVE_CS, LOW); /* make EVE listen */
    EVE_spi_transaction.tx_buffer = EVE_dma_buffer;
    EVE_spi_transaction.length = length * 8; /* length is in byte but the transaction.length is in bits */
    EVE_spi_transaction.addr =ftAddress;
    spi_device_queue_trans(EVE_spi_device, &EVE_spi_transaction, portMAX_DELAY);
    EVE_dma_busy = 42;
}

Of course this needs a basic display list first to actually display a memory region as image.

And I just noticed that with the lvgl port the CS pin is handled automatically with no callback. So the DMA transfers are just stuffed into the queue but this requires changes to EVE_init_spi() as well.

The main difference is that with display lists a buffer with a maximum size of 4k needs to get pushed out once every 17ms or more, I usually implement 20ms. And in frame-buffer mode preferably small chunks of memory for display regions that change get stuffed into the DMA buffer to be automatically processed, this could happen at any time and for multiple buffers.

Since I am mixing transfer modes, DMA for the buffer and non-DMA for small reads (to check touch, for example), there needs to be a way to tell if a DMA transfer is ongoing, hence EVE_dma_busy, which should then be set all the time to lock out everything else. So this free-running, stuff-the-DMA-queue mode for a frame buffer would need an extra function to read the touch information with DMA, and I presume there is another lvgl callback for this.

garageeks commented 2 years ago

Hi, thank you for your clear explanation. I agree with you that this approach seems to have poor performance and uses almost none of the capabilities of the FT8xx chipset. But for my application, which uses no touchscreen, it should be enough. This should make things a little bit easier. Also, in my case the display will be the only device on the SPI bus, so perhaps I can leave the CS pin high? I will try to put together a proof of concept with your feedback and let you know the results.

Nick

RudolphRiedel commented 2 years ago

No, CS cannot be left high; this is the reset for the state machine of any SPI device, and in the case of the EVE chips the first three bytes after the CS-to-low transition are always the address to write to. Since the only updates are through pushing the bitmap or portions of it, you could configure automatic handling of the CS pin once the normal init is done and the display list with the image to be displayed has been sent.
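
To illustrate the three address bytes mentioned above, here is a minimal sketch of how the header of an EVE host memory write can be built. The function name eve_write_header is hypothetical and not part of the library. The layout (two leading bits '10' for a write, followed by a 22-bit address) follows the FT81x host memory write format, and it reproduces the 0x00b02578 value from EVE_start_dma_transfer(), assuming REG_CMDB_WRITE is 0x302578 as on FT81x/BT81x.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build the 3-byte header that must follow the
   CS high-to-low transition for an EVE host memory write.
   Byte 0 carries the write flag (binary 10 in the top two bits) plus
   address bits 21..16; bytes 1 and 2 carry the remaining address bits. */
void eve_write_header(uint32_t address, uint8_t header[3])
{
    assert(address < (1UL << 22)); /* EVE addresses are 22 bits wide */

    header[0] = (uint8_t)(0x80u | (address >> 16));
    header[1] = (uint8_t)(address >> 8);
    header[2] = (uint8_t)address;
}
```

Calling eve_write_header(0x302578, header) fills header with 0xB0, 0x25, 0x78, which is the same byte sequence as the addr value used for the command FIFO write. Because every transfer starts with such a header, CS has to be cycled between transfers so that EVE treats the next three bytes as a new address.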

garageeks commented 2 years ago

Hi Rudolph, I was able to cobble together everything and get it compiled, however there are a couple of issues: 1) When running the initialization sequence I got from the LVGL ESP32 repository:

      EVE_start_cmd_burst(); /* start writing to the cmd-fifo as one stream of bytes, only sending the address once */

        EVE_cmd_dl(CMD_DLSTART); /* start the display list */

        EVE_cmd_dl(DL_CLEAR_RGB | BLACK); /* set the default clear color to black */
        EVE_cmd_dl(DL_CLEAR | CLR_COL | CLR_STN | CLR_TAG); /* clear the screen - this and the previous prevent artifacts between lists, Attributes are the color, stencil and tag buffers */

        EVE_cmd_dl(TAG(0));

        // fullscreen bitmap for memory-mapped direct access
        EVE_cmd_dl(TAG(20));
        EVE_cmd_setbitmap(SCREEN_BITMAP_ADDR, EVE_RGB565, EVE_HSIZE, EVE_VSIZE);
        EVE_cmd_dl(DL_BEGIN | EVE_BITMAPS);
        EVE_cmd_dl(VERTEX2F(0, 0));
        EVE_cmd_dl(DL_END);

        EVE_cmd_dl(TAG(0));

        EVE_cmd_dl(DL_DISPLAY); /* instruct the graphics processor to show the list */

        EVE_cmd_dl(CMD_SWAP); /* make this list active */

        EVE_end_cmd_burst(); /* stop writing to the cmd-fifo */

I get this error:

E (112) spi_master: check_trans_valid(1113): trans tx_buffer should be NULL and SPI_TRANS_USE_TXDATA should be cleared to skip MOSI phase.

It looks like this error is shown when an SPI transfer has a length of zero bytes. However, if I remove EVE_start_cmd_burst(); and EVE_end_cmd_burst(); the error goes away.

Then another error appeared when calling EVE_push_dma_buffer():

spi_master: spi_device_queue_trans(620): txdata transfer > host maximum

In the EVE_init_spi function I increased buscfg.max_transfer_sz from 4088 to 32768 and the error is gone.
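
The second error is consistent with the transfer size: an 800 x 6 pixel region in RGB565 (2 bytes per pixel) is 9600 bytes, more than the 4088 byte limit. Raising max_transfer_sz is one way out. Another possibility, sketched below under assumptions, is to split a linear write into pieces that each fit the limit. The type spi_chunk_t and the function spi_chunk_at are hypothetical names for illustration, not part of the library or of ESP-IDF, and each chunk would still need its own SPI transaction with its own destination address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One piece of a larger linear write (hypothetical helper type). */
typedef struct {
    uint32_t dest_addr;  /* where this piece starts in EVE memory */
    size_t   src_offset; /* where this piece starts in the source buffer */
    size_t   length;     /* bytes in this piece, never more than max_chunk */
} spi_chunk_t;

/* Describe chunk number `index` of a write of `total` bytes that starts
   at `dest_addr`. Returns 1 and fills *out if that chunk exists,
   returns 0 if `index` is past the end of the data. */
int spi_chunk_at(uint32_t dest_addr, size_t total, size_t max_chunk,
                 size_t index, spi_chunk_t *out)
{
    assert(max_chunk > 0 && out != NULL);

    size_t offset = index * max_chunk;
    if (offset >= total) {
        return 0; /* no more data to send */
    }

    size_t remaining = total - offset;
    out->dest_addr  = dest_addr + (uint32_t)offset;
    out->src_offset = offset;
    out->length     = (remaining < max_chunk) ? remaining : max_chunk;
    return 1;
}
```

For 9600 bytes and a limit of 4088, this yields three chunks of 4088, 4088 and 1424 bytes. Whether chunking or a larger max_transfer_sz is the better choice is not settled in this thread.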

However the display is turned on, displaying a white/gray background and that's it. I have enabled LVGL log and indeed I can see from the serial port it is drawing objects:

INFO: File: .pio\libdeps\esp32dev\lvgl@7.11.0\src\lv_core\lv_obj.c#461: lv_obj_create: Object create ready
INFO: File: .pio\libdeps\esp32dev\lvgl@7.11.0\src\lv_core\lv_obj.c#461: lv_obj_create: Object create ready
INFO: File: .pio\libdeps\esp32dev\lvgl@7.11.0\src\lv_core\lv_obj.c#461: lv_obj_create: Object create ready
INFO: File: .pio\libdeps\esp32dev\lvgl@7.11.0\src\lv_core\lv_obj.c#461: lv_obj_create: Object create ready
INFO: File: .pio\libdeps\esp32dev\lvgl@7.11.0\src\lv_core\lv_obj.c#461: lv_obj_create: Object create ready
INFO: File: .pio\libdeps\esp32dev\lvgl@7.11.0\src\lv_widgets\lv_label.c#165: lv_label_create: label created
INFO: File: .pio\libdeps\esp32dev\lvgl@7.11.0\src\lv_core\lv_obj.c#461: lv_obj_create: Object create ready
INFO: File: .pio\libdeps\esp32dev\lvgl@7.11.0\src\lv_widgets\lv_cont.c#121: lv_cont_create: container created
INFO: File: .pio\libdeps\esp32dev\lvgl@7.11.0\src\lv_widgets\lv_btn.c#106: lv_btn_create: button created
INFO: File: .pio\libdeps\esp32dev\lvgl@7.11.0\src\lv_core\lv_obj.c#461: lv_obj_create: Object create ready
INFO: File: .pio\libdeps\esp32dev\lvgl@7.11.0\src\lv_widgets\lv_label.c#165: lv_label_create: label created
INFO: File: .pio\libdeps\esp32dev\lvgl@7.11.0\src\lv_core\lv_obj.c#461: lv_obj_create: Object create ready
INFO: File: .pio\libdeps\esp32dev\lvgl@7.11.0\src\lv_widgets\lv_cont.c#121: lv_cont_create: container created
INFO: File: .pio\libdeps\esp32dev\lvgl@7.11.0\src\lv_widgets\lv_btn.c#106: lv_btn_create: button created
INFO: File: .pio\libdeps\esp32dev\lvgl@7.11.0\src\lv_core\lv_obj.c#461: lv_obj_create: Object create ready
INFO: File: .pio\libdeps\esp32dev\lvgl@7.11.0\src\lv_widgets\lv_label.c#165: lv_label_create: label created

This is the callback function and the altered TFT_WriteBitmap

// write bitmap directly, line-by-line
void TFT_WriteBitmap(uint8_t* Bitmap, uint16_t X, uint16_t Y, uint16_t Width, uint16_t Height)
{
    Serial.print("Width:");Serial.print(Width);Serial.print(" Height:");Serial.println(Height);
    #define SCREEN_BITMAP_ADDR  0x00000000
    // calc base address
    uint32_t addr = SCREEN_BITMAP_ADDR + (Y * BYTES_PER_LINE) + (X * BYTES_PER_PIXEL);

    // can we do a fast full width block transfer?
    if(X == 0 && Width == EVE_HSIZE)
    {
        EVE_push_dma_buffer(addr, *Bitmap, (Height * BYTES_PER_LINE));
    }
    else
    {
        // line by line mode
        uint32_t bpl = Width * BYTES_PER_PIXEL;
        for (uint16_t i = 0; i < Height; i++)
        {
            EVE_push_dma_buffer(addr, *Bitmap + (i * bpl), bpl);
            addr += BYTES_PER_LINE;
        }
    }
}

// LittlevGL flush callback
void my_disp_flush(lv_disp_drv_t * drv, const lv_area_t * area, lv_color_t * color_map)
{
    TFT_WriteBitmap((uint8_t*)color_map, area->x1, area->y1, lv_area_get_width(area), lv_area_get_height(area));
}

I can see TFT_WriteBitmap is called once to draw 800 pixel width and 6 pixels height.

Do you have any idea what is going wrong?

garageeks commented 2 years ago

EVE_LVGL_Test_ESP32_PlatformIO.zip

Here you have the whole package built from your latest master source and Arduino example, which worked before modifications.

davidjade commented 2 years ago

I just wanted to comment briefly here as the person who did the ESP32 SPI DMA port for LVGL (and much of the DMA rework in LVGL for the ESP32). At one point this was working quite well (in the FTxx bitmap mode it uses to work with LVGL). I was getting 12-14 FPS for full screen refreshes using DMA Quad SPI. But I have been away from that project for quite some time, and they historically have a habit of breaking things since I don't think they test on all supported boards, etc. They did a big reorg a while back that I was not involved with. Unfortunately, I don't think I will have the time to revisit it anytime soon, but I just wanted to point out that yes, it did work and was quite stable and running at the full speed of the DMA bus (and in my setup, actually a little bit faster than the FTxx specs).

RudolphRiedel commented 2 years ago

Yes, it really has been a while, this was well before I added DMA support for ESP32 and I did it differently since I needed to keep everything working for all the other platforms.

Your code builds for me as well, I just have no setup I could test it with right now.

The function initForLvgl() is for the V4 version of my library, it can not work like this with V5. So removing EVE_start_cmd_burst() / EVE_end_cmd_burst() is correct, this is for DMA anyways. The "while (EVE_busy());" from line 332 needs to be active though. This changes the commands to individual SPI transfers with their own chip-select cycle.

When you are not using touch you can remove the EVE_cmd_dl(TAG(... lines.

My EVE_init_spi() already sets up two SPI configurations, one for DMA - EVE_spi_device. And one for non-DMA transfers - EVE_spi_device_simple. I suggest changing the one for DMA transfers to automatically set / clear CS. Then you can clear devcfg.post_cb and remove the digitalWrite(EVE_CS, LOW); from EVE_push_dma_buffer(). Oh, yes, devcfg.queue_size is only set up for a single-entry queue.

The lvgl config is here: https://github.com/lvgl/lvgl_esp32_drivers/blob/master/lvgl_tft/disp_spi.c disp_spi_add_device_with_speed()

Well, maybe use this one for the DMA transfers?

I can't figure out on the spot what the purpose of all the code there is.

davidjade commented 2 years ago

I can't look at code right now so I apologize if this isn't relevant, but I will just add that the last bit of work I did on this was to use more in-flight DMA transfers via a queue, and that gave a major increase in performance for ESP32 LVGL. But this of course required really tightly tuned ESP32-specific code using the IDF method of queuing DMA transfers, which also meant it had to be in control of the CS line since the transfers are all offloaded in the background. You can see a before/after demo of DMA queuing here using the LVGL demo: https://www.youtube.com/watch?v=HxO02oeiakw

Another thing I'll add is that while these ESP32 LVGL bitmap transfers are fairly fast, the way LVGL works by sending blocks or a window of bitmap updates is not optimal, since the FTxx bitmap transfer doesn't have a way to just update a window of the full-screen bitmap. This meant I had to break up a block of updates into separate lines for many single DMA transfers, and this is not optimal at all. It's the one missing piece in the FTxx low-level API that would have made this potentially much faster (if you can accept the tradeoffs of using the FTxx this way to use LVGL).
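
The line-by-line splitting described above can be sketched as a small planning function that turns an LVGL update rectangle into a list of linear writes. This is only an illustration of the address arithmetic used in TFT_WriteBitmap. The names region_transfer_t and plan_region_transfers are hypothetical, and the screen width and bytes per pixel are passed as parameters here instead of using the EVE_HSIZE and BYTES_PER_PIXEL macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One linear write into the full-screen bitmap (hypothetical type). */
typedef struct {
    uint32_t dest_addr;  /* address inside the bitmap in EVE memory */
    size_t   src_offset; /* byte offset into the LVGL pixel buffer */
    size_t   length;     /* number of bytes to send */
} region_transfer_t;

/* Plan the writes for a rectangle at (x, y) of size width x height.
   A full-width region is contiguous in EVE memory and needs one write;
   anything narrower needs one write per line.
   Returns the number of entries written to `out`, or 0 if the region is
   empty or `out_capacity` is too small. */
size_t plan_region_transfers(uint32_t bitmap_base, uint16_t screen_width,
                             uint8_t bytes_per_pixel,
                             uint16_t x, uint16_t y,
                             uint16_t width, uint16_t height,
                             region_transfer_t *out, size_t out_capacity)
{
    assert(out != NULL);
    assert((uint32_t)x + width <= screen_width);

    if (width == 0 || height == 0) {
        return 0;
    }

    uint32_t bytes_per_line = (uint32_t)screen_width * bytes_per_pixel;
    uint32_t addr = bitmap_base + (uint32_t)y * bytes_per_line
                    + (uint32_t)x * bytes_per_pixel;

    if (x == 0 && width == screen_width) {
        if (out_capacity < 1) {
            return 0;
        }
        out[0].dest_addr  = addr;
        out[0].src_offset = 0;
        out[0].length     = (size_t)height * bytes_per_line;
        return 1;
    }

    if (out_capacity < height) {
        return 0;
    }
    size_t region_line_bytes = (size_t)width * bytes_per_pixel;
    for (uint16_t i = 0; i < height; i++) {
        out[i].dest_addr  = addr;
        out[i].src_offset = (size_t)i * region_line_bytes;
        out[i].length     = region_line_bytes;
        addr += bytes_per_line; /* next line of the full-screen bitmap */
    }
    return height;
}
```

On an 800 pixel wide RGB565 screen, a 100 x 3 region needs three writes of 200 bytes each, while a full-width 800 x 6 region collapses into a single 9600 byte write. This is why narrow update windows cost many more SPI transactions than their pixel count suggests.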

garageeks commented 2 years ago

@davidjade thank you for your insights. Wish I knew ESP-IDF... I'm too far entrenched in the Arduino environment :|

@RudolphRiedel I tried to retrieve all the parameters scattered around lvgl_esp32 and put together the SPI configuration they used; however, it doesn't appear to work properly, the screen just blinks at init and stays black. With the regular software-driven CS the screen stays grey. I'm not worried about performance right now, first I'd like to see the bloody thing drawing something on screen :)

There is something not really clear to me in this function you suggested:

            void EVE_push_dma_buffer(uint32_t ftAddress, const uint8_t EVE_dma_buffer, uint32_t length)
            {
                spi_transaction_t EVE_spi_transaction = {0};
                digitalWrite(EVE_CS, LOW); // make EVE listen
                EVE_spi_transaction.tx_buffer = &EVE_dma_buffer;
                EVE_spi_transaction.length = length * 8; // length is in byte but the transaction.length is in bits
                EVE_spi_transaction.addr =ftAddress;
                spi_device_queue_trans(EVE_spi_device, &EVE_spi_transaction, portMAX_DELAY);
                EVE_dma_busy = 42;
            }

EVE_dma_buffer looks like a global array defined in EVE_target.h: extern uint32_t EVE_dma_buffer[1025]; So by declaring EVE_dma_buffer again as a parameter in the function definition, it seems we are just passing the EVE_dma_buffer array pointer to EVE_spi_transaction.tx_buffer. Therefore I don't understand how the LVGL-generated bitmap gets from my_disp_flush -> TFT_WriteBitmap (in src.ino) to the EVE_dma_buffer. Given my limited knowledge, it seems to me the function is just sending an empty buffer. Am I wrong?

RudolphRiedel commented 2 years ago

Do you have a logic-analyzer so that you can check what actually is coming out on the SPI?

Edit: naming that EVE_dma_buffer was not correct, that was me thinking one step further and removing the original DMA functionality since it no longer is needed. So rename that to "buffer" or whatever to make sure it is pointing to the *Bitmap from lvgl.

davidjade commented 2 years ago

From memory, LVGL has its own buffers (two) that you set up and tell it about. It writes updates into them and then makes callbacks to let the "driver" know when to flush them to the hardware. LVGL will actively write into the second buffer while the first is being used for DMA flushing. In the LVGL demo code, you should see something like this for the update buffers:

static lv_color_t buf1[DISP_BUF_SIZE];
static lv_color_t buf2[DISP_BUF_SIZE];

The ESP32 LVGL driver also has a small buffer that is used for managing the SPI transfers as well as using a queue to keep many DMA transfers in-flight. The larger LVGL buffers need to sometimes be broken up into many separate DMA transfers since LVGL will use the buffer for a "window" or rectangular update region and the FTxx bitmaps have no direct way to work that way.

The basic hack I implemented for LVGL was to set up a display list with one fullscreen bitmap. Then I figured out I could write directly to the bitmap on the FTxx device using the commands to write directly to the FTxx memory. This bypasses all the FTxx display list commands and allows for direct LVGL memory to FTxx bitmap memory DMA transfers.

One thing I found useful in getting it up and running was to reduce the SPI clock speed to rule out signal corruption as the issue. Also start with dual SPI rather than jumping straight to quad. I also sometimes had issues with the FTxx setup when not delaying or waiting for the FTxx to be ready before starting all these fast direct memory writes. I only had two devices to test on and I think I got somewhat lucky that the Quad SPI eventually ran on a breadboard at full speed without electrical interference.

Hope this helps.

garageeks commented 2 years ago

@RudolphRiedel I changed the function as follows, but still nothing displayed

            void EVE_push_dma_buffer(uint32_t ftAddress, const uint8_t *my_data, uint32_t length)
            {
                Serial.print("Pushing ");Serial.print(length);Serial.println(" bytes");
                spi_transaction_t EVE_spi_transaction = {0};
                digitalWrite(EVE_CS, LOW); // make EVE listen
                EVE_spi_transaction.tx_buffer = my_data;
                EVE_spi_transaction.length = length * 8; // length is in byte but the transaction.length is in bits
                EVE_spi_transaction.addr =ftAddress;
                spi_device_queue_trans(EVE_spi_device, &EVE_spi_transaction, portMAX_DELAY);
                EVE_dma_busy = 42;
            }

Unfortunately I have a simple scope with just two channels which I used to debug I2C and RS485 but I'm afraid we would need 4 channels to debug SPI. However your example works (see picture below) so hardware reliability is verified.

@davidjade interesting hack for FTxxx. Yes, to exclude hardware issues, I started from Rudolph's Arduino+PlatformIO example and it works.

[Image: 20220107_005253]

RudolphRiedel commented 2 years ago

A 2 channel scope is fine as the question right now is if there even is traffic on the SPI for the lvgl buffers. So triggering on CS and reading data is fine to check if there even is something.

And I was not even thinking about QSPI, I never needed it so far and most of the architectures my library supports could not even use QSPI, only a fraction support DMA. EVE is particularly interesting for low-level controllers. But yes, in this case QSPI makes a whole lot more sense since the buffer for 800x480 is 750 KiB.
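
The 750 KiB figure follows from 800 x 480 pixels at 2 bytes per pixel (RGB565, as used in the display list earlier in this thread). A small sketch of that arithmetic is below. The function names are hypothetical, and the 1 MiB size of RAM_G is taken from the FT81x/BT81x datasheets rather than from this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Graphics RAM (RAM_G) size on FT81x/BT81x: 1 MiB.
   This value comes from the datasheet, not from the thread. */
#define EVE_RAM_G_BYTES (1024UL * 1024UL)

/* Size in bytes of one full-screen bitmap. */
uint32_t framebuffer_bytes(uint32_t width, uint32_t height,
                           uint32_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* How many complete full-screen bitmaps fit into RAM_G. */
int framebuffers_that_fit(uint32_t width, uint32_t height,
                          uint32_t bytes_per_pixel)
{
    uint32_t size = framebuffer_bytes(width, height, bytes_per_pixel);
    assert(size > 0);
    return (int)(EVE_RAM_G_BYTES / size);
}
```

For 800 x 480 in RGB565 this gives 768000 bytes, which is exactly 750 KiB, and only one such bitmap fits into 1 MiB. That matches the earlier remark that there is no way to double-buffer the frames in this mode.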

Well, slow running would be an improvement over not running.

RudolphRiedel / FT800-FT813

Write bitmap regions of screen from buffer (LVGL) #35