bitbank2 / AnimatedGIF

An optimized GIF decoder suitable for microcontrollers and PCs
Apache License 2.0

Ways to improve speed on esp32-s3 #78

Closed silverchris closed 9 months ago

silverchris commented 10 months ago

Hey, just wondering if there are any changes or optimizations I could easily use to speed things up on an ESP32-S3 with 2MB of PSRAM? My output is a 480x480 display with a 16-bit RGB interface.

I am currently hitting about 6.8 FPS.

Thanks, I appreciate any pointers you can give :)

tobozo commented 10 months ago

6.8 fps = 1000/6.8 = ~147 ms = 147000 us per frame; one frame = 480 * 480 * 2 = 460800 bytes

transferring 460800 bytes within 147000 us = 460800/147000 = ~3MHz SPI clock speed

note: this formula doesn't take the decoding time into account so the numbers may need to be tweaked to validate your current bus speed; if 3MHz isn't even close to your settings then you should check for other bottlenecks e.g. SD card or other SPI device dragging the bus down, or you may increase the bus speed: some parallel displays can work at 40MHz without a glitch

reading your GIF file on esp32-s3 from the psram or a huge LittleFS partition will always be faster than from the SD card

silverchris commented 10 months ago

I should have specified that I am not using the SPI bus. The display is 16-bit RGB, and the SD card is interfaced via 1-bit, 20 MHz MMC (I would have liked more bits, but I ran out of IO pins!). I should double-check the bandwidth from the SD card, though, as I measured it a while back and it seemed like it would be fine... The data rate to the LCD is fairly decoupled in my setup, as the GIF decoder just writes to a region of memory and the ESP32-S3's DMA controller handles getting that to the display. This does result in some tearing, but that is to be solved later :)

tobozo commented 10 months ago

if your display isn't using the SPI bus, what bus is it using?

[edit] 1 bit transfer with SD card at 20MHz is definitely a bottleneck, LittleFS will be way faster than that

silverchris commented 10 months ago

16bit parallel RGB interface.

I did notice that I could double the SDMMC interface speed, so it's now at 40 MHz instead of 20, which has bumped me to about 7.1 fps. I don't think it's the limiting factor, though I could be wrong.

tobozo commented 10 months ago

it may be the limiting factor when the SD card is low quality, does it still work when you set the SD bus at 80MHz ?

also what's the bus speed for the parallel display? if you're not using a breadboard and low quality wires it may be worth picking values between 16MHz and 20MHz

silverchris commented 10 months ago

I believe the SDMMC speed is limited to 40 MHz on the ESP32-S3 hardware.

The write speed to the display won't affect the speed at which the GIF is decoded, as the draw step is just a memcpy to another block of memory, so it's only limited by the speed at which the ESP32 can read and write PSRAM.

tobozo commented 10 months ago

esp32-s3 can do 80MHz with sdio

still with esp32-s3, some psram modules can do 120MHz.

so for optimal transfer speed, your display bus speed should be close to 1/16th of the psram speed, and a dividend of the max SPI speed (e.g. 8MHz for QSPI Psram or 16MHz for OPI Psram)

bitbank2 commented 10 months ago

A big speed up can come from using the output image as the dictionary. On a PC, this speeds things up quite a bit, but needs lots of RAM. I can try running this on systems with PSRAM to see if there is a speed advantage. This is code I haven't shared yet, but might be worth sharing for special cases like this.

silverchris commented 10 months ago

> esp32-s3 can do 80MHz with sdio
>
> still with esp32-s3, some psram modules can do 120MHz.
>
> so for optimal transfer speed, your display bus speed should be close to 1/16th of the psram speed, and a dividend of the max SPI speed (e.g. 8MHz for QSPI Psram or 16MHz for OPI Psram)

I am running the flash/PSRAM at 120 MHz QIO (using an ESP32-S3FH4R2).

I would have to make changes to the board to change anything about the SDMMC interface, as I don't have it wired to support SPI mode. A board update may happen, though, if I really need to :)

silverchris commented 10 months ago

> A big speed up can come from using the output image as the dictionary. On a PC, this speeds things up quite a bit, but needs lots of RAM. I can try running this on systems with PSRAM to see if there is a speed advantage. This is code I haven't shared yet, but might be worth sharing for special cases like this.

I would be interested in hearing the results! I was looking at your giflib-turbo earlier, but I believe it needs to have all frames in memory to work, which isn't useful for me: I have a fair amount of memory, but not that much!

silverchris commented 9 months ago

Had any more thoughts on this? I did find that if I load the image from internal flash rather than over the SDMMC interface, I save about 60 milliseconds a frame, so I have some tweaking to do there myself, but I am hoping for a bit more :)
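
One way to get a similar saving without moving the file into flash is to read it once into RAM and serve all later reads from memory. The following is a minimal sketch of that idea in plain C++; `MemFile`, `memRead` and `memSeek` are hypothetical names, not part of AnimatedGIF, and the callbacks would still need adapting to the library's read/seek signatures. The library also has an `open()` overload that takes a memory pointer directly, as used in the Arduino sketch later in this thread.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical in-memory "file": the whole GIF is loaded once, then served from RAM.
struct MemFile {
    std::vector<uint8_t> data;  // entire file contents
    int32_t pos = 0;            // current read offset
};

// Copy up to len bytes into out, clamped at the end of the data; returns bytes copied.
int32_t memRead(MemFile &f, uint8_t *out, int32_t len) {
    if (len <= 0) return 0;
    const int32_t remaining = static_cast<int32_t>(f.data.size()) - f.pos;
    const int32_t n = std::min(len, std::max<int32_t>(remaining, 0));
    if (n > 0) std::memcpy(out, f.data.data() + f.pos, static_cast<size_t>(n));
    f.pos += n;
    return n;
}

// Move the read offset, clamped to [0, size]; returns the new offset.
int32_t memSeek(MemFile &f, int32_t position) {
    const int32_t size = static_cast<int32_t>(f.data.size());
    f.pos = std::max<int32_t>(0, std::min(position, size));
    return f.pos;
}
```

Clamping both the read length and the seek position keeps a short or truncated file from reading past the end of the buffer.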

silverchris commented 6 months ago

@bitbank2 just noticed the turbo branch! Will try that out. Thank you!

silverchris commented 6 months ago

On the GIFs I can get to play, it does perform a lot better!

Unfortunately, I run into issues with a lot of GIFs. This GIF, for example, hangs after rendering the first frame @bitbank2 (attachment: ezgif-2-c5da18bad7)

Another bug: DecodeLZWTurbo doesn't pass the pUser data to the draw function.

bitbank2 commented 6 months ago

Can you be more specific about the word "hang"? That doesn't tell me if your program stopped because my library returned an error or if the processor got stuck on an exception. Can you provide some details?

silverchris commented 6 months ago

> Can you be more specific about the word "hang"? That doesn't tell me if your program stopped because my library returned an error or if the processor got stuck on an exception. Can you provide some details?

In this case, no error is returned. The device locks up until the watchdog triggers and resets it.

I will try and put together a minimal example that causes this behavior today

bitbank2 commented 6 months ago

I created an Arduino sketch and am able to recreate the problem. This is a good situation to learn how to use the single-step debugging of the S3. I'll let you know what I find.

bitbank2 commented 6 months ago

I couldn't get the S3 debugging to work on my Mac. Running the same code in Xcode, the decode succeeds.

silverchris commented 6 months ago

I got a bit of a backtrace.

It looks like something is invoking the panic handler, and that is ending up in an endless loop

(gdb) backtrace
#0  panic_handler (frame=0x3fcc47f0, pseudo_excause=true) at /home/silverchris@ad.silverchris.ca/esp/esp-idf/components/esp_system/port/panic_handler.c:145
#1  0x4037a3e3 in panicHandler (frame=0x3fcc47f0) at /home/silverchris@ad.silverchris.ca/esp/esp-idf/components/esp_system/port/panic_handler.c:217
#2  0x4037a0db in xt_highint4 () at /home/silverchris@ad.silverchris.ca/esp/esp-idf/components/esp_system/port/soc/esp32s3/highint_hdl.S:108
#3  0x40040025 in ?? ()
#4  0x42016629 in DecodeLZWTurbo (pImage=0x3fcc7ab4, iOptions=<optimized out>) at /home/silverchris@ad.silverchris.ca/esp32badge_software/components/image/EmbeddedImage/AnimatedGIF/src/gif.inl:1055
#5  0x42017290 in AnimatedGIF::playFrame (this=0x3fcc7ab4, bSync=false, delayMilliseconds=0x3fcc4984, pUser=0x3fcc4988) at /home/silverchris@ad.silverchris.ca/esp32badge_software/components/image/EmbeddedImage/AnimatedGIF/src/AnimatedGIF.cpp:261
#6  0x420160aa in GIF::loop (this=0x3fcc7ab0, outBuf=0x3c141480 "\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\316\034\316\034\316\034\336}\336}\336}\336}\336}\336}\336}\336}\336}\336}\336}\336}\336}\316\034\316\034\336}\316\034\316\034\336}\316\034\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\316\034\276\034\276\034\276\034\275\232\275\232\275\232\265\232\265\232\265\232\265\232\265\232\265\232\265\232\265\232\265\232\276\034\276\034\275\232\265\232\265\232\265\232\265\232\276\034\276\034\275\232\276\034\276\034\275\232\276\034\276\034\276\034\275\232\265\232\275\232\275\232\265\232\265\232\265\232\265\232\265\232\255X\255X\255X\255X\255X\255X\255X\255X\255X\255X"...) at /home/silverchris@ad.silverchris.ca/esp32badge_software/components/image/EmbeddedImage/gif.cpp:118
#7  0x4200a231 in image_loop (in=std::shared_ptr<Image> (use count 1, weak count 0) = {...}, pGIFBuf=0x3c141480 "\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\316\034\316\034\316\034\336}\336}\336}\336}\336}\336}\336}\336}\336}\336}\336}\336}\336}\316\034\316\034\336}\316\034\316\034\336}\316\034\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\306\036\316\034\276\034\276\034\276\034\275\232\275\232\275\232\265\232\265\232\265\232\265\232\265\232\265\232\265\232\265\232\265\232\276\034\276\034\275\232\265\232\265\232\265\232\265\232\276\034\276\034\275\232\276\034\276\034\275\232\276\034\276\034\276\034\275\232\265\232\275\232\275\232\265\232\265\232\265\232\265\232\265\232\255X\255X\255X\255X\255X\255X\255X\255X\255X\255X"..., display=std::shared_ptr<Display> (use count 2, weak count 0) = {...}) at /home/silverchris@ad.silverchris.ca/esp32badge_software/main/display.cpp:133
#8  0x4200c25c in display_task (params=0x3fcc0a64) at /home/silverchris@ad.silverchris.ca/esp32badge_software/main/display.cpp:356
#9  0x4038e94c in vPortTaskWrapper (pxCode=0x4200b698 <display_task(void*)>, pvParameters=0x3fcc0a64) at /home/silverchris@ad.silverchris.ca/esp/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:134
bitbank2 commented 6 months ago

I see you're trying to use the "Turbo" mode. This requires a lot more memory than the normal decode. Can you share the part of your code which uses all of the AnimatedGIF function calls?

silverchris commented 6 months ago

I got a little further. Set CONFIG_ESP_SYSTEM_SINGLE_CORE_MODE in menuconfig and now it actually finishes the panic handler.

Guru Meditation Error: Core  / panic'ed (Cache disabled but cached memory region accessed).
MMU entry fault error occurred while accessing the address 0x3cb75748 (invalid mmu entry)

Core  0 register dump:
PC      : 0x42032223  PS      : 0x00060e34  A0      : 0x82032706  A1      : 0x3fccef90
A2      : 0x3c602cc8  A3      : 0x00032e51  A4      : 0x3c63e86a  A5      : 0x3c64086c
A6      : 0x000000d0  A7      : 0x3fccef90  A8      : 0x3fccefac  A9      : 0x3fccefac
A10     : 0x3c635b19  A11     : 0x3fcc6270  A12     : 0x8039a03c  A13     : 0x3fcc6220
A14     : 0x00000000  A15     : 0x3fcccde0  SAR     : 0x00000001  EXCCAUSE: 0x00000007
EXCVADDR: 0x00000000  LBEG    : 0x40056fc5  LEND    : 0x40056fe7  LCOUNT  : 0x00000000

Backtrace: 0x42032220:0x3fccef90 0x42032703:0x3fccefe0 0x420337e1:0x3fccf0a0 0x42030e05:0x3fccf0e0 0x42012ef9:0x3fccf120 0x42014913:0x3fccf190 0x40399bca:0x3fccf350

ELF file SHA256: 830b50a46a233bd7
silverchris commented 6 months ago

> I see you're trying to use the "Turbo" mode. This requires a lot more memory than the normal decode. Can you share the part of your code which uses all of the AnimatedGIF function calls?

I should have plenty of memory for this

Heap summary for capabilities 0x00000400:
  At 0x3c130000 len 7143424 free 6191428 allocated 949508 min_free 6190816
    largest_free_block 6160372 alloc_blocks 11 free_blocks 3 total_blocks 14
  Totals:
    free 6191428 allocated 949508 min_free 6190816 largest_free_block 6160372

This is my class that wraps it for my api.

#include "gif.h"
#include <string>
#include "bitbank2.h"

#include <cstdio>
#include <sys/stat.h>
#include <cstring>
#include <esp_heap_caps.h>

struct mem_buf {
    uint8_t *buf;
    FILE *fp;
    uint8_t *pos;
    uint8_t *read;
    size_t size;
};

static void *OpenFile(const char *fname, int32_t *pSize) {
    FILE *infile = fopen(fname, "rb");
    if (infile == nullptr) {
        return nullptr; // file could not be opened
    }
    setvbuf(infile, nullptr, _IOFBF, 4096);
    struct stat stats{};

    if (fstat(fileno(infile), &stats) != 0) {
        fclose(infile);
        return nullptr;
    }

    *pSize = stats.st_size;

    heap_caps_print_heap_info(MALLOC_CAP_SPIRAM);
    auto *mem = static_cast<mem_buf *>(malloc(sizeof(mem_buf)));
    if (mem == nullptr) {
        fclose(infile);
        return nullptr;
    }
    if (stats.st_size <= heap_caps_get_largest_free_block(MALLOC_CAP_SPIRAM)) {
        mem->buf = static_cast<uint8_t *>(heap_caps_malloc(stats.st_size, MALLOC_CAP_SPIRAM));
    } else {
        printf("Not enough memory to buffer image\n");
        mem->buf = nullptr; // ReadFile/SeekFile fall back to reading the file directly
    }
    mem->fp = infile;
    mem->pos = mem->buf;
    mem->read = mem->buf;
    mem->size = stats.st_size;
    return mem;
}

static void CloseFile(void *pHandle) {
    auto *mem = (mem_buf *) (pHandle);
    fclose(mem->fp);
    if (mem->buf != nullptr) {
        free(mem->buf);
    }
    free(mem);
}

static int32_t ReadFile(bb2_file_tag *pFile, uint8_t *pBuf, int32_t iLen) {
    if (iLen <= 0) {
        return 0;
    }
    int32_t iBytesRead;
    iBytesRead = iLen;
    auto *mem = (mem_buf *) (pFile->fHandle);
    if (mem->buf == nullptr) {
        iBytesRead = (int32_t) fread(pBuf, 1, iBytesRead, mem->fp);
        pFile->iPos = ftell(mem->fp);
        return iBytesRead;
    }
    if (mem->pos + iLen > mem->read) {
        //We don't have enough in the buffer, read and add it
        size_t bytes = fread(mem->read, 1, (mem->pos + iLen) - mem->read, mem->fp);
        int32_t len = mem->read - mem->pos + bytes;
        mem->read += bytes;
        memcpy(pBuf, mem->pos, len);
        mem->pos += len;
        pFile->iPos = mem->pos - mem->buf;
        return len;
    } else {
        //Already in the buffer, copy it out
        memcpy(pBuf, mem->pos, iLen);
        mem->pos += iLen;
        pFile->iPos = mem->pos - mem->buf;
        return iLen;
    }
}

static int32_t SeekFile(bb2_file_tag *pFile, int32_t iPosition) {
    auto *mem = (mem_buf *) (pFile->fHandle);
    if (mem->buf == nullptr) {
        fseek(mem->fp, iPosition, SEEK_SET);
        pFile->iPos = (int32_t) ftell(mem->fp);
        return pFile->iPos;
    } else {
        mem->pos = mem->buf + iPosition;
        pFile->iPos = iPosition;
        return iPosition;
    }
}

GIF::GIF() = default;

GIF::~GIF() {
    printf("GIF DELETED\n");
    gif.freeFrameBuf(free);
    gif.freeTurboBuf(free);
    gif.close();
}

uint8_t *tmpbuf;

int GIF::loop(uint8_t *outBuf) {
    GIFUser gifuser = {outBuf, width};
    tmpbuf = outBuf;
    int frameDelay;
    if (gif.playFrame(false, &frameDelay,  (void *) &gifuser) == -1) {
        printf("GIF Error: %i\n", gif.getLastError());
        return -1;
    }
//  memcpy(outBuf, gif.getFrameBuf(), gif.getCanvasWidth()*gif.getCanvasHeight()*2);
    return frameDelay;
//    return gif->playFrame(false, nullptr, (void *) &gifuser);
}

std::pair<int, int> GIF::size() {
    return {gif.getCanvasWidth(), gif.getCanvasHeight()};
}

void GIF::GIFDraw(GIFDRAW *pDraw) {
    int y;
    uint16_t *d;
//
    auto *gifuser = static_cast<GIFUser *>(pDraw->pUser);
    auto *buffer = (uint16_t *) gifuser->buffer;
//
//    y = pDraw->y; // current line
    d = &buffer[(pDraw->y * gifuser->width)];
    memcpy(d, pDraw->pPixels, pDraw->iWidth * 2);
//printf("X: %d Y: %d d: %p\n", pDraw->iX, pDraw->y, d);
}

Image *GIF::create() {
    return new GIF();
}

void *GIFAlloc(uint32_t u32Size) {
    return heap_caps_malloc(u32Size, MALLOC_CAP_SPIRAM);
} /* GIFAlloc() */

typedef int32_t (*readfile)(GIFFILE *pFile, uint8_t *pBuf, int32_t iLen);

typedef int32_t (*seekfile)(GIFFILE *pFile, int32_t iPosition);

int GIF::open(const char *path) {
    gif.begin(BIG_ENDIAN_PIXELS);
    if (gif.open(path, OpenFile, CloseFile, (readfile) ReadFile, (seekfile) SeekFile, GIFDraw)) {
        gif.allocFrameBuf(GIFAlloc);
      gif.allocTurboBuf(GIFAlloc);
      gif.setDrawType(GIF_DRAW_COOKED);
        width = gif.getCanvasWidth();
//        int height = gif.getCanvasHeight();
//        pTurboBuffer = (uint8_t *)heap_caps_malloc(TURBO_BUFFER_SIZE + (width*height), MALLOC_CAP_8BIT);
//        if(pTurboBuffer) {
//          gif.setTurboBuf(pTurboBuffer);
//        }
//        else{
//          printf("Allocating turbo buffer failed\n");
//        }
      return 0;
    }
    return -1;
}

std::string GIF::getLastError() {
    switch (gif.getLastError()) {
        case GIF_SUCCESS:
            return "GIF_SUCCESS";
        case GIF_DECODE_ERROR:
            return "GIF_DECODE_ERROR";
        case GIF_TOO_WIDE:
            return "GIF_TOO_WIDE";
        case GIF_INVALID_PARAMETER:
            return "GIF_INVALID_PARAMETER";
        case GIF_UNSUPPORTED_FEATURE:
            return "GIF_UNSUPPORTED_FEATURE";
        case GIF_FILE_NOT_OPEN:
            return "GIF_FILE_NOT_OPEN";
        case GIF_EARLY_EOF:
            return "GIF_EARLY_EOF";
        case GIF_EMPTY_FRAME:
            return "GIF_EMPTY_FRAME";
        case GIF_BAD_FILE:
            return "GIF_BAD_FILE";
        case GIF_ERROR_MEMORY:
            return "GIF_ERROR_MEMORY";
        default:
            return "Unknown";
    }
}
silverchris commented 6 months ago

OK, after some simplifying, the hard lockup/cache error must have been my fault, somewhere in my own code.

After cleaning up my code to this, I no longer crash, but I get a decoding error after 2 frames

#include "gif.h"
#include <string>
#include "bitbank2.h"

#include <cstdio>
#include <sys/stat.h>
#include <cstring>
#include <esp_heap_caps.h>

static void *OpenFile(const char *fname, int32_t *pSize) {
    FILE *infile = fopen(fname, "rb");
    if (infile == nullptr) {
        return nullptr; // file could not be opened
    }
    setvbuf(infile, nullptr, _IOFBF, 4096);
    struct stat stats{};

    if (fstat(fileno(infile), &stats) != 0) {
        fclose(infile);
        return nullptr;
    }

    *pSize = stats.st_size;
    return infile;
}

static void CloseFile(void *pHandle) {
    fclose((FILE *)pHandle);
}

static int32_t ReadFile(bb2_file_tag *pFile, uint8_t *pBuf, int32_t iLen) {
    if (iLen <= 0) {
        return 0;
    }
    int32_t iBytesRead;
    iBytesRead = iLen;
    iBytesRead = (int32_t) fread(pBuf, 1, iBytesRead, (FILE *)pFile->fHandle);
    pFile->iPos = ftell((FILE *)pFile->fHandle);
    return iBytesRead;
}

static int32_t SeekFile(bb2_file_tag *pFile, int32_t iPosition) {
    fseek((FILE *)pFile->fHandle, iPosition, SEEK_SET);
    pFile->iPos = (int32_t) ftell((FILE *)pFile->fHandle);
    return pFile->iPos;
}

GIF::GIF() = default;

GIF::~GIF() {
    printf("GIF DELETED\n");
    gif.freeFrameBuf(free);
    gif.freeTurboBuf(free);
    gif.close();
}

uint8_t *tmpbuf;

int GIF::loop(uint8_t *outBuf) {
//  heap_caps_print_heap_info(MALLOC_CAP_SPIRAM);
  GIFUser gifuser = {outBuf, width};
    tmpbuf = outBuf;
    int frameDelay;
    if (gif.playFrame(false, &frameDelay,  (void *) &gifuser) == -1) {
        printf("GIF Error: %i\n", gif.getLastError());
        return -1;
    }
    return frameDelay;
}

std::pair<int, int> GIF::size() {
    return {gif.getCanvasWidth(), gif.getCanvasHeight()};
}

void GIF::GIFDraw(GIFDRAW *pDraw) {
    uint16_t *d;
    auto *gifuser = static_cast<GIFUser *>(pDraw->pUser);
    auto *buffer = (uint16_t *) gifuser->buffer;
    d = &buffer[(pDraw->y * gifuser->width)];
    memcpy(d, pDraw->pPixels, pDraw->iWidth * 2);
}

Image *GIF::create() {
    return new GIF();
}

void *GIFAlloc(uint32_t u32Size) {
    return heap_caps_malloc(u32Size, MALLOC_CAP_SPIRAM);
} /* GIFAlloc() */

typedef int32_t (*readfile)(GIFFILE *pFile, uint8_t *pBuf, int32_t iLen);

typedef int32_t (*seekfile)(GIFFILE *pFile, int32_t iPosition);

int GIF::open(const char *path) {
    gif.begin(BIG_ENDIAN_PIXELS);
    if (gif.open(path, OpenFile, CloseFile, (readfile) ReadFile, (seekfile) SeekFile, GIFDraw)) {
      gif.allocFrameBuf(GIFAlloc);
      gif.allocTurboBuf(GIFAlloc);
      gif.setDrawType(GIF_DRAW_COOKED);
      width = gif.getCanvasWidth();
      return 0;
    }
    return -1;
}

std::string GIF::getLastError() {
    switch (gif.getLastError()) {
        case GIF_SUCCESS:
            return "GIF_SUCCESS";
        case GIF_DECODE_ERROR:
            return "GIF_DECODE_ERROR";
        case GIF_TOO_WIDE:
            return "GIF_TOO_WIDE";
        case GIF_INVALID_PARAMETER:
            return "GIF_INVALID_PARAMETER";
        case GIF_UNSUPPORTED_FEATURE:
            return "GIF_UNSUPPORTED_FEATURE";
        case GIF_FILE_NOT_OPEN:
            return "GIF_FILE_NOT_OPEN";
        case GIF_EARLY_EOF:
            return "GIF_EARLY_EOF";
        case GIF_EMPTY_FRAME:
            return "GIF_EMPTY_FRAME";
        case GIF_BAD_FILE:
            return "GIF_BAD_FILE";
        case GIF_ERROR_MEMORY:
            return "GIF_ERROR_MEMORY";
        default:
            return "Unknown";
    }
}
bitbank2 commented 6 months ago

This is getting a bit complicated. I have a feeling it may be because you're using PSRAM. My code doesn't know you're using PSRAM and, without a proper cache flush, the random accesses (beyond 8K apart) that my code is doing are probably messing it up.

silverchris commented 6 months ago

Hmmm, I am not sure if I can fit it in internal RAM, as I would need 480*480+24k of memory, correct? Which would be somewhere around 275kb of RAM?
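
As a quick check of that estimate, here is the arithmetic written out. The figures are the ones quoted in this thread (one byte per pixel for the raw canvas, plus either the "24k" above or the "about 32K" workspace mentioned in the example comment further down), not constants taken from the library, so treat the result as a rough bound.

```cpp
#include <cstddef>

// Assumed figures from the thread, not constants from the library.
constexpr std::size_t kCanvasBytes  = 480 * 480;   // raw canvas, one byte per pixel
constexpr std::size_t kWorkspace24k = 24 * 1024;   // the "24k" estimate
constexpr std::size_t kWorkspace32k = 32 * 1024;   // the "about 32K" estimate

constexpr std::size_t kTotal24k = kCanvasBytes + kWorkspace24k;
constexpr std::size_t kTotal32k = kCanvasBytes + kWorkspace32k;

static_assert(kCanvasBytes == 230400, "480 x 480 canvas");
static_assert(kTotal24k == 254976, "canvas + 24 KiB = 249 KiB");
static_assert(kTotal32k == 263168, "canvas + 32 KiB = 257 KiB");
```

Either way the buffer comes to roughly 250 to 260 KiB, a little under the rough figure above but still a large share of the internal RAM.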

bitbank2 commented 6 months ago

I can try debugging it on my Linux laptop tomorrow. Maybe I'll have better luck with the S3's CDC-Jtag debugging.

silverchris commented 6 months ago

With some squeezing, and disabling a lot of the other code on the device, I managed to shove it into the internal memory. Still gives a decoding error after two frames, just as a point of data :)

bitbank2 commented 6 months ago

Have you tested the original image using the non-turbo mode? I tested this situation on the ESP32 and turbo mode gets you no benefit because any extra speed is lost due to PSRAM delays.

silverchris commented 6 months ago

I have tested it with the non-turbo code, and it does play back fine there, just slow.

On the one GIF I did get turbo working on, it was about 10-20 ms faster per frame, even from PSRAM. Though I do have the octal PSRAM running at 120 MHz.

Does that GIF play back on your device in turbo mode, without decode failures, with the code you have published? I haven't been able to get that working even when it's in internal RAM; it always fails to decode after two frames.

silverchris commented 6 months ago

Does this example work? https://github.com/bitbank2/AnimatedGIF/blob/master/examples/turbo_t_qt_example/turbo_t_qt_example.ino It looks like, without a frame buffer allocated, it will never call the GIF_DRAW_CALLBACK function, due to https://github.com/bitbank2/AnimatedGIF/blob/4a42a942003e2f697deae5faf59b6718b7b5159e/src/gif.inl#L1204, and then nothing gets rendered to the screen?

bitbank2 commented 6 months ago

GIF_DRAW_COOKED requires a framebuffer to work. The presence of a callback function (pfnDraw) with cooked mode requires that same framebuffer to be allocated. If you select cooked mode and don't provide a framebuffer, then nothing will get drawn.

silverchris commented 6 months ago

After some more playing around, I can get turbo mode to work, but only with 128-colour GIFs; the same file in 256 colours fails to decode.

This is with the turbo buffer allocated in internal memory, and not PSRAM

bitbank2 commented 6 months ago

Can you share an Arduino example and I'll take a look.

silverchris commented 6 months ago

Here is a minimal example. https://github.com/silverchris/gif_minimal

For it to open the GIF, MAX_WIDTH in AnimatedGIF.h must be changed to 480

bitbank2 commented 6 months ago

I tried to use your example, but it's missing all of the PCA_xxx macros. I notice that it allocates the framebuffer in each loop, but doesn't free it. It also does everything in PSRAM. There shouldn't be any difference between decoding a 128 and 256 color image. What exactly is the failure you're seeing? We seem to be going in circles on this issue. There are some problems using PSRAM to hold certain types of buffers because the caching doesn't always behave perfectly without explicit flushes. At this point, I've already lost the narrative on what the actual problem you're seeing is.

Let's start with a smaller image that fits completely in static RAM on a non-PSRAM display. Have you successfully used Turbo mode in that scenario?

silverchris commented 6 months ago

My RGB LCD is configured over SPI, and that SPI is bit-banged through a GPIO IC due to a lack of pins. That's probably why you are missing the macros?

Allocating the framebuffer in each loop was an oversight, but it doesn't affect the end result here, as we don't make it through a full loop.

earth_128x128.h works fine.

with rick.h I get

Beginning
Initialized!
Successfully opened GIF; Canvas size = 480 x 480
Gif Error 1

I actually took it a step further, and removed all the graphics code from my minimal example, as I realize you probably don't have the exact hardware that I do.

It now is for sure allocating from internal memory, and still fails the exact same way

#include <AnimatedGIF.h>
#define GIF_NAME rick
#include "../test_images/earth_128x128.h"
#include "rick.h"

uint8_t *pTurboBuffer;

AnimatedGIF gif;
int iOffX, iOffY;

void setup(void)
{  
  Serial.begin(115200);
  // while (!Serial) delay(100);

  Serial.println("Beginning");
  // Init Display

  gif.begin(GIF_PALETTE_RGB565_LE); // Set the cooked output type we want (compatible with SPI LCDs)
}

void GIFDraw(GIFDRAW *pDraw)
{
} /* GIFDraw() */

void loop()
{
  long lTime;
  int iFrames, iFPS;

// Allocate a buffer to enable Turbo decoding mode (~50% faster)
// it requires enough space for the full "raw" canvas plus about 32K workspace for the decoder
  pTurboBuffer = (uint8_t *)heap_caps_malloc(TURBO_BUFFER_SIZE + (480*480), MALLOC_CAP_INTERNAL);

  while (1) { // loop forever
     // The GIFDraw callback is optional if you use Turbo mode (pass NULL to disable it). You can either
     // manage the transparent pixels + palette conversion yourself or provide a framebuffer for the 'cooked'
     // version of the canvas size (setDrawType to GIF_DRAW_FULLFRAME).
      if (gif.open((uint8_t *)GIF_NAME, sizeof(GIF_NAME), GIFDraw)) {
      Serial.printf("Successfully opened GIF; Canvas size = %d x %d\n", gif.getCanvasWidth(), gif.getCanvasHeight());
      gif.setDrawType(GIF_DRAW_COOKED); // We want the library to generate ready-made pixels
      gif.setTurboBuf(pTurboBuffer); // Set this before calling playFrame()

      iOffX = (480 - gif.getCanvasWidth())/2; // center on the display
      iOffY = (480 - gif.getCanvasHeight())/2;
      lTime = micros();
      // Decode frames until we hit the end of the file
      // false in the first parameter tells it to return immediately so we can test the decode speed
      // Change to true if you would like the animation to run at the speed programmed into the file
      iFrames = 0;
      while (gif.playFrame(false, NULL)) {
        iFrames++;
        if(gif.getLastError() != 0){
          Serial.printf("Gif Error %i\n", gif.getLastError());
          while(1){}
        }
      }
      lTime = micros() - lTime;
      iFPS = (iFrames * 10000000) / lTime; // get 10x FPS to make an integer fraction of 1/10th 
      Serial.printf("total decode time for %d frames = %d us, %d.%d FPS\n", iFrames, (int)lTime, iFPS/10, iFPS % 10);
      gif.close(); // You can also use gif.reset() instead of close() to start playing the same file again
    }
  } // while (1)
}
bitbank2 commented 6 months ago

On my mac I was able to see the failure occur while decoding the 3rd frame. The LZW buffer highwater value was too low for this particular image. Change it from:

#define LZW_HIGHWATER_TURBO ((LZW_BUF_SIZE_TURBO * 15) / 16)

to

#define LZW_HIGHWATER_TURBO ((LZW_BUF_SIZE_TURBO * 14) / 16)

I was able to reproduce the bug exactly on my mac and now it appears to be resolved.

silverchris commented 6 months ago

AWESOME. That got it working. Now I am bumping up against my display refresh rate, as the GIF decoding has been as fast as 45ms a frame, which is awesome.

I still have an issue with one complex GIF behaving the same way, but I can't share it. Are there any limits on how large the turbo buffer can be, or could I increase it to test, since I have free memory?

bitbank2 commented 6 months ago

Increasing the Turbo buffer size doesn't have any effect, it's a fixed size. What I changed above is when the main decode loop checks if more compressed data needs to be read. Part of the speed advantage of my code is that I'm not checking for "end_of_input" after decoding each symbol.
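
To illustrate why moving the high-water mark matters, here is a toy model of the trade-off. It is not the library's decode loop: the buffer size and the helper names are assumptions made for the example, and the real LZW_BUF_SIZE_TURBO may differ. The idea is that the loop only asks for more compressed data once its read offset passes the mark, so whatever it consumes between two checks has to fit in the space left after the mark.

```cpp
#include <cstddef>

// Hypothetical buffer size for illustration; the library's LZW_BUF_SIZE_TURBO may differ.
constexpr std::size_t kBufSize = 4096;

// Offset at which the decode loop asks for more compressed data.
constexpr std::size_t highWater(std::size_t bufSize, std::size_t sixteenths) {
    return (bufSize * sixteenths) / 16;
}

// Bytes left between the high-water mark and the end of the buffer.
constexpr std::size_t slack(std::size_t bufSize, std::size_t sixteenths) {
    return bufSize - highWater(bufSize, sixteenths);
}

// True if a run of `burst` bytes, consumed starting just below the mark
// (i.e. right after a check that did not trigger a refill), stays inside the buffer.
constexpr bool burstFits(std::size_t bufSize, std::size_t sixteenths, std::size_t burst) {
    return highWater(bufSize, sixteenths) - 1 + burst <= bufSize;
}
```

In this model, going from 15/16 to 14/16 doubles the slack (256 to 512 bytes for a 4096-byte buffer), so a 300-byte run between checks overruns with the old mark but fits with the new one. The cost is that refills happen slightly earlier.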