bitbank2 / JPEGDEC

An optimized JPEG decoder for Arduino
Apache License 2.0
365 stars 47 forks source link

Missing columns in RGB565 but present in RGB8888 #72

Closed toneck closed 1 month ago

toneck commented 3 months ago

When trying to convert an image using the user provided buffer it works ok for the RGB8888, but has missing columns for RGB565. After some testing it happens when using the draw callback as well.

To Reproduce

  1. clone repo
  2. modify linux/main.c: ucPixelType = RGB565_BIG_ENDIAN //or little endian
  3. convert a file
  4. image with column mising image also some columns are switched. Notably columns that start at multiple of 128 on the x. Column 640->512 -> 384 -> 256 -> 128 ->0

Expected behavior A clear image as it appears with RGB8888 image

Additional context I was using your library to convert some images on an ESP32-cam since last year. (it was significantly faster than the provided library from Espressif) I needed RGB888 so I modified the code to produce RGB888 instead of RGB565. I noticed in January that you added RGB888 and tried the linux/main example. I saw you added the option of supplying a buffer from the user. Hoping to skip the drawcallback I tried your example on pc first and it worked. I wanted to test my code with rgb565 to see if I get a higher fps so I tried to change from RGB8888 to RGB565 and it kept giving me empty columns. I tried using the callback function to save the data to a buffer and save that to a file. The resulted file had no missing columns. I tried looking over the jpeg.inl to locate the problem, but I didn't notice anything wrong. I presumed I was doing something wrong so I stayed with the callback and RGB8888. I noticed the issue #69 and thought it may be similar, but the latest fix didn't work for me.

Today I decided to try using a callback function again, but now I see the columns when using the callback and when using the buffer. I'm not sure if before I did something different or if this is a new bug. To test that I didn't do anything wrong, I tried a few previous commits, using the same callback function that just saves everything into a buffer. All tests are done with the tulips header image. Commit e6293f0970527024f772adf455447a413e344777 works as intended. Commit 6dd03753c800a686c86fdf502bf4116890998bc5 segfaults after a few rows of decoding the image in RGB565. Commit f6a46aa42c5c7e131fb3389a1516876e8ec23ae8 outputs columns.

Also, there is no difference between RGB 565 little or big endian for the bugged commits, they both output little endian. This problem doesn't appear in e6293f0970527024f772adf455447a413e344777.

toneck commented 3 months ago

After trying some more commits, 2dbb61c8a30158e94924df04af60396c5680a72b already has this bug, which is the next commit after e6293f0970527024f772adf455447a413e344777 (which doesn't). Therefore the bug was introduced in 2dbb61c8a30158e94924df04af60396c5680a72b

bitbank2 commented 3 months ago

Are you running the code on an ESP32 or ESP32-S3? If the S3, then it looks like a memory alignment issue with the output buffer.

toneck commented 3 months ago

I also think it's an alignment issue.

I didn't have my esp on hand, so I ran it on my phone in termux (aarch64)

That being said, I am quite sure I had this problem on my laptop (64bit intel) about a month ago.

I'll be able to test it on the esp in about 2 hours

toneck commented 3 months ago

Ok, so, I was able to test it both on my laptop (amd64) and on my esp32. It appears to work on both of them. The problem seems to be only on aarch64. I am curious if it would appear on a raspberry pi.

bitbank2 commented 3 months ago

Thanks for narrowing it down; I'll investigate.

bitbank2 commented 3 months ago

Fixed the NEON code. Please confirm.

toneck commented 3 months ago

I didn't have time yesterday, but I managed to test it today. The columns are not missing anymore. Thanks for the help

On the other hand, the bug with the RGB565 not outputing big endian is still present. I did some tests and this bug affects just decoding at full scale. Decoding at half, quarter or eighth will output little endian for RGB565_LITTLE_ENDIAN and big endian for RGB565_BIG_ENDIAN Decoding at full size will always output little endian, even for RGB565_BIG_ENDIAN.

toneck commented 3 months ago

Should I close this bug and open a different bug for that one?

bitbank2 commented 3 months ago

I can fix that issue here too. I'll get to it today.

bitbank2 commented 3 months ago

Please give it a try now

toneck commented 3 months ago

Thank you for your help, unfortunately, it gives a compilation error

In file included from main.c:11:
../src/jpeg.inl:3565:26: error: incompatible type for argument 1 of ‘vrev16q_u8’
 3565 |               vrev16q_u8(u168Temp);
      |                          ^~~~~~~~
      |                          |
      |                          uint16x8_t
In file included from ../src/jpeg.inl:56,

in jpeg.inl:

          u88B = vqrshrun_n_s16(i168B, 4); // shift right, narrow and saturate >
          u168Temp = vshll_n_u8(u88R, 8); // place red in upper part of 16-bit >
          u168Temp2 = vshll_n_u8(u88G, 8); // shift green elements to top of 16>
          u168Temp = vsriq_n_u16(u168Temp, u168Temp2, 5); // shift green elemen>
          u168Temp2 = vshll_n_u8(u88B, 8); // shift blue elements to top of 16->
          u168Temp = vsriq_n_u16(u168Temp, u168Temp2, 11); // shift blue elemen>
          if (ucPixelType == RGB565_BIG_ENDIAN) { // reverse the bytes
              vrev16q_u8(u168Temp);
          }
          vst1q_u16((uint16_t *)pOutput, u168Temp); // top left block
          // top right block
          i168Temp = vqdmulhq_lane_s16(i168Crx2.val[1], i164Constants, 0); // C>
          i168R = vaddq_s16(i168Temp, i168Y); // now we have 8 R values
          i168Temp = vqdmulhq_lane_s16(i168Crx2.val[1], i164Constants, 1); // C>
          u88R = vqshrun_n_s16(i168R, 4); // narrow and saturate to 8-bit unsig>
          i168G = vaddq_s16(i168Y, i168Temp);
bitbank2 commented 3 months ago

interesting; on MacOS that doesn't give a warning or error. I'll fix it - thanks.

toneck commented 3 months ago

I think the problem is that the variable is declared as _uint16x8t uint16x8_t u168Temp, u168Temp2; it appears that _vrev16qu8 expects a _uint8x16t : link

bitbank2 commented 3 months ago

yes, of course I see the issue, but some compilers allow you to pass 128-bit registers without making a fuss about the implied type.

toneck commented 3 months ago

Here is the gcc that I use if it helps:

I use a chroot ubuntu environment in termux, this is the gcc verision that I use image

I tried the gcc compiler that is present in termux (not in the chroot) and it compiles, but the bigendian output is still little endian. image

bitbank2 commented 3 months ago

ok, I added the requisite vreinterpretq_xx_xx wrappers. Please let me know if your compiler accepts it now.

bitbank2 commented 3 months ago

Is the fix working for you?

toneck commented 1 month ago

Hi again, Sorry for the very late reply. I got busy and forgot.

I just checked all three formats RGB565 little and BIG endian and RGB888 for all scalings and they work as they should.

Thank you for your help