deltabeard / Peanut-GB

A Game Boy (DMG) emulator single header library written in C99. Performance is prioritised over accuracy.
https://projects.deltabeard.com/peanutgb/

Provide double width 4bpp output to lcd_draw_line #107

Open ccawley2011 opened 1 month ago

ccawley2011 commented 1 month ago

By packing the palette into a single nibble and expanding it when the palette register is written, it's possible to output double-width 4bpp graphics data without much overhead. This is useful on RISC OS for rendering in modes 12 or 27, where the resulting buffer can be copied directly to the screen as-is.
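A minimal sketch of the packing idea, assuming three palettes (OBJ0, OBJ1, BG) with four shades each, packed as `(palette << 2) | shade` so that all 12 colours fit in one nibble. The names `pack_nibble`, `double_nibble`, `on_palette_write` and `double_width_lut` are hypothetical and are not Peanut-GB's API. The point being illustrated is that the expansion runs once per palette register write rather than once per pixel.

```c
#include <stdint.h>

/* Hypothetical palette selectors: 3 palettes x 4 shades = 12 colours. */
enum { PAL_OBJ0 = 0, PAL_OBJ1 = 1, PAL_BG = 2 };

/* Hypothetical table: for each palette and colour index (0..3), the byte
 * to emit for one pixel. Rebuilt only when a palette register changes. */
static uint8_t double_width_lut[3][4];

/* Pack a palette selector and a 2-bit shade into one nibble (0..11). */
static inline uint8_t pack_nibble(unsigned pal, unsigned shade)
{
    return (uint8_t)(((pal & 0x3u) << 2) | (shade & 0x3u));
}

/* Repeat the nibble in both halves of a byte. At 4bpp this is two
 * identical screen pixels, i.e. one horizontally doubled GB pixel. */
static inline uint8_t double_nibble(uint8_t n)
{
    return (uint8_t)(((n & 0x0Fu) << 4) | (n & 0x0Fu));
}

/* Assumed hook, called when BGP/OBP0/OBP1 is written. reg uses the Game
 * Boy layout: bits 1-0 give the shade for colour index 0, bits 3-2 for
 * index 1, and so on. pal must be PAL_OBJ0, PAL_OBJ1 or PAL_BG. */
static void on_palette_write(unsigned pal, uint8_t reg)
{
    for (unsigned idx = 0; idx < 4; idx++) {
        unsigned shade = (reg >> (idx * 2)) & 0x3u;
        double_width_lut[pal][idx] = double_nibble(pack_nibble(pal, shade));
    }
}
```

With the table filled in, producing a pixel is a single lookup, and every output byte already holds two identical 4bpp pixels ready to be copied to a 16-colour screen whose entries 0 to 11 are set to the desired colours.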

This overlaps with the changes in the glfw_gles2 and sdl2-swrenderer branches, so it should probably be combined with one of them before merging.

ccawley2011 commented 1 month ago

This has been made optional, so it should be ready for review.

deltabeard commented 1 month ago

Thank you for this! Have you run any benchmarks before and after these changes on RISC OS using peanut-benchmark to verify the speed improvement? I'll have a thorough look at this pull request when I can. 👍

ccawley2011 commented 1 month ago


I did some tests with this, and enabling PEANUT_GB_USE_DOUBLE_WIDTH_PALETTE plus using memcpy is noticeably faster on a StrongARM RiscPC compared to expanding each byte in lcd_draw_line.
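The two approaches being compared can be sketched as hypothetical line handlers: one expands each 8-bit pixel through a 256-entry table inside the callback, the other copies a line that the emulator has already produced in double-width form. The function names and the `lut` parameter are illustrative assumptions, not code from the pull request or the RISC OS port; `dst` is assumed to point at 160 bytes of a 4bpp screen row (320 pixels).

```c
#include <stdint.h>
#include <string.h>

#define GB_LCD_WIDTH 160

/* Without double-width output: every 8-bit pixel goes through a table
 * that yields one byte holding two identical 4bpp pixels. */
static void draw_line_expand(uint8_t *dst, const uint8_t *pixels,
                             const uint8_t lut[256])
{
    for (int x = 0; x < GB_LCD_WIDTH; x++)
        dst[x] = lut[pixels[x]];
}

/* With double-width output: the emulator already produced those bytes,
 * so the whole line is copied with a single call. */
static void draw_line_copy(uint8_t *dst, const uint8_t *pixels)
{
    memcpy(dst, pixels, GB_LCD_WIDTH);
}
```

A likely reason for the difference on a StrongARM is that `memcpy` implementations typically move several words per instruction, whereas the table path performs a dependent load and a byte store for every pixel. This is offered as a plausible explanation rather than a measurement.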

deltabeard commented 1 month ago

> enabling PEANUT_GB_USE_DOUBLE_WIDTH_PALETTE plus using memcpy is noticeably faster on a StrongARM RiscPC

I'm trying to understand the usefulness of PEANUT_GB_USE_DOUBLE_WIDTH_PALETTE. If each byte given to lcd_draw_line consists of the same two nibbles for a pixel, how are you using memcpy to increase the performance of lcd_draw_line? I think that PEANUT_GB_USE_NIBBLE_FOR_PALETTE is only required for an 8-bit paletted bitmap that uses only 12 colours, which can be copied using memcpy as was done in the glfw_gles2 branch. Is PEANUT_GB_USE_DOUBLE_WIDTH_PALETTE supposed to allow pushing two pixels at a time to a 4-bit paletted texture?
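For the 8-bit paletted case with 12 colours, a sketch of turning a pixel byte into a compact index in the range 0 to 11. It assumes the default pixel layout stores the shade in bits 0-1 and the palette selector in bits 4-5 (OBJ0 = 0x00, OBJ1 = 0x10, BG = 0x20); `to_nibble_index` and the mask names are hypothetical helpers, not Peanut-GB definitions.

```c
#include <stdint.h>

/* Assumed default pixel layout: shade in bits 0-1, palette selector in
 * bits 4-5 (OBJ0 = 0x00, OBJ1 = 0x10, BG = 0x20). */
#define PX_SHADE_MASK   0x03u
#define PX_PALETTE_MASK 0x30u

/* Move the palette selector down to bits 2-3 so the result is
 * (palette << 2) | shade, i.e. an index in the range 0..11. */
static inline uint8_t to_nibble_index(uint8_t px)
{
    return (uint8_t)(((px & PX_PALETTE_MASK) >> 2) | (px & PX_SHADE_MASK));
}
```

If the emulator emits indices in this packed form directly, a 12-entry host palette covers every colour and a 160-byte line can be copied into an 8bpp texture row unchanged.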

Could you please provide some example code showing how PEANUT_GB_USE_DOUBLE_WIDTH_PALETTE is supposed to be used?

ccawley2011 commented 1 month ago

> Is PEANUT_GB_USE_DOUBLE_WIDTH_PALETTE supposed to allow for pushing two pixels at a time to a 4-bit paletted texture?

That's essentially it - the only difference is that since the RiscPC only has a basic display controller rather than a full GPU, there are limited options for scaling the final image.

I've put my current code on GitHub if you want to take a look. It's a bit primitive at the moment, but it can run games in mode 12 (640x256x4, with aspect ratio correction on older monitors) and mode 27 (640x480x4, with lines doubled in software). I may also add support for mode 28 (640x480x8) for compatibility with newer devices like the Raspberry Pi.

https://github.com/ccawley2011/Peanut-GB-riscos
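A sketch of how a double-width line could be placed on screen in the two modes described above, assuming a linear 4bpp framebuffer with no row padding (640 pixels at 4bpp = 320 bytes per row) and the 320-pixel-wide image centred. The framebuffer pointer, helper names and centring offsets are assumptions for illustration, not code taken from Peanut-GB-riscos.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GB_LINE_BYTES 160 /* 160 double-width bytes = 320 pixels at 4bpp */
#define GB_LINES      144
#define ROW_BYTES     320 /* 640 pixels * 4 bits / 8, modes 12 and 27 */
#define X_OFFSET      ((ROW_BYTES - GB_LINE_BYTES) / 2) /* centre */

/* Mode 12 (640x256): one screen row per Game Boy line. */
static void blit_line_mode12(uint8_t *screen, const uint8_t *line, unsigned y)
{
    const unsigned y_off = (256 - GB_LINES) / 2; /* 56 rows */
    memcpy(screen + (size_t)(y_off + y) * ROW_BYTES + X_OFFSET,
           line, GB_LINE_BYTES);
}

/* Mode 27 (640x480): each Game Boy line is written to two rows,
 * giving a 320x288 image. */
static void blit_line_mode27(uint8_t *screen, const uint8_t *line, unsigned y)
{
    const unsigned y_off = (480 - 2 * GB_LINES) / 2; /* 96 rows */
    uint8_t *row = screen + (size_t)(y_off + 2 * y) * ROW_BYTES + X_OFFSET;
    memcpy(row, line, GB_LINE_BYTES);
    memcpy(row + ROW_BYTES, line, GB_LINE_BYTES);
}
```

Mode 12 needs no vertical doubling because its rows are roughly twice as tall as its pixels are wide on older monitors, which is where the aspect ratio correction comes from; mode 27 has square pixels, so each line is duplicated in software instead.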

deltabeard commented 1 month ago

> That's essentially it - the only difference is that since the RiscPC only has a basic display controller rather than a full GPU, there are limited options for scaling the final image.

I see now. So the double width makes it easy to scale the output image to 320x288 by taking advantage of the fact that the output is a 4-bit palette. It might then also be possible to output a 2-bit palette for scaling to 640x576 (albeit with fewer colours), but that can be added later if anyone is interested. Without PEANUT_GB_USE_DOUBLE_WIDTH_PALETTE defined, each output pixel is technically 8 bits wide, even though only 12 colours are ever used at the moment (this may change when CGB support is added).
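The arithmetic behind the 2-bit idea, as a hypothetical sketch: at 2bpp one byte holds four pixels, so repeating a 2-bit shade four times (`shade * 0x55`) turns each Game Boy pixel into four screen pixels, and a 160-byte line becomes 640 pixels wide. Writing each line to four consecutive rows then gives 576 rows. Only four colours remain, so the OBJ and BG palettes can no longer be told apart. The helper names are illustrative, not part of Peanut-GB.

```c
#include <stdint.h>

#define GB_WIDTH 160

/* Repeat a 2-bit shade into all four 2bpp pixel slots of a byte:
 * 0 -> 0x00, 1 -> 0x55, 2 -> 0xAA, 3 -> 0xFF. */
static inline uint8_t quad_width_2bpp(uint8_t shade)
{
    return (uint8_t)((shade & 0x3u) * 0x55u);
}

/* Hypothetical: expand one line of 2-bit shades into 160 bytes, i.e.
 * 640 screen pixels at 2bpp. The caller would copy this line to four
 * consecutive screen rows to reach 576 rows. */
static void expand_line_2bpp(uint8_t dst[GB_WIDTH],
                             const uint8_t shades[GB_WIDTH])
{
    for (int x = 0; x < GB_WIDTH; x++)
        dst[x] = quad_width_2bpp(shades[x]);
}
```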

I'll do some light testing when I get the time (should be this week) and then I'll merge. 🙂