gitendo / lz4gb

LZ4 compression / decompression for GameBoy development.
Mozilla Public License 2.0
15 stars 1 forks source link

Possibly even smaller decompression func. #1

Closed slembcke closed 5 years ago

slembcke commented 5 years ago

I just started doing GameBoy dev coming from NES dev. On my last project I used the lz4 code that came with cc65 and it worked brilliantly. My first project for GB was to implement lz4, and was pretty proud that I got it down to 120 bytes (much smaller than the 6502 version that ships with cc65). Then I actually bothered to Google if somebody had already done this, and found that yours was even smaller. (Gah!) After incorporating your optimizations I was able to get my version down to 77 bytes and still read vanilla lz4 streams. That version is here: https://gist.github.com/slembcke/229ca5e17f78ecac1ade9c8efd5ff485

As I understand it, the modifications you made to the lz4 format were to use a 0 offset as a terminator. (Something I had also noticed all my files had. Related to this maybe? https://github.com/lz4/lz4/blob/dev/doc/lz4_Block_format.md#parsing-restrictions), and swapping the endian of the back ref offset (which was clever!).

With further format modifications, I think it's possible to get it even smaller. Specifically, moving the offset distance before the additional count increment bytes means that for both the literals and back ref, you could add the additional count bytes and copy in a single function call as well as remove some RAM accesses. My (untested) function is 64 bytes long and uses a single HRAM byte. I might try modifying your compressor to do this tomorrow.

At first I wasn't so sure about the GB CPU, but this is turning out to be kinda fun!

gitendo commented 5 years ago

Nice!

Yeah, LR35902 is fun. :) It's been a while since I coded it, so I don't remember exactly but I think I reversed the order - original compressor saves match distance and length and my modification match length and distance. This was necessary to fit everything within registers and not to use any memory additional push/pops. I probably also shortened offsets to 16 bits vs 32 in original - we don't have that much memory in Game Boy anyway. :)

Actually I also started with 6502 code base. This was to test porting code by using tips from Nintendo docs - nes2gbc.pdf to be exact. I realized quite soon that output is really sloppy and I could achieve better results writing it from scratch. I'm not a 6502 coder anyway. Then I took a peek at Z80 version which uses those fancy instructions not available for LR35902 and ended up with first version. I had couple of different solutions but it was where I started to count cycles and register usage instead of H/RAM. So the final result is a compromise between different approaches and my skills.