ethteck / splat

A binary splitting tool to assist with decompilation and modding projects

gzip support #379

ginryuoku closed 3 months ago

ginryuoku commented 3 months ago

At least with the PSY-Q tooling, it's possible to use GZip-encoded archives to embed data within the main binary. Polyphony's Gran Turismo 2 at the very least uses it, both to open its own overlays and to provide the two intro screens shown before either the MDEC or main menu overlay is triggered. I'd be genuinely shocked if there weren't other, non-Polyphony titles that embed gzip artifacts.

I'm wondering if there'd be a way to specially handle gzip objects, either at the analysis stage, or at least be able to separate them out when splitting. gzip in particular has distinct header magic, and has the original file name as a possible field in the header, though it's annoying in that it doesn't provide a distinct epilogue beyond a CRC32 at the end of the file. I think this can be accomplished with an extension, but I'm not familiar enough with Python (or this codebase) to say for sure offhand.
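To make the header-magic and missing-epilogue points concrete, here is a minimal Python sketch of how embedded gzip members could be located and delimited. It is not part of splat; the function names find_gzip_members and original_name are made up for illustration. It relies on the gzip layout from RFC 1952 (magic bytes 1f 8b, compression method 08, optional original file name when the FNAME flag is set) and on zlib reporting leftover bytes once a stream ends, which stands in for the epilogue that gzip lacks.

```python
import zlib

# ID1, ID2, CM=8 (deflate): the first three bytes of a gzip member (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b\x08"


def find_gzip_members(data: bytes):
    """Return (start, end, payload) for every gzip member that decompresses cleanly.

    Hypothetical helper: candidates are found by magic bytes, then confirmed
    by actually decompressing, since the magic alone can appear by chance.
    """
    found = []
    pos = data.find(GZIP_MAGIC)
    while pos != -1:
        # wbits=31 tells zlib to expect a gzip header and trailer (CRC32 + size).
        decomp = zlib.decompressobj(31)
        try:
            payload = decomp.decompress(data[pos:])
        except zlib.error:
            # False positive, or a corrupt member: try the next candidate.
            pos = data.find(GZIP_MAGIC, pos + 1)
            continue
        if decomp.eof:
            # Everything zlib did not consume lies after the member's trailer.
            end = len(data) - len(decomp.unused_data)
            found.append((pos, end, payload))
            pos = data.find(GZIP_MAGIC, end)
        else:
            # Stream ran off the end of the data without finishing.
            pos = data.find(GZIP_MAGIC, pos + 1)
    return found


def original_name(member: bytes):
    """Return the original file name stored in a gzip header, or None if absent."""
    flags = member[3]
    offset = 10  # fixed-size part of the header
    if flags & 0x04:  # FEXTRA: skip the extra field, which precedes the name
        xlen = int.from_bytes(member[offset:offset + 2], "little")
        offset += 2 + xlen
    if flags & 0x08:  # FNAME: zero-terminated Latin-1 string
        end = member.index(b"\x00", offset)
        return member[offset:end].decode("latin-1")
    return None
```

The (start, end) pairs returned here are the kind of offsets that could then be written into a yaml as segment boundaries, though that step is left out of the sketch.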

I can probably just make a makefile rule to remove the *.bin extension, but I figure this might come up again. :)

ethteck commented 3 months ago

You can definitely handle custom file formats with an extension. Compression gets a bit tricky, because splat kinda assumes things have a consistent size throughout its runtime, but code can be compressed. The way people usually handle compressed code is to treat compressed segments as their own yaml and then link it by itself, eventually linking the compressed segment into the main project.

If your gzip data doesn't include code, you could probably get away without having multiple yamls. To make a splat segment, you usually would just want to implement the scan() and split() functions. Here's a brief example: https://github.com/pmret/papermario/blob/main/tools/splat_ext/pm_charset_palettes.py
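As a rough illustration of the scan()/split() shape for a gzip blob, here is a self-contained Python sketch. GzipBlobSegment is a stand-in written for this example and does not inherit from splat's real segment classes; the constructor arguments, the rom_bytes parameter, and the out_dir attribute are assumptions, so check the linked extension for the actual base class and signatures.

```python
import gzip
from pathlib import Path


class GzipBlobSegment:
    """Stand-in for a splat segment (hypothetical, not splat's real base class)."""

    def __init__(self, name: str, rom_start: int, rom_end: int, out_dir):
        self.name = name
        self.rom_start = rom_start
        self.rom_end = rom_end  # assumed to be the exact end of the gzip member
        self.out_dir = Path(out_dir)
        self.decompressed = None

    def scan(self, rom_bytes: bytes) -> None:
        # Analysis pass: decompress in memory, write nothing to disk yet.
        raw = rom_bytes[self.rom_start:self.rom_end]
        self.decompressed = gzip.decompress(raw)

    def split(self, rom_bytes: bytes) -> None:
        # Output pass: write the decompressed asset, scanning first if needed.
        if self.decompressed is None:
            self.scan(rom_bytes)
        self.out_dir.mkdir(parents=True, exist_ok=True)
        (self.out_dir / self.name).write_bytes(self.decompressed)
```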

You can do your decompression in scan() and the actual writing to disk in split(), or you can just do everything in split(). Then you'll need some build system process to re-compress the data, of course.
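For the re-compression step, a build rule could call something like the Python sketch below. The helper names recompress and matches_original are hypothetical. One caveat worth stating: a different deflate encoder or compression level will generally produce different bytes than the game's original archive, so a build that must match the original binary byte for byte may need extra work beyond this.

```python
import gzip


def recompress(payload: bytes, level: int = 9) -> bytes:
    """Re-compress an extracted asset as a single gzip member.

    mtime=0 pins the header timestamp so repeated builds give identical output.
    """
    return gzip.compress(payload, compresslevel=level, mtime=0)


def matches_original(original_member: bytes, payload: bytes, level: int = 9) -> bool:
    """Report whether re-compressing payload reproduces the original bytes exactly."""
    return recompress(payload, level) == original_member
```

If matches_original returns False for every level, the decompressed data can still be verified by round-tripping it, but the rebuilt binary will differ in the compressed region.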

ethteck commented 3 months ago

I forgot that we have a class you can extend for compressed stuff (CommonSegDecompressor). Here's an example that uses it: https://github.com/ethteck/splat/blob/main/src/splat/segtypes/n64/yay0.py

ginryuoku commented 3 months ago

That looks to be what I need, thank you. :) I'll see if I can whip up an extension that can scan for and delimit embedded gzip assets when I get a chance. I figured it was supposed to be something pretty simple but wasn't sure how to actually decompress (even if only for analysis purposes).