felipesanches / AnotherWorld_VMTools

Toolchain for software development targeting the virtual machine originally designed for Eric Chahi's Another World game.

Where does the MSDOS engine (and other releases) store the text strings? #15

Open felipesanches opened 2 years ago

felipesanches commented 2 years ago

We need to figure this out and then write a script that generates the pair of files str_data.rom and str_index.rom, which AWVM_trace.py uses to annotate the generated disassembly listings with comments showing the string contents.

These files will also be used on the FPGA project at: https://github.com/felipesanches/AnotherWorld_FPGA

@fabiensanglard's VM (https://github.com/fabiensanglard/Another-World-Bytecode-Interpreter) keeps the strings hardcoded. For this AW-VMTools project we instead want to extract absolutely everything from the original game files.

felipesanches commented 2 years ago

I have a hunch that the text strings may be encoded in one of these places:

felipesanches commented 2 years ago

But it seems that resource 0x11 holds polygon data (https://github.com/fabiensanglard/Another-World-Bytecode-Interpreter/issues/23).

We also see resource 0x11 being loaded as "video2" by my AnotherWorldVM driver on MAME: https://github.com/felipesanches/mame/blob/d7ea76b1b731b69fa9cb4d7c34e31fdd3e0f8333/src/mame/drivers/another_world_vm.cpp#L232-L233

Screenshot from 2022-03-25 07-49-32

So perhaps the text strings are indeed in the ANOTHER.EXE file.

felipesanches commented 2 years ago

For the record, the hardcoded string data is available at: https://github.com/fabiensanglard/Another-World-Bytecode-Interpreter/blob/dea6914a82f493cb329188bcffa46b9d0b234ea6/src/staticres.cpp#L123-L265

Screenshot from 2022-03-25 08-02-06

There is also hardcoded font data, and we likewise don't know where or how it was stored in the original game files. In @fabiensanglard's VM it is declared at: https://github.com/fabiensanglard/Another-World-Bytecode-Interpreter/blob/dea6914a82f493cb329188bcffa46b9d0b234ea6/src/staticres.cpp#L71-L120

Screenshot from 2022-03-25 08-03-09

felipesanches commented 2 years ago

@toymak3r, this is an easier task on some releases. I know that in the "SEGA Genesis - Europe" release the string data is stored uncompressed within the ROM (it can easily be seen with the Unix strings command), so it is a good initial target before trying to figure out how the data is decompressed on other releases, such as the SNES cartridge ROM.
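As a starting point for a release with raw text, here is a minimal sketch of what the strings command does: it scans a binary blob for runs of printable ASCII and reports the offset of each run. The function name find_ascii_strings and the min_len threshold are my own illustrative choices, not anything taken from the existing tools, and this says nothing yet about how the game indexes its strings.

```python
import re


def find_ascii_strings(blob: bytes, min_len: int = 4):
    """Return (offset, text) pairs for each run of printable ASCII.

    Hypothetical helper: mimics the basic behaviour of the Unix
    `strings` command (printable bytes 0x20..0x7E, at least min_len long).
    """
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii")) for m in pattern.finditer(blob)]


# Tiny in-memory example instead of a real ROM dump.
sample = b"\x00\x01SURE ?\x00\xff\x10AB\x00GOOD BYE\x00"
for offset, text in find_ascii_strings(sample):
    print(f"0x{offset:06X}: {text}")
```

To try it on a real dump, read the ROM file with open(path, "rb").read() and pass the bytes in. The offsets it prints would be the raw material for building an index file later.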

felipesanches commented 2 years ago

And also the MSDOS release seems to involve some sort of compression (or obfuscation), which makes it more challenging than the releases that were shipped with raw text string data.

An interesting caveat is that some releases seem to provide multiple sets of text strings, to support the game in multiple languages instead of only English.

felipesanches commented 2 years ago

Fun fact discovered by c9d4618:

It seems that the SEGA Genesis Europe cartridge has a typo in the string "SURE ?" (missing the letter E), while the same text on the MSDOS release does not have that typo.

Screenshot from 2022-04-06 01-29-33

felipesanches commented 2 years ago

> And also the MSDOS release seems to involve some sort of compression

Yes! Strings in the MSDOS release are in the ANOTHER.EXE file, which is compressed using something similar to LZSS, but I haven't fully decoded it yet. This is a work in progress. I'm not yet sure how those 16-bit control words work (EC8F, 1F07, 807F, etc.):

photo_2022-04-11_22-34-10
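For reference while reverse engineering, here is a minimal sketch of how a generic LZSS-style decoder with 16-bit control words usually works. This is NOT the ANOTHER.EXE format, which is still undecoded. Every detail of the layout below is an assumption made up for illustration: the little-endian control word, the LSB-first bit order, 1 meaning literal, and the (distance, length) byte pair. The real bit order, polarity and reference encoding still have to be worked out from the binary.

```python
def lzss_decode_toy(data: bytes) -> bytes:
    """Decode a made-up LZSS-like stream (illustrative layout only).

    Assumed toy format, not the real ANOTHER.EXE one:
      - a 16-bit little-endian control word precedes up to 16 items
      - control bits are consumed LSB first
      - bit 1: the next byte is a literal
      - bit 0: the next two bytes are (distance, length), meaning
        "copy `length` bytes starting `distance` bytes back in the output"
    """
    out = bytearray()
    pos = 0
    while pos < len(data):
        if pos + 2 > len(data):
            raise ValueError("truncated control word")
        control = int.from_bytes(data[pos:pos + 2], "little")
        pos += 2
        for bit in range(16):
            if pos >= len(data):
                break  # input exhausted; remaining control bits are padding
            if control & (1 << bit):
                out.append(data[pos])  # literal byte
                pos += 1
            else:
                if pos + 2 > len(data):
                    raise ValueError("truncated back-reference")
                distance, length = data[pos], data[pos + 1]
                pos += 2
                if distance == 0 or distance > len(out):
                    raise ValueError("back-reference outside the output window")
                # Copy byte by byte so overlapping references repeat data.
                for _ in range(length):
                    out.append(out[-distance])
    return bytes(out)


# Control word 0x0007: three literals ("ABC"), then one back-reference
# (distance 3, length 6) that repeats them twice.
packed = bytes.fromhex("07004142430306")
print(lzss_decode_toy(packed))
```

Comparing where literal bytes appear in the compressed data against the bits of a candidate control word like the ones above is one way to test guesses about bit order and polarity.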