M-HT / SR

A project to statically recompile the following games to create Windows or Linux (x86 or ARM) versions: Albion, X-Com: UFO Defense (UFO: Enemy Unknown), X-Com: Terror from the Deep, Warcraft: Orcs & Humans, Septerra Core: Legacy of the Creator, Battle Isle 3: Shadow of the Emperor

Thinking about recompiling a game or 2 but I need some info #70

Open MIvanchev opened 1 month ago

MIvanchev commented 1 month ago

Hey, can someone briefly tell me what is required to get SR going? I can find very little info and would appreciate some help. I suppose the main problem to solve is determining which addresses in the code segment are code and which are data, and that the bunch of SCI files have to do with that and with naming symbols in the input/output EXE. Maybe someone can put together a rough guide for me to follow?

M-HT commented 1 month ago

The process is complicated (more complicated than you think), it's time-consuming, and there's a big chance of failure. If you don't have a really good reason to do it, then you probably shouldn't do it.

If that didn't discourage you, do you want to recompile a DOS game or a Windows game?

MIvanchev commented 1 month ago

It's a DOS game and I'm aware of the complexity :) SR seems to automate a lot of the tasks involved so I'm just generally interested in the mechanics and the usage.

M-HT commented 1 month ago

I do the recompilation in several steps.

Step 1: Compile SR with OUTPUT_TYPE set to OUT_ORIG (in SR_defs.h). Apply SR to the DOS executable (no SCI files are needed) - SR source.exe destination.asm . Compile the generated assembly with nasm (targeting DOS) and link it. If everything works, that should generate a working DOS executable. No recompilation is done, but it tests several things.

Step 2: Compile SR with OUTPUT_TYPE set to OUT_DOS (in SR_defs.h). SCI files can be used in this step; they are described below. An empty file is treated the same as a non-existent file.

Apply SR to the DOS executable (with SCI files) - SR source.exe destination.asm . Compile the generated assembly with nasm (targeting DOS) and link it. If everything works, that should generate a working DOS executable.

What are the SCI files used for?

fixup_interpret_as_code.sci: If an address (in the program) is in this SCI file, then it's interpreted as code - it will be disassembled, etc. You can generate a list of candidate addresses using SR --list_invalid_code_fixups=fixup_interpret_as_code.sci source.exe destination.asm . The list is unsorted and can contain duplicates. Sort it, remove duplicates and then remove addresses which are not code. That will give you the final SCI file (for this step).

fixup_do_not_interpret_as_code.sci: If an address (in the program) is in this SCI file, then it's interpreted as data - it won't be disassembled, etc. You can generate a list of candidate addresses using SR --list_data_to_code_fixups=fixup_do_not_interpret_as_code.sci source.exe destination.asm . The list is unsorted and can contain duplicates. Sort it, remove duplicates and then remove addresses which are code. That will give you the final SCI file (for this step).
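The mechanical part of cleaning up a candidate list (sorting and removing duplicates) can be sketched as below. This is only an illustration, not part of SR: it assumes one entry per line and treats each entry as opaque text, since the exact SCI entry format is not described here. The sample values are made up. Deciding which addresses really are code or data still has to be done by hand.

```python
def clean_candidates(lines):
    """Return the unique, non-empty entries of a candidate list in sorted order.

    Assumption: one entry per line; entries are compared as plain text.
    """
    entries = {line.strip() for line in lines if line.strip()}
    return sorted(entries)


# Hypothetical candidate entries, unsorted and with a duplicate.
raw = ["0x2000", "0x1000", "0x2000", "", "0x1800"]
print(clean_candidates(raw))
```

To use it on a real file, read the generated list with `open(path).read().splitlines()`, pass the lines to `clean_candidates`, and write the result back before reviewing it manually.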

code16_areas.sci: Some 32-bit DOS executables contain blocks of 16-bit code (e.g. a real-mode interrupt handler). You can define these blocks in this SCI file, so the recompiler doesn't try to disassemble them (as 32-bit code).
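The idea of excluding 16-bit blocks from 32-bit disassembly can be pictured as a simple range check. The sketch below is hypothetical: the block boundaries, the half-open range convention and the function names are assumptions for illustration and do not reflect SR's actual file format or internals.

```python
# Hypothetical 16-bit blocks as (start, end) address pairs; end is exclusive.
CODE16_AREAS = [(0x1000, 0x1040), (0x2000, 0x2010)]


def in_code16_area(address, areas=CODE16_AREAS):
    """Return True if the address lies inside one of the 16-bit blocks."""
    return any(start <= address < end for start, end in areas)


def addresses_to_disassemble(candidates, areas=CODE16_AREAS):
    """Keep only the candidates that are outside every 16-bit block."""
    return [a for a in candidates if not in_code16_area(a, areas)]


print([hex(a) for a in addresses_to_disassemble([0x0FFF, 0x1010, 0x2010])])
```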

noret_procedures.sci: This SCI file contains addresses of functions that don't return (to the calling location). An example:

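func1:
    call func2      ; pushes the address of the data below as the return address
    some data       ; never executed
func2:
    pop edi         ; removes that return address from the stack (edi now points to the data)
    retn            ; returns to the caller of func1, not to the data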

In this example the address of func2 should be in this SCI file, so the recompiler doesn't interpret the data after call func2 as code.
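Why the list of non-returning functions matters can be shown with a toy control-flow traversal. This is a simplified model, not SR's algorithm: the program is a hand-written table of (mnemonic, call target, length) entries, the addresses and lengths are made up, and the data after the call is modelled as a bogus "junk" instruction that a disassembler would produce if it decoded those bytes.

```python
def find_code(program, entry, noret=frozenset()):
    """Return the set of addresses reached by following control flow from entry."""
    seen, todo = set(), [entry]
    while todo:
        addr = todo.pop()
        if addr in seen or addr not in program:
            continue
        seen.add(addr)
        op, target, length = program[addr]
        if op == "retn":
            continue  # nothing follows a return
        if op == "call":
            todo.append(target)
            if target in noret:
                continue  # bytes after the call are not assumed to be code
        todo.append(addr + length)  # fall through to the next instruction
    return seen


# Toy layout of the example: func1 at 0x00, data at 0x05, func2 at 0x09.
PROGRAM = {
    0x00: ("call", 0x09, 5),
    0x05: ("junk", None, 4),  # data bytes wrongly decoded as an instruction
    0x09: ("pop", None, 1),
    0x0A: ("retn", None, 1),
}

print(sorted(find_code(PROGRAM, 0x00)))                # data at 0x05 is treated as code
print(sorted(find_code(PROGRAM, 0x00, noret={0x09})))  # data at 0x05 is skipped
```

With func2 marked as non-returning, the traversal never visits address 0x05, which mirrors the effect described for the SCI file.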

displaced_labels.sci: This is best described with an example:

mov bx, word [eax + addr1]

addr1:
    pop ebx
    retn
addr2:
    some data

The first instruction was originally mov bx, word [eax + addr2 - 2]. It's reading from some data (the value of eax is 2 or higher), but the address in the instruction points to code and not to data. This works in the DOS executable because the instructions pop ebx and retn are both 1 byte long, so addr1 equals addr2 - 2. But when these instructions are recompiled their length is different, and the first instruction would read the wrong data. In this example the address addr1 should be in this SCI file, displaced by 2, so the recompiler interprets the address as addr2 - 2.
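The arithmetic behind this can be checked with a few lines. The addresses below are invented for illustration; the only facts taken from the example are that pop ebx and retn occupy 2 bytes in the original executable and that the recompiled code has a different length (assumed here to be 9 bytes).

```python
def effective_address(label_address, displacement, eax):
    """Address read by an instruction of the form [eax + label + displacement]."""
    return label_address + displacement + eax


# Hypothetical layouts: in the original, addr2 is 2 bytes after addr1.
ORIGINAL = {"addr1": 0x100, "addr2": 0x102}
# After recompilation the code between the labels is assumed to be 9 bytes long.
RECOMPILED = {"addr1": 0x200, "addr2": 0x209}

eax = 2
original_read = effective_address(ORIGINAL["addr1"], 0, eax)
naive_read = effective_address(RECOMPILED["addr1"], 0, eax)     # label kept as addr1
fixed_read = effective_address(RECOMPILED["addr2"], -2, eax)    # label as addr2 - 2

print(original_read == ORIGINAL["addr2"])   # original reads the start of the data
print(naive_read == RECOMPILED["addr2"])    # naive recompilation misses the data
print(fixed_read == RECOMPILED["addr2"])    # displaced label reads the data again
```

Expressing the reference relative to addr2 keeps it correct no matter how long the recompiled instructions between the two labels become.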

Other problems: Some executables contain crazy code like jumping into the middle of another instruction. This is not supported by the recompiler - I handle it by patching the DOS executable.

There are more steps following. I'll describe them when you get there with your recompilation.

MIvanchev commented 1 month ago

Thank you, this is quite detailed, it's enough to get me started for sure (I already have some experience with static recompilation). My main question is whether the end result (16->32 EXE) handles stuff like keyboard interrupts and VGA buffers using a library like SDL.

M-HT commented 1 month ago

The recompiler only supports 32-bit DOS executables, not 16-bit executables. The recompiler doesn't handle inputs (mouse/keyboard) or outputs (video/audio). You will have to handle that yourself - that's part of the later steps.