NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0

Support for Borland FBOV / VROOMM overlays for DOS MZ EXE #5543

Open colinbourassa opened 1 year ago

colinbourassa commented 1 year ago

It would be nice to get support for the "FBOV" overlay extension for MZ EXEs. Also known as "VROOMM" (Virtual Run-time Object-Oriented Memory Manager), this was a feature provided by the Borland C/C++ toolchain from v2.0. It allowed for dynamic swapping of code segments to get around the 640K memory barrier. This approach was popular only briefly, as extenders such as DOS/4G soon became available. A section in Chapter 1 of the Borland C++ 4.0 reference manual describes the general design of VROOMM.

I've attached a sample executable in this ZIP file: nomad.zip. The .exe can be loaded by the existing Ghidra MZ loader, but it produces the following message: File contains 0x29390 extra bytes starting at file offset 0x3e9c0.
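
For context, the offset in that message is where the MZ load image ends according to the standard MZ header fields (page count and bytes in the last page). A minimal sketch of that calculation follows; the function name `mz_image_end` is made up for illustration, and this is not Ghidra's loader code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 16-bit value from a byte buffer. */
static uint16_t rd16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Return the file offset at which the MZ load image ends, or 0 if the
 * buffer does not start with an "MZ" header. Anything past this offset
 * is extra data, such as an appended FBOV section.
 */
static uint32_t mz_image_end(const uint8_t *hdr, size_t len) {
    assert(hdr != NULL);
    if (len < 6 || hdr[0] != 'M' || hdr[1] != 'Z') return 0;
    uint16_t last_page_bytes = rd16(hdr + 2); /* e_cblp */
    uint16_t pages           = rd16(hdr + 4); /* e_cp, in 512-byte pages */
    if (pages == 0) return 0;
    if (last_page_bytes == 0) return (uint32_t)pages * 512u;
    return (uint32_t)(pages - 1) * 512u + last_page_bytes;
}
```

A header with 0x1f5 pages and 0x1c0 bytes in the last page would yield 0x3e9c0; those two values are chosen here only to reproduce the offset from the message and were not read from nomad.exe.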

This extra data is actually the entire FBOV section from the file, which starts with this header:

char[4] fbovMagic; // always "FBOV" -- should be used for detection
uint32 ovrsize;
uint32 exeinfo;
int32 segnum;

In the sample file, ovrsize is the size in bytes of the FBOV section excluding its header (0x29380); adding the 0x10-byte header gives the 0x29390 extra bytes reported by the loader. The segnum field is the total number of segments known to the overlay mechanism (0xa5). These segments are described by an array of structs at the file offset given by the exeinfo field. Borland gives this array the internal name _SEGTABLE_ in OVERLAY.LIB. In the sample, _SEGTABLE_ starts at 0x2e100, which is actually in the middle of a segment containing the other VROOMM data.
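
As a rough sketch of how a loader could detect and read this header, assuming the fields are little-endian and laid out exactly as listed above (the names `fbov_header` and `fbov_parse` are hypothetical):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FBOV_HEADER_SIZE 16u

typedef struct {
    uint32_t ovrsize; /* size of the FBOV section, excluding this header */
    uint32_t exeinfo; /* file offset of _SEGTABLE_ */
    int32_t  segnum;  /* number of _SEGTABLE_ entries */
} fbov_header;

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Parse an FBOV header at file[off]. Returns 1 on success, 0 if the
 * "FBOV" magic is missing or the buffer is too short.
 */
static int fbov_parse(const uint8_t *file, size_t len, size_t off,
                      fbov_header *out) {
    assert(file != NULL && out != NULL);
    if (off > len || len - off < FBOV_HEADER_SIZE) return 0;
    if (memcmp(file + off, "FBOV", 4) != 0) return 0;
    out->ovrsize = rd32(file + off + 4);
    out->exeinfo = rd32(file + off + 8);
    out->segnum  = (int32_t)rd32(file + off + 12);
    return 1;
}
```

For the sample, `off` would be the start of the extra data (0x3e9c0), and `ovrsize + FBOV_HEADER_SIZE` should then match the extra byte count.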

Each entry in _SEGTABLE_ is an 8-byte structure:

uint16 seg;
uint16 maxoff;
uint16 flags;
uint16 minoff;

The first few entries in the segment table of nomad.exe are:

  0, 21e2, 1, 0
21e, 5214, 1, 4  // loaded by IDA as seg001
73f, 1043, 1, 4  // loaded by IDA as seg002
843, ba,   1, 4  // loaded by IDA as seg003
84e, cb2,  1, a  // loaded by IDA as seg004

IDA Freeware 5.0 supports MZ/FBOV and can be used as a reference for desired behavior. In the case of nomad.exe, the code from the 0x29380 extra bytes at the end of the file is divided into overlay segments ovr075 through ovr153.

dclxviclan commented 1 year ago

Thanks for the most powerful and great free tool for reverse engineering ✋👍

NancyAurum commented 4 months ago

Hey! I made a utility that pre-relocates FBOV exes so they can be loaded into Ghidra: https://github.com/NancyAurum/devroomm/blob/main/tools/mzap/mzap.c

These exes will only work with Ghidra's analyzer, since IDA Pro gets confused by the cleared relocation table and doesn't resolve the far jumps properly.
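
For readers unfamiliar with the term, pre-relocating in the general MZ sense means applying the relocation table ahead of time for a fixed load segment. The sketch below shows only that generic step, using the standard 4-byte relocation entry (offset, then segment). It is not taken from mzap.c, and the function name is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Apply MZ relocations in place.
 *   image    - load module bytes (file contents after the MZ header)
 *   relocs   - relocation table: per entry, uint16 offset then uint16 segment
 *   base_seg - segment the image is assumed to be loaded at
 * Returns the number of entries applied; entries that point outside the
 * image are skipped.
 */
static size_t mz_apply_relocs(uint8_t *image, size_t image_len,
                              const uint8_t *relocs, size_t count,
                              uint16_t base_seg) {
    size_t applied = 0;
    assert(image != NULL && (relocs != NULL || count == 0));
    for (size_t i = 0; i < count; i++) {
        const uint8_t *r = relocs + i * 4;
        size_t off = (size_t)(r[0] | (r[1] << 8));
        size_t seg = (size_t)(r[2] | (r[3] << 8));
        size_t pos = seg * 16u + off;
        if (pos + 2 > image_len) continue;
        /* Each relocation target is a 16-bit segment value to be rebased. */
        uint16_t v = (uint16_t)(image[pos] | (image[pos + 1] << 8));
        v = (uint16_t)(v + base_seg);
        image[pos]     = (uint8_t)(v & 0xff);
        image[pos + 1] = (uint8_t)(v >> 8);
        applied++;
    }
    return applied;
}
```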

As of now it can't handle a multi-section FB file (besides FBOV there can be other FB sections), and it ignores the function-reference relocation flag: the exe I'm working on doesn't use it, so I can't test it. The overlay manager itself does the following to fix up the function references:

#include <stdint.h>
//SEG2PTR() and bosh_t are not defined in this excerpt
#define X86_MOD          0xF8
#define X86_RM           0x07
#define X86_MOV_IMM16    0xB8
#define X86_PUSH         0x50
void __OvrFixupFunref(uint16_t rseg, uint8_t *p) {
  //p - pointer at the fixup location

  //segment inside which the fixup location does a function call
  uint8_t *q = SEG2PTR(rseg);

  //Ensure our fixup site loads a far pointer, like the following
  //mov reg1,seg
  //push reg1
  //mov reg2,ofs
  //push reg2
  //Check the MOD part of the MODRM byte
  //it does some opcode magic, since 0x50 is both `push AX`
  //and when masked by 0xF8 is a general `push <register>` group
  //Same with 0xB8, which  is both `mov AX,<imm16>` and an opcode group
  if ((p[-1]&X86_MOD) != X86_MOV_IMM16) return; //moves segment?
  if ((p[ 2]&X86_MOD) != X86_PUSH) return;
  if ((p[-1]&X86_RM) != (p[2]&X86_RM)) return; //same RM operand?

  //now check the offset part...
  if (p[-1] != p[3]) return; //both are `move <reg>,IMM`?
  if (p[ 2] != p[6]) return; //both are `push <reg>`?

  uint16_t fofs = *(uint16_t*)&p[4]; //get raw offset of the function

  uint16_t rofs = sizeof(bosh_t);

  //go through the botrp_t entries
  for ( ; fofs != *(uint16_t*)&q[rofs+2]; rofs += 5);

  *(uint16_t*)&p[4] = rofs; //relocate the offset part of the far function ptr
}
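
The opcode checks in that routine can be exercised in isolation. The following is a bounds-checked sketch of just the pattern match, written against a plain byte buffer; the macro and function names are made up for illustration, and it reads the offset byte by byte instead of through a pointer cast.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OPCODE_GROUP_MASK 0xF8u /* upper bits select the opcode group */
#define OPCODE_REG_MASK   0x07u /* low 3 bits select the 16-bit register */
#define OP_MOV_R16_IMM16  0xB8u
#define OP_PUSH_R16       0x50u

/*
 * buf[pos] is the first byte of the segment immediate (the fixup site).
 * Returns 1 and stores the function offset in *ofs_out if the bytes
 * around pos encode: mov reg,seg / push reg / mov reg,ofs / push reg.
 */
static int match_far_ptr_push(const uint8_t *buf, size_t len, size_t pos,
                              uint16_t *ofs_out) {
    assert(buf != NULL && ofs_out != NULL);
    if (pos < 1 || len < 7 || pos > len - 7) return 0; /* need pos-1..pos+6 */
    const uint8_t *p = buf + pos;
    if ((p[-1] & OPCODE_GROUP_MASK) != OP_MOV_R16_IMM16) return 0;
    if ((p[2] & OPCODE_GROUP_MASK) != OP_PUSH_R16) return 0;
    if ((p[-1] & OPCODE_REG_MASK) != (p[2] & OPCODE_REG_MASK)) return 0;
    if (p[-1] != p[3]) return 0; /* second mov uses the same register */
    if (p[2] != p[6]) return 0;  /* second push uses the same register */
    *ofs_out = (uint16_t)(p[4] | (p[5] << 8));
    return 1;
}
```

For the bytes B8 34 12 50 B8 78 56 50 with pos = 1, the match succeeds and the extracted offset is 0x5678.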

NancyAurum commented 4 months ago

Just realized that instead of relocating, one can update the MZ relocation table and header to include the FBOV segments, in addition to untrapping the trap segments. That way the resulting exe is loaded properly by both Ghidra and IDA. IDA still detects it as a Borland overlaid exe and offers to load an external .ovr file (only Turbo Pascal 5.0 supported these), even though mzap erases the FBOV id after the merge. It would still be nice if Ghidra handled this properly, because the FBOV SEGTABLE has proper segment starts and ends for both normal and overlaid segments.

GeReV commented 2 months ago

If this helps, I wrote a simple script that appends overlays from a selected executable to the program and wires them up: https://github.com/GeReV/ghidra_scripts/blob/main/LoadBorlandOverlays.java.

This is based almost entirely on @NancyAurum's work (thank you!).

It's currently just a simple script that asks for an executable explicitly, because I'm working from a raw memory dump. I haven't sat down to learn how to integrate it as a loader or with analysis yet.

Side note: I got this working in my case, but it required adding 0x2 to some segment so that the correct traps are wired up. I'm not too sure why that was necessary (probably a bad calculation on my part). If anyone knows what I missed, please let me know. :)

Update: I fixed an issue with my script where added overlays could span two segments (i.e. some function could span from 6000:ffd0 to 7000:0020), breaking relative calls and jumps in Ghidra.
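
The condition described in that update can be checked with simple offset arithmetic. This is only a sketch of the detection step, with a hypothetical helper name; it does not reproduce how the script itself places overlays.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return 1 if a block of len bytes placed at offset off within a 64 KiB
 * segment would run past the end of that segment, 0 otherwise.
 */
static int spans_segment_boundary(uint16_t off, uint32_t len) {
    return len > 0 && (uint32_t)off + len > 0x10000u;
}
```

A 0x50-byte function placed at offset 0xffd0 trips the check, matching the 6000:ffd0 to 7000:0020 case above, while one of 0x30 bytes ends exactly at the segment limit and does not.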

ryanmkurtz commented 2 months ago

Anyone have any official header files for these structures?

NancyAurum commented 2 months ago

SE.ASM was published as part of Turbo Pascal 7.01. It has the official field names for the overlay stub table header: https://pastecode.io/s/7tr67h6q

; Overlay header record

ovSignature   EQU (WORD PTR 0)
ovSaveReturn  EQU (WORD PTR 2)
ovFilePos     EQU (WORD PTR 4)
ovCodeSize    EQU (WORD PTR 8)
ovFixupSize   EQU (WORD PTR 10)
ovJumpCount   EQU (WORD PTR 12)
ovLink        EQU (WORD PTR 14)
ovSegment     EQU (WORD PTR 16)
ovRetryCount  EQU (WORD PTR 18)
ovNext        EQU (WORD PTR 20)
ovEmsPage     EQU (WORD PTR 22)
ovEmsOffset   EQU (WORD PTR 24)
ovUserData    EQU (BYTE PTR 26)
ovVectors     EQU (BYTE PTR 32)
ovRecSize     EQU 32
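
In the absence of an official C header, those offsets could be mirrored in a C struct along these lines. The struct name is made up, and treating ovFilePos as two words is an inference from the 4-byte gap before ovCodeSize, not something the listing states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Overlay header record, mirroring the EQU offsets from SE.ASM. */
typedef struct {
    uint16_t ovSignature;   /* offset 0 */
    uint16_t ovSaveReturn;  /* offset 2 */
    uint16_t ovFilePos[2];  /* offset 4, assumed to span 4 bytes */
    uint16_t ovCodeSize;    /* offset 8 */
    uint16_t ovFixupSize;   /* offset 10 */
    uint16_t ovJumpCount;   /* offset 12 */
    uint16_t ovLink;        /* offset 14 */
    uint16_t ovSegment;     /* offset 16 */
    uint16_t ovRetryCount;  /* offset 18 */
    uint16_t ovNext;        /* offset 20 */
    uint16_t ovEmsPage;     /* offset 22 */
    uint16_t ovEmsOffset;   /* offset 24 */
    uint8_t  ovUserData[6]; /* offset 26 */
} ov_header;                /* ovVectors entries follow at offset 32 */

/* Compile-time checks against the listed offsets and ovRecSize. */
static_assert(sizeof(ov_header) == 32, "ovRecSize must be 32");
static_assert(offsetof(ov_header, ovCodeSize) == 8, "ovCodeSize at 8");
static_assert(offsetof(ov_header, ovSegment) == 16, "ovSegment at 16");
static_assert(offsetof(ov_header, ovUserData) == 26, "ovUserData at 26");
```

Because every member is at most 2-byte aligned, the layout needs no packing pragma on common compilers, and the static assertions fail the build if that ever stops being true.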

There are also different versions of this overlay manager (the BC++ 3 one being the most advanced), and a Borland executable can have several sections besides the overlay one, so one shouldn't assume it is the only section present. I have decompiled the loader code: https://github.com/NancyAurum/devroomm/blob/main/src/ovrman.c In fact, the overlays (and other sections) can be stored separately from the exe, as an .OVR file.