NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0
50.53k stars 5.78k forks source link

"Invalid segment start file location: ..." and "Failed to create ... memory block: Part of range ... already exists in memory." errors during MZ file import #5970

Closed haarlaj closed 4 months ago

haarlaj commented 10 months ago

Describe the bug I'm trying to import old DOS EXEs from late 80's to Ghidra. File imported recognizes the files as MZ x86 16bit Real mode files. When I try to import the files, Ghidra imports the file but shows lots of "Invalid segment start file location: " with negative addresses followed by "Failed to create ... memory block: Part of range ... already exists in memory." Ghidra seems to parse the MZ header and relocation table correctly but when the relocation table is applied something goes haywire.

To Reproduce Steps to reproduce the behavior:

  1. Download https://w140.com/tekwiki/images/8/88/063-0228-01.zip
  2. Import disk1/CATS.EXE or disk2/UTIL.EXE to ghidra using "Old-style DOS Executable (MZ)" and "x86:LE:16:Real Mode:default"
  3. Look to "Import Results Summary" -> "Additional Information"
  4. See various "Invalid segment start file location: " with negative values followed by Failed to create "CODE_xxx" memory block: Part of range (.....) already exists in memory."

Expected behavior Successful file import without errors

Screenshots cats_import

Environment (please complete the following information):

Additional context The software is an old measure equipment calibration software from 1989 for 80286/80386 and MS-DOS 3.1 or later.

haarlaj commented 5 months ago

Some additional observations about the problem.

I noticed that MemoryMap blocks start from CODE_42 so CODE_0 up to CODE_41 are missing completely.

MemoryMap

I compared this to the import error log and found that there are exactly 42 "Invalid segment start file location: " errors.

importlog

Just based on the code. Could there be some bug in how the segment offset is actually calculated? Missing segments are the first ones so is there some negative offset somewhere which offsets the first addresses to negative.

segmentFileOffset

https://github.com/NationalSecurityAgency/ghidra/blob/5e9cfe2419329094b6a83e9532f7a81f3e39126d/Ghidra/Features/Base/src/main/java/ghidra/app/util/opinion/MzLoader.java#L201C1-L206C5

addressToFileOffset

https://github.com/NationalSecurityAgency/ghidra/blob/5e9cfe2419329094b6a83e9532f7a81f3e39126d/Ghidra/Features/Base/src/main/java/ghidra/app/util/opinion/MzLoader.java#L475C1-L478C1

And the second observation: The error for the "Failed to create.. ..memory block_ " lines all share the same start address.

haarlaj commented 5 months ago

I finally got the ghidra working in debug mode and got some actual numbers. As suspected the first 42 blocks result in negative addresses.

For first codeblock with negative address: segmentAddr 001d:000: offset 464 (0x1d0) segment 29 (0x1d)

(segmentAddr.getSegment() - INITIAL_SEGMENT_VAL) & 0xffff: (29 - 4096) & 0xffff = 61469

The addressToFileOffset function gets inputs as: segment = 61469, offset = 0. e_cparhdr from dos header is 1664 (0x0680) and paragrahsToBytes(header.e_cparhdr()) returns 26624

In the end addressToFileOffset is evaluated: (short) 61469 16 + 0 + 26624 = -38448 -4067 16 + 0 + 26624 = -38448

edit: I'm looking at the (short) cast in the addressToFileOffset. Should that be applied to in this case ((short) 61469) = -4067 or (short)(61469 * 16) = 464 (0x1d0)? The latter would result to the same as the offset in the segmentAddr.

(short) 61469 16 + 0 + 26624 = -38448 (short)(61469 16) + 0 + 26624 = 27088

edit2: I tried with my own build and seems to break things. The import crashes with "Address Overflow in add: ff3c:0000 0x3dd07". However, if I go through the imported handles more CODE blocks than before and doesn't complain about negative addresses. I'm thinking could this be something to do with casting uint16 addresses to int16 and the overflow / sign bit doing something unintended?

haarlaj commented 4 months ago

I'm pretty sure that the casting to short in addressToFileOffset is a bug. The function is used in three places where the segment is already kind of casted to uint16 using & 0xffff. The (short) will break the address if the segment is larger than 0x8000 as then it is processed as a negative input value. In this case the (short) segment should be replaced with (segment & 0xffff) but just the segment without casting would work fine too.

In my CATS.EXE file this introduces new errors in the processMemoryBlocks. The first SegmentedAddress 001d:0000 will wrap and return a addressToFileOffset 983504 (0xf01d0) which is larger than the file size. This will result for numBytes to be negative and MemoryBlockUtils.createUninitializedBlock throw an error "ghidra.program.model.address.AddressOutOfBoundsException: Address Overflow in subtract: 001d:0000 + 733448".

haarlaj commented 4 months ago

https://github.com/NationalSecurityAgency/ghidra/commit/d7c79988ea8d1c866c6d0572fb43f3e4482e141d#r141535077 related commit

ryanmkurtz commented 4 months ago

I am taking a look at this now...I gotta get my head back into it though...

ryanmkurtz commented 4 months ago

I think i agree with you now about the wrong cast to short in addressToFileOffset(). I honestly can't remember why I did that...it solved some problem at the time but I can't remember what. Fixing that has lead to some other issues that I need to continue debugging...probably tomorrow.

ryanmkurtz commented 4 months ago

I believe i fixed the issues described in this issue. If you have more issues with this binary, please open a new ticket and we can go from there.

haarlaj commented 4 months ago

Thanks. I'll do that if something comes up.