NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0

Ghidra analysis misinterprets the IP-relative x86 calls #6684

Open NancyAurum opened 4 days ago

NancyAurum commented 4 days ago

Describe the bug Near call analysis fails when adding the call's displacement to the offset of the end of the instruction yields 0. Instead of offset 0 in the current segment, Ghidra comes up with an unrelated address as the call target. For example:

6ffe:0579 e8 84 fa

0x579 + 3 - (0xFFFF - 0xFA84 + 1) = 0, so the target should be offset 0 of the same segment, but Ghidra jumps to 7e7c:181f
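The arithmetic for this instruction can be checked outside Ghidra. The following is a minimal sketch in plain Python; the helper name `near_call_target` is made up for illustration and is not part of Ghidra's API. It assumes a 3-byte `E8 rel16` near call with no prefixes.

```python
def near_call_target(next_ip, disp16):
    """Offset reached by a 16-bit near CALL (opcode E8) within the current segment.

    next_ip: offset of the byte following the CALL instruction.
    disp16:  the raw 16-bit displacement as stored in the instruction.
    """
    # IP is a 16-bit register, so the sum wraps modulo 0x10000.
    return (next_ip + disp16) & 0xFFFF


# The instruction from the report: 6ffe:0579  e8 84 fa
call_offset = 0x0579
raw = [0xE8, 0x84, 0xFA]
disp16 = raw[1] | (raw[2] << 8)          # little-endian, gives 0xFA84
target = near_call_target(call_offset + len(raw), disp16)
print("{:04X}".format(target))           # prints 0000
```

Masking with `0xFFFF` gives the same result as treating `0xFA84` as the signed value `-0x57C`, which is the form used in the calculation above.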

To Reproduce Steps to reproduce the behavior:

  1. Analyze the attached st.exe with the default settings
  2. Press G and enter 6ffe:0579

Expected behavior Ghidra does the arithmetic correctly, at least in 16-bit x86 code, where offset 0 (the NULL pointer in flat models) can hold legitimate code.

Attachments st.zip

Environment (please complete the following information):

Additional context It doesn't appear often, so the current workaround is to just override the reference. A simple Python script could be made to sanity-check all near calls resolved by Ghidra. The bug appears to be related to segmented memory models: in flat models the NULL pointer is treated specially, so Ghidra doesn't expect calls to go there.

NancyAurum commented 3 days ago

OK. I had nearly a hundred such near calls across several functions, so I wrote two scripts (with the help of ChatGPT, which somehow has expert knowledge of Ghidra).

One script checks for the presence of such calls, and the other fixes them. Please be sure to back up your project before running any such scripts.

#### This script checks if Ghidra misgenerated near call references ####

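def ubytes(bs):
  # Java bytes are signed; mask each to 0..255.
  # A list (rather than map) stays indexable under Python 3 as well.
  return [b & 0xff for b in bs]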

def check_near_calls():
  instructions = currentProgram.getListing().getInstructions(True)
  while instructions.hasNext():
      instruction = instructions.next()
      if instruction.getMnemonicString() == "CALL" and instruction.getDefaultOperandRepresentation(0).startswith("0x"):
        ibs = ubytes(instruction.getBytes())
        if ibs[0] == 0xE8:
          call_address = instruction.getAddress()
          refs = getReferencesFrom(call_address)
          for ref in refs:
            if ref.getReferenceType().toString() == "UNCONDITIONAL_CALL":
              adr = ref.getToAddress()
              seg = adr.getSegment()
              ofs = adr.getSegmentOffset()
              fadr = ref.getFromAddress()
              fseg = fadr.getSegment()
              fofs = fadr.getSegmentOffset()
              cseg = call_address.getSegment()
              cofs = call_address.getSegmentOffset()
              disp = ibs[2]*0x100 + ibs[1]
              proper_ofs = (fofs+3 + disp)&0xFFFF
              # A near call stays in the caller's segment, so compare the segment too
              if seg != cseg or ofs != proper_ofs:
                print("Call at {:04X}:{:04X}".format(fseg,fofs))
                print("  target is {:04X}:{:04X} but should be {:04X}:{:04X}"
                  .format(seg,ofs,cseg,proper_ofs))

check_near_calls()
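The `ubytes` helper is needed because Ghidra hands instruction bytes to scripts as signed Java bytes, so `0xE8` arrives as `-24`. A small standalone sketch of that conversion follows; the sample values are the bytes of the call from the report, written by hand as signed numbers rather than read from Ghidra.

```python
def ubytes(signed_bytes):
    """Convert Java-style signed bytes (-128..127) to unsigned values (0..255)."""
    return [b & 0xFF for b in signed_bytes]


# How the bytes e8 84 fa look when read as signed Java bytes.
signed = [-24, -124, -6]
unsigned = ubytes(signed)
print([hex(b) for b in unsigned])   # prints ['0xe8', '0x84', '0xfa']
```

Returning a list keeps `ibs[0]` indexing valid under Python 3, where `map` returns an iterator instead of a list.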
#### This script fixes misgenerated near call references ####
from ghidra.program.model.symbol import RefType, SourceType

# Required to add references
reference_manager = currentProgram.getReferenceManager()

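def ubytes(bs):
  # Java bytes are signed; mask each to 0..255.
  # A list (rather than map) stays indexable under Python 3 as well.
  return [b & 0xff for b in bs]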

def fix_near_calls():
  instructions = currentProgram.getListing().getInstructions(True)
  while instructions.hasNext():
      instruction = instructions.next()
      if instruction.getMnemonicString() == "CALL" and instruction.getDefaultOperandRepresentation(0).startswith("0x"):
        ibs = ubytes(instruction.getBytes())
        if ibs[0] == 0xE8:
          call_address = instruction.getAddress()
          cseg = call_address.getSegment()
          cofs = call_address.getSegmentOffset()
          disp = ibs[2]*0x100 + ibs[1]
          proper_ofs = (cofs+3 + disp)&0xFFFF
          refs = getReferencesFrom(call_address)
          needs_fix = False
          for ref in refs:
            if ref.getReferenceType().toString() == "UNCONDITIONAL_CALL":
              adr = ref.getToAddress()
              seg = adr.getSegment()
              ofs = adr.getSegmentOffset()
              # A near call must land in the caller's segment at the computed offset
              if seg != cseg or ofs != proper_ofs: needs_fix = True
          if needs_fix:
            # Likely all references are invalid
            for ref in refs: removeReference(ref)
            # Create the correct reference
            proper_adr = toAddr((cseg << 4) + proper_ofs)
            reference_manager.addMemoryReference(call_address, proper_adr, RefType.UNCONDITIONAL_CALL, SourceType.USER_DEFINED, 0)

fix_near_calls()
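The fix script builds the corrected target with `toAddr((cseg << 4) + proper_ofs)`, which is the real-mode conversion from segment:offset to a linear address. Below is a standalone sketch of that conversion; `linear_address` is a hypothetical helper, not a Ghidra function, and it does not model how Ghidra maps the linear value back to a segment.

```python
def linear_address(segment, offset):
    """Real-mode linear address for a segment:offset pair (no 1 MiB wraparound applied)."""
    return (segment << 4) + (offset & 0xFFFF)


# Corrected target of the call from the report: offset 0 of segment 6ffe.
print("{:05X}".format(linear_address(0x6FFE, 0x0000)))   # prints 6FFE0

# Different segment:offset pairs can alias the same linear address.
print(linear_address(0x6FFE, 0x0010) == linear_address(0x6FFF, 0x0000))   # prints True
```

Because of this aliasing, it may be worth re-running the checking script after the fix to confirm that the new references are displayed with the expected segment.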