Ghidra analysis misinterprets the IP-relative x86 calls

Describe the bug Near call analysis fails if the result of adding the operand to the offset of the instruction's end results into 0. Instead it comes up with some unrelated address as the call target. For example

6ffe:0579 e8 84 fa

0x579+3 - (0xFFFF-0xfa84+1) = 0 but Ghidra jumps to 7e7c:181f

To Reproduce Steps to reproduce the behavior:

Analyze the attached st.exe with the default settings
Press G and enter 6ffe:0579

Expected behavior Ghidra does the arithmetic correctly, at least in the 16bit x86 code, where NULL pointers have legit code at them.

Attachments st.zip

Environment (please complete the following information):

OS: Windows 11 23H2
Java Version: Java(TM) SE Runtime Environment (build 17.0.7+8-LTS-224)
Ghidra Version: 11.1.1
Ghidra Origin: https://github.com/NationalSecurityAgency/ghidra/releases/tag/Ghidra_11.1.1_build

Additional context It doesn't appear often, so the current work around is to just override the reference. A simple Python script could be made to do sanity checks on all Ghidra resolved near calls. Apparently the bug is related to the segmented models, since in flat models NULL pointer is treated specially, so Ghidra doesn't expect calls going there.

Ok. I had near hundred of such near calls over severals functions I wrote two scripts (with the help of ChatGPT, which somehow has expert knowledge of Ghidra).

One script checks presence, and the other fixes them. Please be sure to backup your project, before running any such scripts.

#### This script checks if Ghidra misgenerated near call references ####

def ubytes(bs):
  return map(lambda b: b & 0xff, bs)

def check_near_calls():
  instructions = currentProgram.getListing().getInstructions(True)
  while instructions.hasNext():
      instruction = instructions.next()
      if instruction.getMnemonicString() == "CALL" and instruction.getDefaultOperandRepresentation(0).startswith("0x"):
        ibs = ubytes(instruction.getBytes())
        if ibs[0] == 0xE8:
          call_address = instruction.getAddress()
          refs = getReferencesFrom(call_address)
          for ref in refs:
            if ref.getReferenceType().toString() == "UNCONDITIONAL_CALL":
              adr = ref.getToAddress()
              seg = adr.getSegment()
              ofs = adr.getSegmentOffset()
              fadr = ref.getFromAddress()
              fseg = fadr.getSegment()
              fofs = fadr.getSegmentOffset()
              cseg = call_address.getSegment()
              cofs = call_address.getSegmentOffset()
              disp = ibs[2]*0x100 + ibs[1]
              proper_ofs = (fofs+3 + disp)&0xFFFF
              if ofs != proper_ofs:
                print("Call at {:04X}:{:04X}".format(fseg,fofs))
                print("  target is {:04X}:{:04X} but should be {:04X}:{:04X}"
                  .format(seg,ofs,cseg,proper_ofs))

check_near_calls()

#### This script fixes misgenerated near call references ####
from ghidra.program.model.symbol import RefType, SourceType

# Required to add references
reference_manager = currentProgram.getReferenceManager()

def ubytes(bs):
  return map(lambda b: b & 0xff, bs)

def fix_near_calls():
  instructions = currentProgram.getListing().getInstructions(True)
  while instructions.hasNext():
      instruction = instructions.next()
      if instruction.getMnemonicString() == "CALL" and instruction.getDefaultOperandRepresentation(0).startswith("0x"):
        ibs = ubytes(instruction.getBytes())
        if ibs[0] == 0xE8:
          call_address = instruction.getAddress()
          cseg = call_address.getSegment()
          cofs = call_address.getSegmentOffset()
          disp = ibs[2]*0x100 + ibs[1]
          proper_ofs = (cofs+3 + disp)&0xFFFF
          refs = getReferencesFrom(call_address)
          needs_fix = 0
          for ref in refs:
            if ref.getReferenceType().toString() == "UNCONDITIONAL_CALL":
              adr = ref.getToAddress()
              seg = adr.getSegment()
              ofs = adr.getSegmentOffset()
              fadr = ref.getFromAddress()
              fseg = fadr.getSegment()
              fofs = fadr.getSegmentOffset()
              if seg != cseg or ofs != proper_ofs: needs_fix = 1
          if needs_fix:
            # Likely all references are invalid
            for ref in refs: removeReference(ref)
            # Create the correct reference
            proper_adr = toAddr((cseg << 4) + proper_ofs)
            reference_manager.addMemoryReference(call_address, proper_adr, RefType.UNCONDITIONAL_CALL, SourceType.USER_DEFINED, 0)

fix_near_calls()

NationalSecurityAgency / ghidra

Ghidra analysis misinterprets the IP-relative x86 calls #6684