NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0

MIPS16e: Incorrect PC-relative addressing in JALR/JALX/JR delay slots #862

Open NeedsMoreFlux opened 5 years ago

NeedsMoreFlux commented 5 years ago

Describe the bug

When disassembling or decompiling a PC-relative LW instruction that sits in the delay slot of a JALR, JALX, or JR instruction in MIPS16e mode, the load address that Ghidra computes for the LW is incorrect. Specifically, Ghidra uses the address of the LW itself as the value of PC, instead of the address of the preceding jump instruction, when forming the load address.

To Reproduce

  1. Import the following hex file with the MIPS:BE:32:default language:
    :020000040000FA
    :100000001C000004B402E8A00000002AFFFFFFD695
    :0C0010000080102503E000080000000044
    :00000001FF
  2. Disassemble the bytes at address 0x0 as MIPS16e instructions by pressing F12. For reference, my listing displays the following:
                             //
                             // ram 
                             // Generated by Intel Hex
                             // ram: 00000000-0000001b
                             //
                             **************************************************************
                             *                          FUNCTION                          *
                             **************************************************************
                             undefined FUN_00000000()
                               assume ISA_MODE = 0x1
                               assume PAIR_INSTRUCTION_FLAG = 0x0
             undefined         v0:1           <RETURN>
                             FUN_00000000
        00000000 1c 00 00 04     jalx       FUN_00000010                                     undefined FUN_00000010()
        00000004 b4 02           _lw        a0,0x8(pc)=>DAT_0000000c                         = FFFFFFD6h
        00000006 e8 a0           jrc        ra
        00000008 00              ??         00h
        00000009 00              ??         00h
        0000000a 00              ??         00h
        0000000b 2a              ??         2Ah    *
                             DAT_0000000c                                    XREF[1]:     FUN_00000000:00000004(R)  
        0000000c ff ff ff d6     undefined4 FFFFFFD6h
                             **************************************************************
                             *                          FUNCTION                          *
                             **************************************************************
                             undefined FUN_00000010()
                               assume ISA_MODE = 0x0
                               assume PAIR_INSTRUCTION_FLAG = 0x0
             undefined         v0:1           <RETURN>
                             FUN_00000010                                    XREF[1]:     FUN_00000000:00000000(c)  
        00000010 00 80 10 25     or         v0,a0,zero
        00000014 03 e0 00 08     jr         ra
        00000018 00 00 00 00     _nop
  3. Notice that the LW instruction at address 0x4 generates a reference to the data at address 0xc (DAT_0000000c) instead of address 0x8. The decompiler window also shows the same erroneous reference to DAT_0000000c.
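As a sanity check on step 1, the test image can be decoded outside Ghidra to confirm which word lives at each candidate address. The following is a minimal Python sketch and not part of the original report: `parse_intel_hex` and `word_at` are hypothetical helper names, and only record types 00 (data), 01 (end of file), and 04 (extended linear address) are handled, since those are the only ones the test file uses.

```python
def parse_intel_hex(text):
    """Return {address: byte value} for the data records in an Intel HEX string."""
    memory = {}
    upper = 0  # upper 16 address bits, set by type-04 records
    for line in text.split():
        if not line.startswith(":"):
            raise ValueError(f"not an Intel HEX record: {line!r}")
        raw = bytes.fromhex(line[1:])
        # All bytes of a record, checksum included, must sum to 0 mod 256.
        if sum(raw) & 0xFF != 0:
            raise ValueError(f"bad checksum: {line!r}")
        count = raw[0]
        offset = int.from_bytes(raw[1:3], "big")
        rectype = raw[3]
        data = raw[4:4 + count]
        if rectype == 0x00:      # data record
            base = (upper << 16) + offset
            for i, value in enumerate(data):
                memory[base + i] = value
        elif rectype == 0x04:    # extended linear address record
            upper = int.from_bytes(data, "big")
        elif rectype == 0x01:    # end-of-file record
            break
    return memory


HEX_IMAGE = """
:020000040000FA
:100000001C000004B402E8A00000002AFFFFFFD695
:0C0010000080102503E000080000000044
:00000001FF
"""

memory = parse_intel_hex(HEX_IMAGE)


def word_at(address):
    """Read a big-endian 32-bit word, matching the MIPS:BE:32:default import."""
    return int.from_bytes(bytes(memory[address + i] for i in range(4)), "big")


print(hex(word_at(0x8)))  # word the LW is expected to load
print(hex(word_at(0xC)))  # word Ghidra references instead
```

The two words differ (0x2A at 0x8 versus 0xFFFFFFD6 at 0xc), so the wrong reference changes the value the decompiler attributes to the argument of FUN_00000010.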

Expected behavior

According to the Operation section of Load Word (PC-Relative) on page 104 of the MIPS16e Application Specific Extension to the MIPS32 Architecture, if the LW is in a jump delay slot, the base_pc value used to compute the load address should be the address of the preceding jump instruction. For the sample program, base_pc for the LW instruction should therefore be 0x0, so the operand 0x8(pc) should refer to address 0x8 rather than 0xc.
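The arithmetic behind the two addresses can be worked through in a few lines. This is a rough Python sketch, not Ghidra code: `lw_pc_rel_address` is a hypothetical helper, and it assumes the non-extended encoding (5-bit opcode 0b10110, 3-bit rx, 8-bit immediate scaled by 4) and that the low two bits of base_pc are cleared before the offset is added, which is my reading of the specification. In this example both candidate bases are word-aligned, so the masking does not affect the result.

```python
def lw_pc_rel_address(insn, insn_addr, jump_addr=None):
    """Effective address of a non-extended MIPS16e `lw rx, offset(pc)`.

    insn      -- the 16-bit instruction word
    insn_addr -- address of the LW itself
    jump_addr -- address of the jump whose delay slot holds the LW, or None
                 when the LW is not in a delay slot
    """
    if insn >> 11 != 0b10110:  # assumed major opcode for LW (PC-relative)
        raise ValueError("not an LW (PC-relative) encoding")
    offset = (insn & 0xFF) << 2  # 8-bit immediate, zero-extended, times 4
    base_pc = insn_addr if jump_addr is None else jump_addr
    return (base_pc & ~0x3) + offset


LW = 0xB402  # bytes b4 02 at address 0x4 in the sample program

# Expected: the LW is in the delay slot of the JALX at 0x0.
expected = lw_pc_rel_address(LW, insn_addr=0x4, jump_addr=0x0)

# Observed: the LW's own address is used as base_pc.
observed = lw_pc_rel_address(LW, insn_addr=0x4)

print(hex(expected), hex(observed))  # 0x8 0xc
```

The difference between the two results is exactly the 4-byte size of the JALX, which is consistent with the delay-slot adjustment not being applied.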

Environment (please complete the following information):

Additional context

Part of the problem seems to be in the mips16.sinc SLEIGH file. Although the constructor for JAL sets ext_delay to 0b10, the constructors for JALR, JALX, and JR do not set it at all (although they still call globalset(inst_next, ext_delay)). By adding the appropriate assignments to ext_delay for these constructors (0b10 for JALX and 0b01 for JALR and JR), I was able to partially fix this bug, in that the decompilation would sometimes reference the correct address.

However, I noticed that when clearing and redefining instructions containing the JALX/LW sequence, the listing display would sometimes update the reference for the LW instruction back to the incorrect address, and this new reference would then propagate to the decompiler. For example, after making the ext_delay additions to mips16.sinc, if I load the test program from above and clear the JALX/LW instructions so that the listing looks like

                             undefined FUN_00000000()
                               assume ISA_MODE = 0x1
                               assume PAIR_INSTRUCTION_FLAG = 0x0
             undefined         v0:1           <RETURN>
                             FUN_00000000
        00000000 1c              ??         1Ch
        00000001 00              ??         00h
        00000002 00              ??         00h
        00000003 04              ??         04h
        00000004 b4              ??         B4h
        00000005 02              ??         02h
        00000006 e8 a0           jrc        ra
        00000008 00              ??         00h
        00000009 00              ??         00h
        0000000a 00              ??         00h
        0000000b 2a              ??         2Ah    *
        0000000c ff ff ff d6     undefined4 FFFFFFD6h

then the decompilation window shows

void FUN_00000000(void)

{
  FUN_00000010(uRam00000008);
  return;
}

with the correct address of 0x8 being passed to FUN_00000010. But if I then disassemble the JALX/LW bytes again, the listing displays

                             undefined FUN_00000000()
                               assume ISA_MODE = 0x1
                               assume PAIR_INSTRUCTION_FLAG = 0x0
             undefined         v0:1           <RETURN>
                             FUN_00000000
        00000000 1c 00 00 04     jalx       FUN_00000010                                     undefined FUN_00000010()
        00000004 b4 02           _lw        a0,0x8(pc)=>DAT_0000000c                         = FFFFFFD6h
        00000006 e8 a0           jrc        ra
        00000008 00              ??         00h
        00000009 00              ??         00h
        0000000a 00              ??         00h
        0000000b 2a              ??         2Ah    *
                             DAT_0000000c                                    XREF[1]:     FUN_00000000:00000004(R)  
        0000000c ff ff ff d6     undefined4 FFFFFFD6h

with the incorrect reference to address 0xc, and the decompilation window shows

void FUN_00000000(void)

{
  FUN_00000010(DAT_0000000c);
  return;
}

I haven't figured out why this happens, nor have I found any way to stop the incorrect reference from being generated other than clearing the JALX/LW instruction bytes.

Since this bug seems to involve some instructions not setting ext_delay correctly, it's plausible that other instructions that rely on the value of ext_delay might also have buggy behavior when they are preceded by a JALR, JALX, or JR instruction.

emteere commented 5 years ago

Debugging the issue, it appears that the context register for the re-parse of the instruction in the delay slot is not consistent with the initial parse. On subsequent parses, the context for the pcode of the second instruction is the context of the instruction containing the delay slot.

The correct address is generated by adding the following along with your changes to set the ext_delay on the delay_slot instruction:

:jalx Abs26_m16 is ISA_MODE=1 & RELP=1 & ext_isjal=1 & ext_tgt_x=1 & Abs26_m16
       [ext_delay=0b10;  ISA_MODE = 0; globalset(Abs26_m16, ISA_MODE); globalset(inst_next, ext_delay); globalset(inst_start,ext_delay);] {

However, this is not the correct fix. We'll take a look at the context parsing for pcode in the delay slot. It is possible this was changed recently.

PerikiyoXD commented 7 months ago

Bump

Also affects MIPS:LE:32:default

Affected instructions: J, JAL
