NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0

8051: Decompiler memory region confusion #6074

Open depili opened 9 months ago

depili commented 9 months ago

Describe the bug: The decompiler does not seem to differentiate between the various memory regions properly. The symptoms are:

  1. Clicking a reference in the decompiler view always jumps to progmem, even when the reference points to intram or extram.
  2. Typedefs set to point to intram or extram do not produce correct labels in the decompiler and seem to have no effect at all.
  3. The decompiler concludes that some register-indirect additions can never produce a carry, probably because it looks at the values at progmem addresses instead of treating the RAM values as unknown.

Additional context: Screenshot 2024-01-01 at 17 11 45 shows a 32-bit addition chain with carry. For some reason Ghidra thinks the final carry can never be set. The SLEIGH definitions seem to be OK: the instruction info popup shows only ADDC altering PSW, and MOV leaving it untouched.

Screenshot 2024-01-01 at 17 13 08 shows a function call with extmem pointers. The references are set to extmem in the listing view, and the function parameters use a typedef byte *:8 with the address space set to EXTMEM.

Screenshot 2024-01-01 at 17 12 20 shows another case of pointer mislabeling in the decompiler view.

From what I can tell the SLEIGH definitions are correct, so this might be a decompiler bug, maybe relating to the different address spaces and 8-bit pointers?

depili commented 9 months ago

Screenshot 2024-01-01 at 18 26 26

I should add that when an immediate value is added, the decompiler seems to process the carry correctly.

depili commented 9 months ago

The ADDC instructions are defined at https://github.com/NationalSecurityAgency/ghidra/blob/master/Ghidra/Processors/8051/data/languages/8051_main.sinc#L660-L663 and the SLEIGH code handles the flags identically for immediate and register-indirect addressing.

depili commented 9 months ago

Digging deeper, this appears to be caused by the p-code operations dropping the memory region information. Patching the processor definitions with:

macro carry8(op1, op2) {
  tmp1:1 = op1 + 1;
  tmp2:1 = op2 - 1;
  CY_flag = carry(tmp1, tmp2);
}

instead of plain calls to carry() results in a messy but ultimately correct decompilation: Screenshot 2024-01-02 at 10 25 44
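As a side note on the carry8 macro above, here is a minimal Python sketch comparing it with a plain carry(). It assumes carry() is the 8-bit unsigned carry-out test and that the +1 and -1 on the one-byte temporaries wrap modulo 256; the Python function names only mirror the SLEIGH names for illustration. Under those assumptions the two differ only when op1 is 0xFF or op2 is 0, so the macro looks like a diagnostic workaround rather than a drop-in replacement.

```python
def carry(a: int, b: int) -> bool:
    """8-bit unsigned carry-out (assumed meaning of SLEIGH's carry())."""
    return (a & 0xFF) + (b & 0xFF) > 0xFF


def carry8(op1: int, op2: int) -> bool:
    """Model of the patched macro: move one unit from op2 to op1, wrapping at 8 bits."""
    tmp1 = (op1 + 1) & 0xFF
    tmp2 = (op2 - 1) & 0xFF
    return carry(tmp1, tmp2)


# Exhaustively compare both definitions over all 8-bit operand pairs.
mismatches = [(a, b) for a in range(256) for b in range(256)
              if carry(a, b) != carry8(a, b)]
print(len(mismatches), "of", 256 * 256, "operand pairs differ")
```

Every mismatch involves a wrap in one of the temporaries, which is why the patched definitions should not be relied on for exact flag semantics.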

depili commented 8 months ago

The following minimal program illustrates the bug. It seems to require a register-indirect addition to trigger:

Intel HEX; disassemble from 0x0 and look at the function at 0x0c. The decompiler thinks that the code at 0x18 is never reached:

:100000007590807591FF759280110C227890E6248E
:0C00100080F608E43640012275940B22B3
:00000001FF
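
For anyone who wants to check the dump before importing it, here is a minimal Python sketch that decodes the three records and verifies their checksums. The function name parse_ihex is hypothetical, and only data records (type 00) and the end-of-file record are handled.

```python
def parse_ihex(text: str) -> dict[int, int]:
    """Return {address: byte} for the data records of an Intel HEX dump."""
    memory = {}
    for line in text.split():
        if not line.startswith(":"):
            raise ValueError(f"not an Intel HEX record: {line!r}")
        raw = bytes.fromhex(line[1:])
        # Every record, checksum byte included, must sum to zero modulo 256.
        if sum(raw) % 256 != 0:
            raise ValueError(f"bad checksum in {line!r}")
        count = raw[0]
        address = int.from_bytes(raw[1:3], "big")
        rectype = raw[3]
        if rectype == 0x00:  # data record; other types carry no program bytes here
            for offset, value in enumerate(raw[4:4 + count]):
                memory[address + offset] = value
    return memory


DUMP = """
:100000007590807591FF759280110C227890E6248E
:0C00100080F608E43640012275940B22B3
:00000001FF
"""

image = parse_ihex(DUMP)
code = bytes(image[addr] for addr in sorted(image))
# Length, first two bytes of add1 (0x0c), and the three bytes at .carry (0x18).
print(len(code), code[0x0C:0x0E].hex(), code[0x18:0x1B].hex())
# prints: 28 7890 75940b
```

The bytes at 0x0c (78 90) encode MOV R0, #90h and the bytes at 0x18 (75 94 0B) encode MOV 94h, #11, matching the labels add1 and .carry in the source.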

The assembly language source:

    ORG     0000h

start:
    MOV 90h, #80h
    MOV 91h, #0FFh
    MOV 92h, #80h
    CALL    add1
    RET
add1:
    MOV R0, #90h
    MOV A, @R0
    ADD A, #80h
    MOV @R0, A
    INC R0
    CLR A
    ADDC    A, @R0
    JC  .carry
    RET
.carry:
    MOV 94h, #11
    RET
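
To confirm by hand that the branch at 0x18 is reachable, here is a minimal Python model of add1 alone. It is a sketch that assumes the standard 8051 carry behaviour of ADD and ADDC and treats the two bytes read through @R0 as arbitrary inputs, since their RAM contents are not known statically. The function signature is hypothetical.

```python
def add1(b0: int, b1: int) -> bool:
    """Model add1; b0 and b1 are the bytes read through @R0 at 90h and 91h.

    Returns True when JC would branch to .carry (address 0x18).
    """
    # MOV A, @R0 ; ADD A, #80h
    total = b0 + 0x80
    cy = total > 0xFF
    # MOV @R0, A stores total & 0xFF and INC R0 advances the pointer;
    # neither instruction changes the carry flag.
    # CLR A ; ADDC A, @R0  computes 0 + b1 + CY
    total = 0 + b1 + int(cy)
    cy = total > 0xFF
    return cy


reachable = [(b0, b1) for b0 in range(256) for b1 in range(256) if add1(b0, b1)]
print(len(reachable), add1(0x80, 0xFF))
# prints: 128 True
```

In this model the carry branch is taken exactly when the first byte is at least 80h and the second is FFh, which includes the values 80h and FFh used in the test program, so the code at 0x18 should not be treated as dead.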