NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0
50.42k stars 5.76k forks source link

PIC-16 Processor/Analyser not propagating RP flags #3239

Closed peterbelm closed 1 month ago

peterbelm commented 3 years ago

Issue When reverse engineering a PIC-16 processor binary most SFR/GPR references do not get adjusted based on the RP bits in STATUS, and do not get interpreted as address references - even if the address is correct (i.e. RP0/1 are both 0 anyway)

All code in this issue is based on the attached example PIC-16 binary.

To Reproduce Steps to reproduce the behavior:

  1. Import PIC-16 binary (see example attached)
  2. Run automatic analysis
  3. Look for register file 'F' instructions that do not reflect previous setting/clearing of RP0/1

Expected output

       CODE:000e 83 12           BCF        STATUS,RP0
       CODE:000f 03 13           BCF        STATUS,RP1
       CODE:0010 8d 1e           BTFSS      DAT_DATA_000d,#0x5

With labelling:

       CODE:000e 83 12           BCF        STATUS,RP0
       CODE:000f 03 13           BCF        STATUS,RP1
       CODE:0010 8d 1e           BTFSS      PIR2,#0x5

Actual output

       CODE:000e 83 12           BCF        STATUS,RP0
       CODE:000f 03 13           BCF        STATUS,RP1
       CODE:0010 8d 1e           BTFSS      0xd,#0x5

Attempted fix There are two components to this issue - the PIC-16 processor SLEIGH spec and the PIC-16 Constant Propagation analyser.

I've tried debugging the Constant Propagation analyser and found that in this code:

RegisterValue rValue = context.getRegisterValue(reg);
if (rValue == null) {
    return;
}

'rValue' seems to be null on every instruction of interest (both with my modifications described below and with the original PIC-16 SLEIGH spec).

While investigating the SLEIGH spec I attempted to fix the issue by switching the RP bits into a SLEIGH context variable (see attached pic16_instructions.diff). This produced a dramatic improvement - allowing Ghidra to create references for all file register based addresses. However the propagation of RP bits breaks down in some flow situations like the following example where RP is set to 2 at CODE:0d31-2 yet at the target of the GOTO (CODE:0d34 -> CODE:0d40) the RP bits are not taken into account.

Inspecting the instruction info at CODE:0d40 shows that the rp variable is not even included as input:

Instruction Summary
-------------------
Mnemonic          : BTFSS
Number of Operands: 2
Address           : CODE:0d40
Flow Type         : FALL_THROUGH
Fallthrough       : CODE:0d41
Delay slot depth  : 0
Hash              : 55ba551d

Input Objects:
   DATA:0057, const:0x0, const:0x1
Result Objects:
   SkipNext
Constructor Line #'s:
   BTFSS(pic16_instructions.sinc:551), bit(pic16_instructions.sinc:241), 
   srcREG(pic16_instructions.sinc:145)

Byte Length : 2
Instr Bytes : 11010111 00011100
Mask        : 00000000 00111100
Masked Bytes: 00000000 00011100

Instr Context:
   doPseudo(00,00)     == 0x0

I attempted to solve this by specifically setting the globalcontext on the target of the GOTO with [ globalset(absAddr11,rp); ] but this had no effect.

Expected output

             assume PCLATH = 0xd
                             LAB_CODE_0d31                                   XREF[1]:     CODE:0d2d(j)  
       CODE:0d31 83 12           BCF        STATUS,RP0
             assume PCLATH = <UNKNOWN>
       CODE:0d32 03 17           BSF        STATUS,RP1
       CODE:0d33 57 19           BTFSC      DAT_DATA_0157,#0x2                                = ??
       CODE:0d34 40 2d           GOTO       offset LAB_CODE_0d40
       CODE:0d35 d7 1d           BTFSS      DAT_DATA_0157,#0x3                                = ??
       CODE:0d36 4c 2d           GOTO       offset LAB_CODE_0d4c
       CODE:0d37 d7 11           BCF        DAT_DATA_0157,#0x3                                = ??
       CODE:0d38 83 12           BCF        STATUS,RP0
       CODE:0d39 03 13           BCF        STATUS,RP1
       CODE:0d3a 13 08           MOVF       SSPBUF,w                                         = ??
       CODE:0d3b 83 12           BCF        STATUS,RP0
       CODE:0d3c 03 17           BSF        STATUS,RP1
       CODE:0d3d db 00           MOVWF      i2c_rx_buffer[3]                                 = null
       CODE:0d3e d6 07           ADDWF      i2c_rx_checksum,f                                = ??
       CODE:0d3f 4c 2d           GOTO       offset LAB_CODE_0d4c
                             -------------------------------------------------------------
             assume PCLATH = 0xd
                             LAB_CODE_0d40                                   XREF[1]:     CODE:0d34(j)  
       CODE:0d40 d7 1c           BTFSS      DAT_DATA_0157,#0x1                               = ??

Actual output

             assume PCLATH = 0xd
                             LAB_CODE_0d31                                   XREF[1]:     CODE:0d2d(j)  
       CODE:0d31 83 12           BCF        STATUS,RP0
             assume PCLATH = <UNKNOWN>
       CODE:0d32 03 17           BSF        STATUS,RP1
       CODE:0d33 57 19           BTFSC      DAT_DATA_0157,#0x2                                = ??
       CODE:0d34 40 2d           GOTO       offset LAB_CODE_0d40
       CODE:0d35 d7 1d           BTFSS      DAT_DATA_0157,#0x3                                = ??
       CODE:0d36 4c 2d           GOTO       offset LAB_CODE_0d4c
       CODE:0d37 d7 11           BCF        DAT_DATA_0157,#0x3                                = ??
       CODE:0d38 83 12           BCF        STATUS,RP0
       CODE:0d39 03 13           BCF        STATUS,RP1
       CODE:0d3a 13 08           MOVF       SSPBUF,w                                         = ??
       CODE:0d3b 83 12           BCF        STATUS,RP0
       CODE:0d3c 03 17           BSF        STATUS,RP1
       CODE:0d3d db 00           MOVWF      i2c_rx_buffer[3]                                 = null
       CODE:0d3e d6 07           ADDWF      i2c_rx_checksum,f                                = ??
       CODE:0d3f 4c 2d           GOTO       offset LAB_CODE_0d4c
                             -------------------------------------------------------------
             assume PCLATH = 0xd
                             LAB_CODE_0d40                                   XREF[1]:     CODE:0d34(j)  
       CODE:0d40 d7 1c           BTFSS      DAT_DATA_0057,#0x1                               = ??

Attachments pic16_instructions.diff.txt PSU PIC16F886.txt

Environment:

emteere commented 3 years ago

In general the context bits shouldn't be used to track state from one instruction to the next as the context and state of the register/bit gets locked into the following instruction. Normally the context is used for things like thumb-mode that directly affect the disassembly. You can't always depend on the disassembly flowing status state correctly. Registers can also be set to an initial state at the start of a function.

The main issue is setting the bits in the STATUS register when it's state is unknown isn't currently handled very well. We could potentially add a more careful tracking for certain registers, as is done in the decompiler, for certain registers like STATUS. This is probably a better solution in the long-run.

A simpler solution is to break out the RP0/1 bits into a separate fake register that can be tracked without knowing the state of the other status bits. This is essentially what you were doing anyway, only in registers not disassembly context. The C bit will also have this issue, but may not matter. Each language does it differently. Some pack the status bits in, others don't. If all the instructions that load/store the status register and bits can be known, then this is easy to do. Whenever something grabs the status register the fake register bits are packed in. Some for setting the status register. There is actually a macro to do this already in the PIC16, but it is unused. The X86 is done with separate flag registers, and the PIC16 looks like it may have done it this way at some point in the past.

peterbelm commented 3 years ago

A simpler solution is to break out the RP0/1 bits into a separate fake register that can be tracked without knowing the state of the other status bits.

This makes things a bit clearer, as it did seem strange that the RP bits were split out into a register.

My attempts at getting it working with the processor context were mostly experimental - but it made such a significant difference I thought I should include it.

The issue remains why the original RP register does not appear to be working, I'm going to do some more investigation stepping through the PIC16Analyzer.

emteere commented 3 years ago

I discovered there is a bug in the constant propagation looking up registers. I'm working on a fix to it. I hadn't realized the RP register was set separately as well as the STATUS register when I debugged the issue before. I had jumped to the conclusion incorrectly that the load of the RP for the memory load came from the RP0/1 bits in the STATUS memory. It may be that there is a workaround in the meantime. If I think of one, I'll let you know.

peterbelm commented 3 years ago

Thank you.

I got a bit stuck on a decompiler issue on FUN_CODE_092d in my example file, as it seems to be going off into invalid memory ranges (CODE:7xxx etc).

I found there was potential in some of the instructions to get an incorrect PC address as there weren't explicit bit masks in place. The attached patch adds these bit masks and at least solves the invalid memory range errors.

I'm still investigating the remaining decompiler errors: Unable to resolve constructor at CODE:1ad2. I'll leave an update as to what I find. pic16_pc_explicit_bitmasks.patch.txt

emteere commented 3 years ago

Turns out, it isn't exactly a bug. There is an assumption that any Memory Mapped register will be accessed directly by the instruction, and won't need a reference to it. If it were, every access to STATUS, PCLATH, PCL, INTCON, etc... would get a reference to that memory location.

The memory mapped registers in this case are defined by the lines:

define DATA offset=0x0000 size=1 [
    INDF TMR0 PCL STATUS FSR PORTA PORTB PORTC PORTD PORTE PCLATH INTCON PIR1 PIR2 TMR1L TMR1H
];

To fix the issue, I removed certain memory mapped registers, and moved them to the .pspec file. This should fix most if not all of your accesses. There were special scrReg lines for special registers that ignore the RP bank register.

I also initialized the RP register in the .pspec file, which will assume a value of zero at the start of a function.

diff --git a/Ghidra/Processors/PIC/data/languages/pic16.pspec b/Ghidra/Processors/PIC/data/languages/pic16.pspec
index 4133f95..ea6677c 100644
--- a/Ghidra/Processors/PIC/data/languages/pic16.pspec
+++ b/Ghidra/Processors/PIC/data/languages/pic16.pspec
@@ -10,6 +10,9 @@
    <tracked_set space="CODE">
        <set name="SkipNext" val="0"/>
    </tracked_set>
+   <tracked_set space="CODE">
+       <set name="RP" val="0"/>
+   </tracked_set>
    <tracked_set space="CODE" first="0x0" last="0x1ff">
        <set name="PCLATH" val="0"/>
    </tracked_set>
@@ -40,6 +43,11 @@
   <default_symbols>
     <symbol name="Reset" address="CODE:0000" entry="true"/>
     <symbol name="Interrupt" address="CODE:0004" entry="true"/>
+    <symbol name="PORTA" address="DATA:05" entry="false"/>
+    <symbol name="PORTB" address="DATA:06" entry="false"/>
+    <symbol name="PORTC" address="DATA:07" entry="false"/>
+    <symbol name="PORTD" address="DATA:08" entry="false"/>
+    <symbol name="PORTE" address="DATA:09" entry="false"/>
     <symbol name="PIR1" address="DATA:0C" entry="false"/>
     <symbol name="PIR2" address="DATA:0D" entry="false"/>
     <symbol name="TMR1L" address="DATA:0E" entry="false"/>
diff --git a/Ghidra/Processors/PIC/data/languages/pic16.sinc b/Ghidra/Processors/PIC/data/languages/pic16.sinc
index d7144da..e24a6a3 100644
--- a/Ghidra/Processors/PIC/data/languages/pic16.sinc
+++ b/Ghidra/Processors/PIC/data/languages/pic16.sinc
@@ -80,7 +80,8 @@
 #
 @if PROCESSOR == "PIC_16"
 define DATA offset=0x0000 size=1 [
-   INDF    TMR0    PCL     STATUS  FSR     PORTA   PORTB   PORTC   PORTD   PORTE   PCLATH  INTCON  PIR1    PIR2    TMR1L   TMR1H
+#  INDF    TMR0    PCL     STATUS  FSR     PORTA   PORTB   PORTC   PORTD   PORTE   PCLATH  INTCON  PIR1    PIR2    TMR1L   TMR1H
+   INDF        _       PCL     STATUS  FSR     _               _             _              _              _              PCLATH  INTCON _     _       _       _
 ];

 @elif PROCESSOR == "PIC_16F"
diff --git a/Ghidra/Processors/PIC/data/languages/pic16_instructions.sinc b/Ghidra/Processors/PIC/data/languages/pic16_instructions.sinc
index 5dda8d7..72db4a5 100644
--- a/Ghidra/Processors/PIC/data/languages/pic16_instructions.sinc
+++ b/Ghidra/Processors/PIC/data/languages/pic16_instructions.sinc
@@ -44,7 +44,8 @@

 @if PROCESSOR == "PIC_16"
 attach variables [ fregCore ] [
-   INDF TMR0 PCL STATUS FSR PORTA PORTB PORTC PORTD PORTE PCLATH INTCON _ _ _ _
+#  INDF TMR0 PCL STATUS FSR PORTA PORTB PORTC PORTD PORTE PCLATH INTCON _ _ _ _
+   _  _ PCL STATUS FSR _ _ _ _ _ PCLATH INTCON _ _ _ _
 ];

 @elif PROCESSOR == "PIC_16F"
@@ -178,12 +179,12 @@

 # File register index (f7=0): INDF use implies indirect data access using FSR value and IRP bit in STATUS reg
 @if PROCESSOR == "PIC_16"
-srcREG: fregCore       is f7=0 & fregCore                                  { 
+srcREG: INDF       is f7=0 & INDF                                  { 
    addr:2 = (zext(IRP) << 8) + zext(FSR);
    export *[DATA]:1 addr; 
 }
-srcREG: fregCore       is f7=1 & fregCore                              { 
-     rpval:2 = zext(RP == 1) + zext(RP == 2);
+srcREG: lf7        is f7=1 & lf7                               { 
+     rpval:2 = zext(RP == 1) + zext(RP == 3);
     addr:2 = (zext(rpval) << 7) + 1;
     export *[DATA]:1 addr;
 }
emteere commented 3 years ago

I recently saw the mask issue with another processor. I hadn't investigated why the decompiler is sign extending the value. I suspect it may be a bug, and this is a workaround for it.

EuanKirkhope commented 3 years ago

Does this affect the PIC18 also?