NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0
50.64k stars 5.79k forks source link

C parser fails to evaluate certain expressions correctly #3410

Closed snemes closed 2 years ago

snemes commented 3 years ago

Describe the bug The Ghidra C preprocessor/parser fails to evaluate certain preprocessor expressions correctly. Originally thought to be related to #1421, but turned out to be a separate issue.

To Reproduce Steps to reproduce the behavior:

  1. Open an arbitrary file in the Code Browser
  2. If the file has any Data Type Archives opened, close them by right clicking them in the Data Type Manager window and by selecting "Close Archive"
  3. Click on File > Parse C Source... to open up the "Parse C Source" dialog
  4. Click on the 2nd icon (Save profile to new name), and enter an arbitrary name (e.g. "Testing")
  5. Click on the 3rd icon (Clear profile)
  6. Click on the "+" icon, and select the "test.h" file (attached to this issue)
  7. Click on the 1st icon (Save profile)
  8. Add the "-v6" option to the "Parse Options" text field to have some debug messages generated
  9. Click on the "Parse to Program" button, then confirm with the "Continue" button
  10. You will see an error message.

Expected behavior Ghidra should be able to parse this test file without problems. Most of lines in the test file were borrowed from the standard Windows header file "winapifamily.h", with some further lines added to be able to localize and trigger he problem. The error message triggers because the struct at the end of the .h file references the WORD data type (Ghidra already has a built-in lowercase "word" data type, but that was intentionally not used here), which is defined right above the struct definition inside an #if conditional preprocessor block.

Screenshots N/A

Attachments test.h - test.zip

#define WINAPI_PARTITION_DESKTOP 0x1
#define WINAPI_PARTITION_APP     0x2

#define WINAPI_FAMILY_APP WINAPI_PARTITION_APP
#define WINAPI_FAMILY_DESKTOP_APP (WINAPI_PARTITION_DESKTOP | WINAPI_PARTITION_APP)

/* WINAPI_FAMILY can be either desktop + App, or App.  */
#ifndef WINAPI_FAMILY
#define WINAPI_FAMILY WINAPI_FAMILY_DESKTOP_APP
#endif

#define WINAPI_FAMILY_PARTITION(v) ((WINAPI_FAMILY & v) == v)
#define WINAPI_FAMILY_ONE_PARTITION(vset, v) ((WINAPI_FAMILY & vset) == v)

#if WINAPI_FAMILY_PARTITION (WINAPI_PARTITION_APP)
    typedef unsigned short WORD;
#endif

typedef struct {
    WORD x;
    WORD y;
} POINT;

CParserPlugin.out - CParserPlugin.zip

#line 1: "test.h"
Line 1: test.h Check isDef WINAPI_PARTITION_DESKTOP False
 test.h'1: Defining text: WINAPI_PARTITION_DESKTOP = 0x1
Line 2: test.h Check isDef WINAPI_PARTITION_APP False
 test.h'2: Defining text: WINAPI_PARTITION_APP =     0x2

Line 4: test.h Check isDef WINAPI_FAMILY_APP False
 test.h'4: Defining text: WINAPI_FAMILY_APP = WINAPI_PARTITION_APP
Line 5: test.h Check isDef WINAPI_FAMILY_DESKTOP_APP False
 test.h'5: Defining text: WINAPI_FAMILY_DESKTOP_APP = (WINAPI_PARTITION_DESKTOP | WINAPI_PARTITION_APP)

test.h'8 IFNDEF: Line 8: test.h Check isNDef ifndef True
Line 8: test.h Check isNDef WINAPI_FAMILY True
 [1]IfNDef true : true

Line 9: test.h Check isDef WINAPI_FAMILY False
 test.h'9: Defining text: WINAPI_FAMILY = WINAPI_FAMILY_DESKTOP_APP
 [0]test.h'10: Endif  : true

Line 12: test.h Check isDef WINAPI_FAMILY_PARTITION False
 test.h'12: Defining text: WINAPI_FAMILY_PARTITION =  ((WINAPI_FAMILY & v) == v)
 test.h'12: Defining text: WINAPI_FAMILY_PARTITION =  ((WINAPI_FAMILY & v) == v)
    test.h'12: Defining text: WINAPI_FAMILY_PARTITION (
vtest.h'12: Defining text: WINAPI_FAMILY_PARTITION (vLine 13: test.h Check isDef WINAPI_FAMILY_ONE_PARTITION False
 test.h'13: Defining text: WINAPI_FAMILY_ONE_PARTITION =  ((WINAPI_FAMILY & vset) == v)
 test.h'13: Defining text: WINAPI_FAMILY_ONE_PARTITION =  ((WINAPI_FAMILY & vset) == v)
    test.h'13: Defining text: WINAPI_FAMILY_ONE_PARTITION (
vset ,v)
test.h'13: Defining text: WINAPI_FAMILY_ONE_PARTITION (vset ,v)

test.h'15 IF: Line 15: test.h Check isDef (((0x1 | 0x2) & 0x2) == 0x2) False
 [1]If false : false
 [0]test.h'17: Endif  : true

typedef struct {
    WORD x;
    WORD y;
} POINT;

Environment (please complete the following information):

Additional context The attached CParserPlugin.out file shows the problematic part here:

test.h'15 IF: Line 15: test.h Check isDef (((0x1 | 0x2) & 0x2) == 0x2) False
 [1]If false : false
 [0]test.h'17: Endif  : true

So the expression (1 | 2) & 2 == 2 incorrectly evaluates to False according to the C preprocessor, so the rest of the #if block is skipped, thus the WORD data type is not getting defined, leading to the error later in the struct definition.

emteere commented 3 years ago

Thanks for the well crafted example.

The main issue is the expansion of the macro with arguments in the #if. The current JavaCC parser can't parse additional injected text without instantiating a new parser. Calling a new parser for every expansion of a macro with arguments could be very slow.

I've worked out a fix that is a compromise. The parser will attempt to simplify expressions after macro expansion and concatenation. I suspect there will still be some header files that won't parse correctly. I also suspect the fix will address some other long standing cparser issues. The fix will need to go through our review process as there was some refactoring required to a hand rolled expression parser. We'll try and push it to github soon.