NationalSecurityAgency / ghidra

Ghidra is a software reverse engineering (SRE) framework
https://www.nsa.gov/ghidra
Apache License 2.0

Support DWARF location expressions that use BP #2322

Open Ruturaj4 opened 4 years ago

Ruturaj4 commented 4 years ago

I am using the debin project to recover symbols in stripped binaries. The project uses a machine learning approach to recover variables, types, and variable names from stripped binaries. It also rebuilds the stripped debug section (.debug) so that reverse engineering frameworks can use this information to improve their analysis.

However, I observed that even when debin successfully rebuilds some of the symbols, Ghidra ignores them during analysis. Is there a particular reason for that? And is there any way to force Ghidra to use such symbols (in the GUI as well as in the CLI)?

For example, I have the following code (ref: sard 89 benchmark - 000/000/151):

int main(int argc, char *argv[])
{
  int init_value;
  int inc_value;
  int loop_counter;
  char buf[10];

  init_value = 0;
  inc_value = 4105 - (4105 - 1);

  loop_counter = init_value;
  while((loop_counter += inc_value) && (loop_counter <= 4105))
  {
    /*  BAD  */
    buf[loop_counter] = 'A';
  }

  return 0;
}

readelf -wi output on the debin binary (symbols are generated by debin):

<1><2d>: Abbrev Number: 2 (DW_TAG_subprogram)
    <2e>   DW_AT_name        : main
    <33>   DW_AT_type        : <0x1d>
    <37>   DW_AT_low_pc      : 0x401106
    <3f>   DW_AT_high_pc     : 0x47
 <2><47>: Abbrev Number: 4 (DW_TAG_variable)
    <48>   DW_AT_name        : matchError
    <53>   DW_AT_location    : 2 byte block: 76 5c      (DW_OP_breg6 (rbp): -36)
    <56>   DW_AT_type        : <0x24>
 <2><5a>: Abbrev Number: 4 (DW_TAG_variable)
    <5b>   DW_AT_name        : name
    <60>   DW_AT_location    : 2 byte block: 76 50      (DW_OP_breg6 (rbp): -48)
    <63>   DW_AT_type        : <0x2b>
 <2><67>: Abbrev Number: 4 (DW_TAG_variable)
    <68>   DW_AT_name        : group
    <6e>   DW_AT_location    : 2 byte block: 76 78      (DW_OP_breg6 (rbp): -8)
    <71>   DW_AT_type        : <0x1d>
 <2><75>: Abbrev Number: 4 (DW_TAG_variable)
    <76>   DW_AT_name        : flushType
    <80>   DW_AT_location    : 2 byte block: 76 74      (DW_OP_breg6 (rbp): -12)
    <83>   DW_AT_type        : <0x1d>
 <2><87>: Abbrev Number: 4 (DW_TAG_variable)
    <88>   DW_AT_name        : i
    <8a>   DW_AT_location    : 2 byte block: 76 7c      (DW_OP_breg6 (rbp): -4)
    <8d>   DW_AT_type        : <0x1d>

Ghidra GUI:

[image: Ghidra GUI screenshot]

Thanks in advance.

Debin paper ref: https://dl.acm.org/doi/pdf/10.1145/3360572

dev747368 commented 4 years ago

Was the DWARF analyzer enabled during analysis, and were there any messages in the log about issues with DWARF?

Ruturaj4 commented 4 years ago

Thanks for your comment. I can't see any errors in the GUI. The only error I can see is this (in the CLI):

INFO  Read DWARF debug string table, 0 bytes. (DWARFProgram)
INFO  DWARF import - total elapsed: 25ms (DWARFImportSummary)
INFO  DWARF data type import - elapsed: 9ms (DWARFImportSummary)
INFO  DWARF func & symbol import - elapsed: 16ms (DWARFImportSummary)
INFO  DWARF types imported: 2 (DWARFImportSummary)
INFO  DWARF function signatures added: 1 (DWARFImportSummary)

dev747368 commented 4 years ago

DWARF local variable info is problematic for Ghidra, and it's kind of hit-and-miss whether we can use it.

However, it does appear you are getting some info, ie. data types and function signatures.

One of the options of the DWARF analyzer is to mark up the imported items with the DWARF DIE record number. If you turn that on, you should have a better indication of which items were successfully pulled into Ghidra.

Ruturaj4 commented 4 years ago

Thanks for your reply.

No, I don't think I am getting any information from DWARF. I checked the output using the stripped binary, and Ghidra gives me the exact same output. I kept the .symtab section and stripped everything else, which is why you may see function signature information.

image

Could you please tell me which option that is? I turned everything on, but it had no effect on the binary.

dev747368 commented 4 years ago

"Output DWARF DIE info". You should see the DIE info tagged on the data type's comment field and a pre-comment on functions if the DWARF analyzer created that entry.

Could you post the binary, or at least the entire contents of the readelf output?

Ruturaj4 commented 4 years ago

Thanks so much. But it doesn't show up in my case. I checked IDA Pro, and it detects the information correctly.

Please check the attached binaries. These binaries are for a different program, though: benchmark sard88, test 283.

283.zip

I attached three binaries (compressed form). Note that these binaries contain buffer overflow.

obo_bad.o - binary compiled with -g flag (gcc -g)

obo_bad_debin.o - binary is stripped (keeping .symtab section - i.e. -g flag strip -g ./bin) and then .debug section is recovered using debin

obo_bad_stripped - stripped off all the information (strip ./bin)

If you compare obo_bad_stripped and obo_bad_debin, you can't see much difference in the output. You can observe that the variable names are not being detected correctly.

dev747368 commented 4 years ago

Thanks for the quick turnaround.

So, I am getting DIE info tagged on functions for your obo_bad_debin, and some data types.

Like you, I am not getting local variables, and it comes down to the way the location of each local variable was encoded in the DWARF location expression attached to its definition.

Background info for those who don't know: DWARF defines a small embedded stack-based expression language. For each thing that has a location, the DWARF spec allows that location to be defined using instructions in that expression language. Ghidra can evaluate only a subset of that expression language because some of the operations use live values from CPU registers.

In some cases we can map those register-referencing operations to Ghidra native definitions (i.e. if the register was the stack register and it was a simple offset from the register), and in some cases we can't.

In your case, debin is preferring to use operations that are relative to rbp:

  <2><df>: Abbrev Number: 4 (DW_TAG_variable)
     <e0>   DW_AT_name        : s
     <e2>   DW_AT_location    : 2 byte block: 76 68      (DW_OP_breg6 (rbp): -24)
     <e5>   DW_AT_type        : <0x2e>

Which is pretty close to being the stack register, but we're not currently handling it.
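
The two-byte blocks in these readelf listings can be decoded by hand. Below is a minimal sketch that assumes the location is a single register-relative operation. The opcode values come from the DWARF specification, but the function names are hypothetical and this is not how Ghidra's DWARF analyzer is implemented.

```python
# Opcode values from the DWARF specification.
DW_OP_BREG0 = 0x70  # DW_OP_breg0 .. DW_OP_breg31 occupy 0x70 .. 0x8f
DW_OP_FBREG = 0x91


def read_sleb128(data, pos=0):
    """Decode a signed LEB128 value; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):  # high bit clear marks the last byte
            break
    if byte & 0x40:  # sign bit of the last byte is set
        result -= 1 << shift
    return result, pos


def decode_simple_location(block):
    """Decode a location made of one fbreg or bregN operation.

    Returns (kind, register_number_or_None, offset). General DWARF
    expressions are out of scope for this sketch and raise ValueError.
    """
    opcode = block[0]
    if opcode == DW_OP_FBREG:
        kind, reg = "fbreg", None
    elif DW_OP_BREG0 <= opcode <= DW_OP_BREG0 + 31:
        kind, reg = "breg", opcode - DW_OP_BREG0
    else:
        raise ValueError(f"unsupported opcode 0x{opcode:02x}")
    offset, end = read_sleb128(block, 1)
    if end != len(block):
        raise ValueError("trailing operations are not handled in this sketch")
    return kind, reg, offset


print(decode_simple_location(bytes.fromhex("7668")))  # ('breg', 6, -24)
```

Register 6 is rbp in the x86-64 DWARF register numbering, so the block `76 68` encodes "rbp - 24". The offset is only meaningful once the value of rbp is known, which is why this form needs extra handling.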

Here is a normal gcc-generated local variable, which uses fbreg:

  <2><6a>: Abbrev Number: 4 (DW_TAG_variable)
     <6b>   DW_AT_name        : (indirect string, offset: 0x81): init_value
     <6f>   DW_AT_decl_file   : 1
     <70>   DW_AT_decl_line   : 3
     <71>   DW_AT_type        : <0xa3>
     <75>   DW_AT_location    : 2 byte block: 91 68      (DW_OP_fbreg: -24)
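
To check which locals in a binary use register-relative locations, the readelf text can be scanned. Below is a rough sketch that assumes the `readelf -wi` line layout shown in this thread. The function name and the regular expression are illustrative only and may need adjusting for other readelf versions.

```python
import re

# Matches the decoded operation that readelf prints in parentheses, e.g.
# "(DW_OP_breg6 (rbp): -24)" or "(DW_OP_fbreg: -24)".
LOCATION_RE = re.compile(
    r"DW_AT_location\s*:.*\((DW_OP_\w+)(?: \((\w+)\))?: (-?\d+)\)"
)

SAMPLE = """\
  <2><df>: Abbrev Number: 4 (DW_TAG_variable)
     <e0>   DW_AT_name        : s
     <e2>   DW_AT_location    : 2 byte block: 76 68      (DW_OP_breg6 (rbp): -24)
     <e5>   DW_AT_type        : <0x2e>
  <2><6a>: Abbrev Number: 4 (DW_TAG_variable)
     <6b>   DW_AT_name        : (indirect string, offset: 0x81): init_value
     <71>   DW_AT_type        : <0xa3>
     <75>   DW_AT_location    : 2 byte block: 91 68      (DW_OP_fbreg: -24)
"""


def classify_locations(readelf_text):
    """Return (name, operation, register, offset) for each simple location."""
    rows = []
    name = None
    for line in readelf_text.splitlines():
        if "Abbrev Number" in line:
            name = None  # a new DIE starts, forget the previous name
        elif "DW_AT_name" in line:
            # The name is whatever follows the last colon on the line.
            name = line.split(":")[-1].strip()
        else:
            match = LOCATION_RE.search(line)
            if match:
                op, reg, offset = match.groups()
                rows.append((name, op, reg, int(offset)))
    return rows


for row in classify_locations(SAMPLE):
    print(row)
# ('s', 'DW_OP_breg6', 'rbp', -24)
# ('init_value', 'DW_OP_fbreg', None, -24)
```

Rows whose operation starts with `DW_OP_breg` are the rbp-relative kind described above. Rows with `DW_OP_fbreg` are relative to the function's frame base.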

dev747368 commented 4 years ago

I'm going to rename this issue to "Support DWARF location expressions using BP" and mark it as an enhancement. Hopefully you are ok with that.

Ruturaj4 commented 4 years ago

Great! Thanks so much for quickly addressing this. I kept this issue open for now, you may close it as per your procedure.