GrammaTech / ddisasm

A fast and accurate disassembler
https://grammatech.github.io/ddisasm/
GNU Affero General Public License v3.0

[BINARY] fails reassembly #14

Closed: ZhangZhuoSJTU closed this issue 3 years ago

ZhangZhuoSJTU commented 3 years ago

Hi, thanks for this great work.

I am trying to use ddisasm to reassemble some of my CTF programs. The file a.out is attached.

It is 64-bit position dependent code.

$ file a.out
a.out: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/l, for GNU/Linux 3.2.0, BuildID[sha1]=a7ca5dae0321cb388b9e35b7e9237468ec95458c, stripped

$ readelf -h a.out
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x401080
  Start of program headers:          64 (bytes into file)
  Start of section headers:          103136 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         9
  Size of section headers:           64 (bytes)
  Number of section headers:         28
  Section header string table index: 27

$ md5sum a.out
c82373549ce71428dff66b22976c3e3a  a.out

The ddisasm version is shown as follows.

$ ddisasm --version
1.0.1 (453889a 2020-07-14)

When I try to disassemble it, it works well.

$ ddisasm a.out --asm a.s
Building the initial gtirb representation  (1ms)
Decoding the binary  (210ms)
Disassembling (271s)
Populating gtirb representation  (277ms)
Computing intra-procedural SCCs  (7ms)
Computing no return analysis  (5s)
Detecting additional functions  (7s)
Printing assembler  (73ms)

But when I try to reassemble it, it fails.

$ gcc a.s -o a.out.new
/usr/bin/ld: /tmp/ccDUmrVS.o: relocation R_X86_64_32S against `.text' can not be used when making a PIE object; recompile with -fPIC
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: error: ld returned 1 exit status

I know ddisasm may symbolize some values incorrectly. May I know whether the failure is caused by incorrect symbolization or by something I did wrong?

Btw, ddisasm works well on the example code.

$ readelf -h ex
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x400400
  Start of program headers:          64 (bytes into file)
  Start of section headers:          6400 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         9
  Size of section headers:           64 (bytes)
  Number of section headers:         29
  Section header string table index: 28

$ ddisasm ex --asm ex.s
Building the initial gtirb representation  (0ms)
Decoding the binary  (3ms)
Disassembling (32s)
Populating gtirb representation  (2ms)
Computing intra-procedural SCCs  (0ms)
Computing no return analysis  (631ms)
Detecting additional functions  (1s)
Printing assembler  (0ms)

$ gcc ex.s -o ex.out

$ ./ex.out
!!!Hello World!!!

Thanks.

ZhangZhuoSJTU commented 3 years ago

According to #9, I tried -no-pie. The issue looks solved.

ZhangZhuoSJTU commented 3 years ago

Reopening due to errors raised by clang when I try to use it to compile.

$ clang -no-pie a.s -lz -lm -lstdc++ -o a.out.new
a.s:7353:28: error: cannot use more than one symbol in memory operand
            mov RDX,OFFSET .L_401640
                           ^
a.s:7359:28: error: cannot use more than one symbol in memory operand
            mov RDX,OFFSET .L_401690
                           ^
a.s:7719:28: error: cannot use more than one symbol in memory operand
            mov RBP,OFFSET .L_401640
                           ^
a.s:7724:28: error: cannot use more than one symbol in memory operand
            mov RAX,OFFSET .L_401690
                           ^
a.s:32927:13: error: invalid operand for instruction
            call 4197712
            ^
a.s:34172:19: error: invalid escape sequence (unrecognized character) in '.string' directive
          .string "Too many IDAT\'s found"
                  ^
a.s:34234:19: error: invalid escape sequence (unrecognized character) in '.string' directive
          .string "It\'s an error to set both read_data_fn and write_data_fn in the "
                  ^
a.s:34240:19: error: invalid escape sequence (unrecognized character) in '.string' directive
          .string "Can\'t discard critical data on CRC error."

These errors seem to be caused by assembly syntax differences, except the following one:

a.s:32927:13: error: invalid operand for instruction
            call 4197712

I then checked the a.s file and found the following piece of assembly code.

            lea R12,QWORD PTR [RIP+.L_618dd0]
            push RBP
.cfi_def_cfa_offset 48
.cfi_offset 6, -48
            lea RBP,QWORD PTR [RIP+.L_618dd8]
            push RBX
.cfi_def_cfa_offset 56
.cfi_offset 3, -56
            mov R13D,EDI
            mov R14,RSI
            sub RBP,R12
            sub RSP,8
.cfi_def_cfa_offset 64
            sar RBP,3
            call 4197712

            test RBP,RBP
            je .L_4129c6

            xor EBX,EBX

It looks like call 4197712 is invalid because symbolization failed.

Btw, I would really appreciate it if you could help me eliminate the other clang errors. I am trying to write some tools using LLVM passes.

Thanks.

aeflores commented 3 years ago

Ok, that looks interesting! We haven't really done reassembly with clang so far. One thing that you can try is using ATT syntax as follows:

ddisasm a.out --ir a.gtirb
gtirb-pprinter a.gtirb --syntax att --asm a.s

The generated asm file produces fewer errors with clang. I only got the 3 errors about escape characters in strings:

a.s:34160:19: error: invalid escape sequence (unrecognized character) in '.string' directive
          .string "Too many IDAT\'s found"
                  ^
a.s:34222:19: error: invalid escape sequence (unrecognized character) in '.string' directive
          .string "It\'s an error to set both read_data_fn and write_data_fn in the "
                  ^
a.s:34228:19: error: invalid escape sequence (unrecognized character) in '.string' directive
          .string "Can\'t discard critical data on CRC error."

This is something that should be solved in https://github.com/GrammaTech/gtirb-pprinter but in the meantime you can just remove the \ from those strings manually. Once that is done, the binary compiles:

clang++ a.s -o a_rewritten -lm -lz
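
If there are many such strings, removing the \ by hand can be scripted. A minimal Python sketch, assuming the only offending escape is \' and that it only appears on .string lines (as in the three errors above); the function names are hypothetical and not part of ddisasm or gtirb-pprinter:

```python
from pathlib import Path


def strip_escaped_quotes(asm_text: str) -> str:
    """Drop the backslash before a single quote, on .string lines only."""
    fixed = []
    for line in asm_text.splitlines(keepends=True):
        # Only touch string directives; instructions and labels stay as they are.
        if line.lstrip().startswith(".string"):
            line = line.replace("\\'", "'")
        fixed.append(line)
    return "".join(fixed)


def fix_file(src: str, dst: str) -> None:
    """Read the assembly file at src and write the cleaned copy to dst."""
    Path(dst).write_text(strip_escaped_quotes(Path(src).read_text()))
```

Calling fix_file("a.s", "a_fixed.s") and then passing a_fixed.s to clang++ would replace the manual edit. Other escapes such as \n are left alone, since clang accepts them.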

That said, the call you identified suggests there is still a problem. I also got some complaints about the eh_frame when reassembling. I will look into these issues further. Thanks for reporting!

aeflores commented 3 years ago

Ok, the call that you identified is not really a symbolization error. It is a call to the .init section.
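
To see where that operand points, it helps to convert the decimal value back to hex and compare it with the section table printed by readelf -S a.out. A minimal Python sketch follows; the helper names are made up for illustration, and the section ranges have to be supplied by the reader because they are not listed in this thread:

```python
from typing import Iterable, Optional, Tuple

# (name, start address, size in bytes), e.g. copied by hand from `readelf -S`.
Section = Tuple[str, int, int]


def call_target_hex(operand: str) -> str:
    """Convert a decimal call operand, as printed in a.s, to hex."""
    return hex(int(operand, 10))


def section_containing(addr: int, sections: Iterable[Section]) -> Optional[str]:
    """Return the section whose [start, start + size) range holds addr."""
    for name, start, size in sections:
        if start <= addr < start + size:
            return name
    return None


print(call_target_hex("4197712"))  # 0x400d50
```

The hex value 0x400d50 lies below the entry point 0x401080 reported by readelf -h, and section_containing can confirm which section it falls in once the real ranges are filled in.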

The standard approach in gtirb-pprinter is to skip printing the .init and .fini sections (and others) and let the compiler add them again (together with the initialization code that calls them). That means that the portion of code that you identified becomes dead code in the rewritten binary. In other words, that call instruction should not be a problem.

The call appears unsymbolized only to avoid references to undefined labels (since we are skipping the section where the label would be located). You can check this by printing assembly code with debug information (which does not skip any sections):

ddisasm a.out --asm a.s --debug

This will generate code that is not reassemblable, though. You can also specify which sections to skip or keep in gtirb-pprinter directly:

gtirb-pprinter a.gtirb --asm a.s --keep-section .init

ZhangZhuoSJTU commented 3 years ago

Thanks!

mmmdzz commented 3 years ago

I got a similar issue with eh_frame. gcc complained:

/usr/bin/ld: error in /tmp/test-1-74729f.o(.eh_frame); no .eh_frame_hdr table will be created.

May I know whether this is a common issue? If the test binary is needed, I can attach it, but it is an obfuscated C++ program.

aeflores commented 3 years ago

This error probably means something is wrong with the eh_frame information. Maybe there is a bug in the CFI directives creation.

mmmdzz commented 3 years ago

Or maybe it is caused by C++ exceptions? I think the stack unwinding information is also stored in eh_frame. I have tried the --no-cfi-directives option, but it also failed.