Closed ZhangZhuoSJTU closed 3 years ago
According to #9, I tried -no-pie. The issue looks solved.
Reopening due to errors raised by clang. I tried to compile with clang instead:
$ clang -no-pie a.s -lz -lm -lstdc++ -o a.out.new
a.s:7353:28: error: cannot use more than one symbol in memory operand
mov RDX,OFFSET .L_401640
^
a.s:7359:28: error: cannot use more than one symbol in memory operand
mov RDX,OFFSET .L_401690
^
a.s:7719:28: error: cannot use more than one symbol in memory operand
mov RBP,OFFSET .L_401640
^
a.s:7724:28: error: cannot use more than one symbol in memory operand
mov RAX,OFFSET .L_401690
^
a.s:32927:13: error: invalid operand for instruction
call 4197712
^
a.s:34172:19: error: invalid escape sequence (unrecognized character) in '.string' directive
.string "Too many IDAT\'s found"
^
a.s:34234:19: error: invalid escape sequence (unrecognized character) in '.string' directive
.string "It\'s an error to set both read_data_fn and write_data_fn in the "
^
a.s:34240:19: error: invalid escape sequence (unrecognized character) in '.string' directive
.string "Can\'t discard critical data on CRC error."
These errors seem to be caused by assembly syntax differences, except for the following one:
a.s:32927:13: error: invalid operand for instruction
call 4197712
I then checked the a.s file and found the following piece of assembly code:
lea R12,QWORD PTR [RIP+.L_618dd0]
push RBP
.cfi_def_cfa_offset 48
.cfi_offset 6, -48
lea RBP,QWORD PTR [RIP+.L_618dd8]
push RBX
.cfi_def_cfa_offset 56
.cfi_offset 3, -56
mov R13D,EDI
mov R14,RSI
sub RBP,R12
sub RSP,8
.cfi_def_cfa_offset 64
sar RBP,3
call 4197712
test RBP,RBP
je .L_4129c6
xor EBX,EBX
It looks like call 4197712 is invalid because of a symbolization failure.
Btw, I would greatly appreciate it if you could help me eliminate the other clang errors. I am trying to write some tools using LLVM passes.
Thanks.
Ok, that looks interesting! We haven't really done reassembly with clang so far. One thing that you can try is using ATT syntax as follows:
ddisasm a.out --ir a.gtirb
gtirb-pprinter a.gtirb --syntax att --asm a.s
The generated asm file produces fewer errors with clang. I got only the 3 errors about escape characters in strings:
a.s:34160:19: error: invalid escape sequence (unrecognized character) in '.string' directive
.string "Too many IDAT\'s found"
^
a.s:34222:19: error: invalid escape sequence (unrecognized character) in '.string' directive
.string "It\'s an error to set both read_data_fn and write_data_fn in the "
^
a.s:34228:19: error: invalid escape sequence (unrecognized character) in '.string' directive
.string "Can\'t discard critical data on CRC error."
This is something that should be solved in https://github.com/GrammaTech/gtirb-pprinter, but in the meantime you can just remove the \ from those strings manually.
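The manual fix above can also be scripted. The following is a minimal Python sketch and not part of gtirb-pprinter: the function names unescape_single_quotes and fix_asm_text are hypothetical, and it assumes the offending escapes only appear on .string lines, as in the errors shown.

```python
import re

# Matches one backslash escape pair (a backslash plus the following character).
_ESCAPE_PAIR = re.compile(r"\\(.)")


def unescape_single_quotes(line: str) -> str:
    """Drop the backslash before a single quote on a .string line."""
    if ".string" not in line:
        return line
    # Walking escape pairs left to right keeps an escaped backslash that is
    # followed by a quote from being misread as an escaped quote.
    return _ESCAPE_PAIR.sub(
        lambda m: "'" if m.group(1) == "'" else m.group(0), line
    )


def fix_asm_text(text: str) -> str:
    """Apply the fix to every line of an assembly listing (hypothetical helper)."""
    return "\n".join(unescape_single_quotes(line) for line in text.split("\n"))
```

Reading a.s, passing its contents through fix_asm_text, and writing the result back should leave every other escape sequence (such as \n or \") untouched.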
Once that is done, the binary compiles:
clang++ a.s -o a_rewritten -lm -lz
That said, the call that you identified suggests there is still a problem. I also got some complaints regarding the eh_frame when reassembling. I will look into these issues further.
Thanks for reporting!
Ok, the call that you identified is not really a symbolization error. It is a call to the .init section.
The standard approach in gtirb-pprinter is to skip printing the .init and .fini sections (and others) and let the compiler add them again (together with the initialization code that calls them). That means that the portion of code that you identified becomes dead code in the rewritten binary. In other words, that call instruction should not be a problem.
The call appears unsymbolized only to avoid references to undefined labels (since we are skipping the section where the label would be located). You can check this by printing the assembly code with debug information (which does not skip any sections):
ddisasm a.out --asm a.s --debug
This will generate code that cannot be reassembled, though. You can also specify which sections to skip or keep in gtirb-pprinter directly:
gtirb-pprinter a.gtirb --asm a.s --keep-section .init
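To see which calls were left as raw addresses for this reason, the generated listing can be scanned for call instructions with a bare decimal operand. This is a minimal Python sketch, not a ddisasm or gtirb-pprinter feature: the name find_numeric_calls is hypothetical, and the pattern assumes the Intel-syntax output shown earlier in this thread.

```python
import re
from typing import List, Tuple

# Intel-syntax direct call whose operand is a bare decimal address.
_NUMERIC_CALL = re.compile(r"^\s*call\s+(\d+)\s*$", re.IGNORECASE)


def find_numeric_calls(asm_text: str) -> List[Tuple[int, str]]:
    """Return (line_number, hex_address) for each call to a raw address."""
    hits = []
    for lineno, line in enumerate(asm_text.splitlines(), start=1):
        match = _NUMERIC_CALL.match(line)
        if match:
            hits.append((lineno, hex(int(match.group(1)))))
    return hits
```

For the call discussed here, 4197712 is 0x400d50; that address can be compared against the section addresses listed by readelf -S a.out to confirm which skipped section it falls in.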
Thanks!
I ran into a similar issue with eh_frame. gcc complained:
/usr/bin/ld: error in /tmp/test-1-74729f.o(.eh_frame); no .eh_frame_hdr table will be created.
May I know whether this is a common issue or not? If the test binary is needed, I can attach it, but it is an obfuscated C++ program.
This error probably means something is wrong with the eh_frame information. Maybe there is a bug in the creation of the CFI directives.
Or maybe it is caused by C++ exceptions? I think the stack unwinding information is also stored in eh_frame. I have tried the --no-cfi-directives option, but it also failed.
Hi, thanks for this great work.
I am trying to use ddisasm to reassemble some of my CTF programs. The file a.out is attached.
It is 64-bit position dependent code.
The ddisasm version is shown as follows.
When I try to disassemble it, it works well.
But when I try to reassemble it, it fails.
I know it is possible that ddisasm symbolizes something incorrectly. But may I know whether the failure is caused by wrong symbolization or by something I did wrong?
Btw, ddisasm works well on the example code.
Thanks.