GrammaTech / ddisasm

A fast and accurate disassembler
https://grammatech.github.io/ddisasm/
GNU Affero General Public License v3.0

Several reassembly error cases on Ddisasm #54

Open witbring opened 1 year ago

witbring commented 1 year ago

When I tested Ddisasm v1.5.3 (Docker image digest a803c9, Apr. 2022) for my research, I found several interesting bug cases.

First, I observed that Ddisasm incorrectly symbolizes jump tables. For example, given the jump table entry ‘.long .L4895-.L4896’ found in addr2line.tar.gz of Binutils, Ddisasm recognized the value as a jump table entry but misidentified the label value.

Second, I found that Ddisasm omits some label definitions. For example, given the instruction ‘movl $default_quoting_options, %eax’ found in true.tar.gz (x64 non-PIE binary) of Coreutils, Ddisasm reassembled the instruction as ‘mov EAX,OFFSET .L_40b2e0’. However, Ddisasm did not emit a definition for the label ‘.L_40b2e0’, which causes a compilation error.

Third, I observed that Ddisasm generates incorrect symbolic expressions, so some recompiled binaries refer to incorrect addresses. For example, given the disassembly ‘.long .L1543@GOTOFF’ found in nm_new.tar.gz (x86 PIE binary) of Binutils, Ddisasm symbolized the pointer as ‘.long .L_e4b5-.L_785f1’.

Also, I observed that Ddisasm makes mistakes when it generates GOT-relative labels. For example, given the instruction ‘addl $yydefgoto@GOTOFF, %eax’ found in date.tar.gz (x86 PIE binary) of Coreutils, Ddisasm symbolized the immediate value as ‘.L_11eca@GOTOFF’. However, ‘yydefgoto’ is placed at 0x11ee6, not 0x11eca. I also calculated the GOT-relative address and concluded that Ddisasm misidentified the label value.

$ objdump -d -M intel date | grep 6395
6395: 81 c0 e6 be ff ff add eax,0xffffbee6

$ readelf -S date | grep got.plt
[24] .got.plt PROGBITS 00016000 015000 000128 04 WA 0 0 4

$ python3 -c 'print(hex(0xffffbee6 + 0x0016000 & 0xffffffff))'
0x11ee6
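The calculation above can also be written as a small helper. This is only a sketch: the function name `gotoff_target` is hypothetical, and it assumes, as the commands above do, that the `.got.plt` address is the GOT base used by `@GOTOFF` in this binary.

```python
def gotoff_target(imm32: int, got_base: int) -> int:
    """Resolve a 32-bit @GOTOFF immediate to an absolute address.

    Hypothetical helper for illustration; not part of Ddisasm.
    """
    # Interpret the raw immediate as a signed 32-bit displacement.
    disp = imm32 - (1 << 32) if imm32 & 0x80000000 else imm32
    # Add it to the GOT base and wrap to 32 bits, as the CPU would.
    return (got_base + disp) & 0xFFFFFFFF


imm = 0xFFFFBEE6      # immediate of the add at 0x6395 (from objdump)
got_plt = 0x00016000  # .got.plt address (from readelf)

target = gotoff_target(imm, got_plt)
print(hex(target))            # address the original immediate refers to
print(target - 0x11ECA)       # distance from the label Ddisasm chose
```

The second value printed is the gap between the address encoded in the binary and the label in the reassembled output.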

- Reassembler-generated assembly
```asm
          6395:   add EAX,OFFSET .L_11eca@GOTOFF
```

Lastly, I observed that Ddisasm fails to symbolize correctly when it handles large binaries. For example, Ddisasm fails to symbolize RIP-relative addressing when it reassembles 416.gamess.tar.gz of SPEC CPU 2006. As a result, it produces a very large number of false negatives.

aeflores commented 1 year ago

Hi @witbring. Thanks for the report!

issue 54.1: addr2line

The first issue, the jump table in addr2line, seems to be solved in the current master, fa15bff2748fc5c2daccd7380fe852e2c2c5d90f:

.L_131830:
# data_access(4, 4, 10360e), preferred_data_access(4, 131830)
          131830: .long .L_103630-.L_131830

I would suggest trying the latest version.
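To make the entry above concrete, here is a rough sketch of how a relative jump table entry of the form `.long target-base` encodes and resolves. It assumes 32-bit little-endian entries and that the base label is the table start at 0x131830; the helper names are hypothetical and this is not how Ddisasm itself is implemented.

```python
import struct


def encode_entry(target: int, base: int) -> bytes:
    """Encode `.long target-base` as a signed 32-bit little-endian value."""
    return struct.pack("<i", target - base)


def decode_entry(raw: bytes, base: int) -> int:
    """Recover the jump target from a raw table entry and the table base."""
    (offset,) = struct.unpack("<i", raw)
    return base + offset


table_base = 0x131830  # .L_131830 in the listing above
target = 0x103630      # .L_103630 in the listing above

raw = encode_entry(target, table_base)
print(hex(decode_entry(raw, table_base)))
```

If the symbolized labels are right, decoding the bytes stored in the binary against the table base gives back the same target that the symbolic expression names.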

issue 54.2: true

This seems to cause a different problem in the current ddisasm version, I will investigate further.

issue 54.3: nm_new

There seems to be something wrong with the tar file that you uploaded, can you upload it again?

issue 54.4: date

I see what you are saying. The disassembled instruction in the broader context is:

.L_6390:
          6390:   add EAX,-28
          6393:   add EAX,EBX
          6395:   add EAX,OFFSET .L_11eca@GOTOFF

That -28 at 6390 is precisely the difference between 11eca and 11ee6, so at 6395 ddisasm considers that EAX holds the GOT address minus 28. If this is wrong, it is because -28 is an offset into the yydefgoto data structure. Could you also provide the compiler-generated assembly for this example? This would help us make sure we implement the right fix. I think the problem is in this rule (https://github.com/GrammaTech/ddisasm/blob/main/src/datalog/binary/elf/elf_binaries.dl#L217). I will look into this further.
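The arithmetic behind this can be sketched as follows. The model assumes that EBX holds the GOT base 0x16000 (the `.got.plt` address reported earlier in the thread) and that EAX starts at 0; the real value of EAX on entry is not shown in the thread, so the absolute results are illustrative only. The function `run_sequence` is a hypothetical helper, not Ddisasm code.

```python
MASK = 0xFFFFFFFF
GOT = 0x16000  # assumed GOT base (.got.plt address from readelf)


def run_sequence(eax: int, gotoff_imm: int) -> int:
    """Model the three adds at 0x6390..0x6395 with EBX = GOT base."""
    eax = (eax - 28) & MASK          # 6390: add EAX,-28
    eax = (eax + GOT) & MASK         # 6393: add EAX,EBX
    eax = (eax + gotoff_imm) & MASK  # 6395: add EAX,OFFSET <label>@GOTOFF
    return eax


# Immediate as compiled: yydefgoto@GOTOFF, with yydefgoto at 0x11ee6.
original = run_sequence(0, (0x11EE6 - GOT) & MASK)
# Immediate as reassembled: .L_11eca@GOTOFF.
reassembled = run_sequence(0, (0x11ECA - GOT) & MASK)

print(hex(original))
print(hex(reassembled))
print(original - reassembled)
```

Under these assumptions the original sequence lands on 0x11eca, which matches the label ddisasm picked, but using that label as the immediate applies the -28 a second time.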

issue 54.5: 416.gamess

I can't open this .tar.gz file either, can you re-upload?

witbring commented 1 year ago

Thank you for your reply.

I checked the uploaded files and had no problem unpacking the tar files. Just in case, I have compressed them in a different format. I have also uploaded the assembly file you asked for. I hope it helps.

issue 54.3: nm_new

nm_new.zip

issue 54.4: date

parse-datetime.s.txt is the relevant compiler-generated assembly file. Please check line 4474.

issue 54.5: 416.gamess

416.gamess.zip

aeflores commented 1 year ago

issue 54.3: nm_new

For nm_new, I can successfully untar or unzip, but the file inside is only 29 bytes and does not seem to have a binary format.

issue 54.4: date

Thanks, this is useful. I'll let you know once a fix is in.

issue 54.5: 416.gamess

This also seems to work fine on the current main branch fa15bff2748fc5c2daccd7380fe852e2c2c5d90f

main:
            subq $8,%rsp
            callq _gfortran_set_args@PLT

            leaq .L_61f110(%rip),%rsi

witbring commented 1 year ago

Sorry, I have re-uploaded nm_new: nm-new.zip

aeflores commented 1 year ago

Hi @witbring, thanks! I can confirm that nm_new is still an issue. We will work on it.

aeflores commented 1 year ago

Alright, issue 54.3: nm_new should be solved by 10d66dab9eb85932e63615f0affa6ddf134bee94

adamjseitz commented 1 year ago

Hi @witbring, I am looking at resolving the remaining issues.

I think I have a fix for the date binary. I am trying to generate some additional, smaller examples, but I am having trouble getting a compiler to produce code similar to that assembly output.

Can you provide any information about your build environment and how you build coreutils to generate this code? I believe from the artifacts you have attached that you're using clang 12 to build coreutils-8.30 (x86 PIE). Because the C file parse-datetime.c is generated, maybe your yacc/bison version is relevant? The output of running the ./configure command on coreutils might be helpful.

Thanks!

witbring commented 1 year ago

Hi @adamjseitz,

I'm glad to hear that you fixed the error. I compiled the date binary with the options -O1 -pie -fPIE -m32. You can find my build environment and other configuration details in config.log. I hope it helps.

Thank you.

adamjseitz commented 1 year ago

The true binary should be fixed by 06fe6fa.