GrammaTech / ddisasm

A fast and accurate disassembler
https://grammatech.github.io/ddisasm/
GNU Affero General Public License v3.0

[BINARY] fails disassembly #20

Closed ZhangZhuoSJTU closed 3 years ago

ZhangZhuoSJTU commented 3 years ago

Hi, I have some binaries that fail to be recompiled after disassembly with ddisasm.

My ddisasm is apt-installed on Ubuntu 18.04, and the version information is:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.5 LTS
Release:        18.04
Codename:       bionic

$ ddisasm --version
1.2.0 (2a4d260 2020-11-13)

Generally speaking, there are two types of errors.

1. The conflicts between the following code blocks could not be resolved

These errors occur in boringssl, openssl-1.0.1f, and openssl-1.1.0c.

Taking boringssl as an example:

$ ddisasm --asm boringssl-2016-02-12.normal.s  --no-cfi-directives boringssl-2016-02-12.normal
Building the initial gtirb representation  (20ms)
Decoding the binary  (1s)
Disassembling (8520s)
Populating gtirb representation WARNING: Found integral symbol pointing into existing block: .L_4c6c44
 (2s)
Computing intra-procedural SCCs  (68ms)
Computing no return analysis  (44s)
Detecting additional functions  (7s)
Printing assembler WARNING: found overlapping element at address 420beb
The --layout option to gtirb-pprinter can fix overlapping elements.
WARNING: found overlapping element at address 420bec
The --layout option to gtirb-pprinter can fix overlapping elements.
WARNING: found overlapping element at address 420bed
The --layout option to gtirb-pprinter can fix overlapping elements.
 (3s)
The conflicts between the following code blocks could not be resolved:
420bea - 420beb
420bea - 420bec
420bea - 420bed
420bea - 420bee
420beb - 420bec
420beb - 420bed
420beb - 420bee
420bec - 420bed
420bed - 420bee
Aborting

I have done some exploration. It seems that all three binaries contain inlined data (e.g., crypto tables in hand-written assembly), and the conflicting addresses fall inside that interleaved data, which may indicate that ddisasm misclassified data as code.
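To see which bytes are involved, the conflict pairs in the log can be collapsed into a single address range. This is a minimal sketch in POSIX shell: `conflict_range` is a hypothetical helper, not part of ddisasm, and it assumes the `start - end` line format shown in the output above.

```shell
# Hypothetical helper (not part of ddisasm): read ddisasm output on stdin and
# print the lowest and highest address that appear in "addr - addr" lines.
conflict_range() {
    lo=""
    hi=""
    while read -r a dash b; do
        # Only lines of the form "<hex> - <hex>" are conflict pairs.
        [ "$dash" = "-" ] || continue
        for addr in "$a" "$b"; do
            # Convert the hex address to decimal; skip anything that is not hex.
            n=$(printf '%d' "0x$addr" 2>/dev/null) || continue
            if [ -z "$lo" ] || [ "$n" -lt "$lo" ]; then lo=$n; fi
            if [ -z "$hi" ] || [ "$n" -gt "$hi" ]; then hi=$n; fi
        done
    done
    if [ -n "$lo" ]; then
        printf '%x-%x\n' "$lo" "$hi"
    fi
}
```

The resulting range can then be inspected in the original binary, for example with `objdump -s --start-address=0x<low> --stop-address=0x<high>`, to check whether the bytes look like data rather than instructions.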

2. Segmentation fault in the recompiled binary

Three other binaries are affected: lcms, libxml2, and wpantund.

Taking libxml2 as an example:

$ ddisasm --asm libxml2-v2.9.2.normal.s --no-cfi-directives libxml2-v2.9.2.normal
Building the initial gtirb representation  (19ms)
Decoding the binary  (4s)
Disassembling (10322s)
Populating gtirb representation WARNING: Found integral symbol pointing into existing block: .L_50e654
 (13s)
Computing intra-procedural SCCs  (312ms)
Computing no return analysis  (346s)
Detecting additional functions  (102s)
Printing assembler  (3s)

$ gcc -no-pie libxml2-v2.9.2.normal.s -o libxml2-v2.9.2.normal.ddisasm -lm -lpthread -lz

$ ./libxml2-v2.9.2.normal seed    # exit without fault

$ ./libxml2-v2.9.2.normal.ddisasm seed
[1]    1576 segmentation fault (core dumped)  ./libxml2-v2.9.2.normal.ddisasm seed
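The manual comparison above (run the original, run the rewritten binary, compare the outcomes) can be scripted. A minimal sketch in POSIX shell follows; `same_exit` is a hypothetical helper name, and it compares exit statuses only, not program output.

```shell
# Hypothetical helper: run the original and the rewritten binary on the same
# arguments and succeed only if both finish with the same exit status.
same_exit() {
    orig=$1
    rewritten=$2
    shift 2
    s1=0
    s2=0
    # "|| sN=$?" records a failing status without aborting under "set -e".
    "$orig" "$@" >/dev/null 2>&1 || s1=$?
    "$rewritten" "$@" >/dev/null 2>&1 || s2=$?
    echo "original=$s1 rewritten=$s2"
    [ "$s1" -eq "$s2" ]
}

# Example invocation (binaries from this report):
#   same_exit ./libxml2-v2.9.2.normal ./libxml2-v2.9.2.normal.ddisasm seed
```

On Linux a process killed by a signal is reported by the shell as 128 plus the signal number, so a segmentation fault shows up as status 139 and an abort as 134.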

More

I have attached all the six binaries. In each subdirectory, there is a cmd_line.txt file describing the problem.

I would like to provide non-stripped versions of these binaries, but they appear to have been lost. I apologize for the inconvenience. All of them come from the Google fuzzer test suite (FTS), compiled with -O2.

Please let me know if there is anything I can help with. Thanks!

ZhangZhuoSJTU commented 3 years ago

More information: when I compiled the Google fuzzer test suite, I added the following code to turn each libFuzzer-based program into a normal program.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

extern "C" {
    // Entry points normally driven by libFuzzer.
    int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size);
    __attribute__((weak)) int LLVMFuzzerInitialize(int *argc, char ***argv);
}

// Standalone driver: read the file named by argv[1] and pass it to the fuzz target once.
int main(int argc, char** argv)
{
        if(argc <= 1)
        {
                // Print before exiting; the message was unreachable after exit().
                printf("argc error!\n");
                exit(1);
        }
        size_t length, result;
        unsigned char* buf;
        FILE *fp = fopen(argv[1], "rb");
        if(fp == NULL)
        {
                printf("Open error!\n");
                exit(1);
        }
        // Determine the file size, then rewind to read from the start.
        fseek(fp, 0, SEEK_END);
        length = (size_t)ftell(fp);
        rewind(fp);
        buf = (unsigned char*)malloc(length);
        if(buf == NULL)
        {
                printf("malloc error!\n");
                exit(2);
        }
        result = fread(buf, 1, length, fp);
        if(result != length)
        {
                printf("read error!\n");
                exit(3);
        }
        LLVMFuzzerTestOneInput(buf, result);
        free(buf);
        fclose(fp);
        return 0;
}
ZhangZhuoSJTU commented 3 years ago

Hi, after additional checking, I found another failure from Google FTS. This binary does not contain hand-written assembly.

json.tar.gz

$ ddisasm --asm json-2017-02-12.normal.s --no-cfi-directives json-2017-02-12.normal
Building the initial gtirb representation  (6ms)
Decoding the binary  (252ms)
Disassembling (541s)
Populating gtirb representation  (271ms)
Computing intra-procedural SCCs  (6ms)
Computing no return analysis  (41s)
Detecting additional functions  (16s)
Printing assembler  (65ms)

$ gcc -no-pie -o json-2017-02-12.normal.ddisasm json-2017-02-12.normal.s -lstdc++

$ ./json-2017-02-12.normal seed && echo $?
0

$ ./json-2017-02-12.normal.ddisasm seed
terminate called after throwing an instance of 'std::invalid_argument'
  what():  parse error - unexpected '\'; expected end of input
[1]    201 abort (core dumped)  ./json-2017-02-12.normal.ddisasm seed

By the way, if a binary with debug information would help, please let me know. I am trying to find one on some old machines, but I am not sure I will succeed. Sorry about that.

junghee commented 3 years ago

Hi ZhangZhuoSJTU,

Thank you for reporting the issue.

We are still working on the first issue about overlapping blocks. We will keep you updated when it gets fixed.

For the seg-faulting binaries, here is a quick update:

libxml2: It runs fine with our latest version of ddisasm. We haven't tracked down which commits fix the problem, but hopefully, the problem will be gone in the next release of ddisasm.

lcms: We were able to track it down to a missing symbolic operand in data at 0x641c08. Ideally, it should have been symbolized as .quad .L_421820. However, the current version of ddisasm fails on it. We are still working on fixing the problem.

wpantund: We have fixed the problem internally. A function was not printed with function directives, so it was not aligned properly. The related commits are c3188480221b8f7915d5d55484cde780d190e327 and e20096eeddae8dc51a542df979e390e84e467fea.

json: It appears that we still have some issues in symbolizing .eh_frame sections. We will be working on resolving the issues. Meanwhile, if you do not pass --no-cfi-directives, you will find it runs fine.
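The json workaround amounts to the commands from the report with --no-cfi-directives left out. A minimal sketch follows; `rewrite_with_cfi` and the `DDISASM`/`CC` overrides are hypothetical conveniences, and the flags are only those already used earlier in this thread.

```shell
# Hypothetical wrapper: disassemble and rebuild a binary while keeping CFI
# directives, i.e. without passing --no-cfi-directives to ddisasm.
# Extra arguments (e.g. -lstdc++) are forwarded to the compiler.
rewrite_with_cfi() {
    bin=$1
    shift
    "${DDISASM:-ddisasm}" --asm "$bin.s" "$bin" &&
        "${CC:-gcc}" -no-pie -o "$bin.ddisasm" "$bin.s" "$@"
}

# Example invocation for the json binary from this report:
#   rewrite_with_cfi json-2017-02-12.normal -lstdc++
```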

aeflores commented 3 years ago

Hi @ZhangZhuoSJTU, I believe lcms should be fixed now (since commit 53c8e6f1058a71b792f3ac8e0c5b3693c912acaa). boringssl, openssl-1.0.1f, and openssl-1.1.0c are still a work in progress.

eschulte commented 3 years ago

Closing this for now. Please reopen if the problem persists.