Improve X64 jump table handling

Hi, thanks for your contribution and hard work! Retrowrite is amazing.

Actually, I find a small unsoundness issue in reassembly and want to set up some discussion here. It would be very appreciated if anyone can comment on this.

In short, my key insight is that: although we do not need to distinguish numerical numbers and references/labels in PIE binaries, we still need to distinguish numerical numbers and the label offsets.

I use the latest commit 9e2e633e9ab165681733f3255e648a62b22e6368 for reference.

Case 1

The story begins when I got a program which behaves differently after reassembly-and-recompilation (the attached code is reduced for easy demonstration).

#include <stdio.h>

static const int t[2] = {-187, -184};

int main() {
    int x;
    scanf("%d", &x);
    printf("%d\n", t[x]);
}

My basic setup is:

$ gcc --version 
gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ gcc poc.c

$ file a.out
a.out: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=f1c5b51f8f0547a7d9e06fdc52bef7d450d34022, not stripped

$ retrowrite -s a.out a.s

$ gcc -no-pie a.s -o b.out
[*] Relocations for a section that's not loaded: .rela.dyn
[*] Relocations for a section that's not loaded: .rela.plt
[x] Could not replace value in .init_array
[x] Couldn't find valid section 200db0
[x] Couldn't find valid section 200fd8
[x] Couldn't find valid section 200fe0
[x] Couldn't find valid section 200fe8
[x] Couldn't find valid section 200ff0
[x] Couldn't find valid section 200ff8

After that, we can execute a.out and b.out to get the execution results.

$ ./a.out
1
-184

$ ./b.out
1
-202

We can see, with the same input 1, a.out prints -184 but b.out prints -202.

Originally, I think the fault is caused by the plt warnings. But after some exploration, I found it is an unsoundness issue in theory. Specifically, let's check the reassembly file a.s

...

.globl t_818
t_818: # 818 -- 820                    # static const int t[2] locates here
.LC818:
        .long .LC75d-.LC818
.LC81c:
        .long .LC760-.LC818

...

.LC756:
        leaq .LC818(%rip), %rax
.LC75d:
        movl (%rdx, %rax), %eax
.LC760:
        movl %eax, %esi
.LC762:
        leaq .LC823(%rip), %rdi
.LC769:
        movl $0, %eax
.LC76e:
        callq printf@PLT

...

We can see, retrowrite misclassified the numerical elements in const int t[2] as the label offsets (e.g., .LC75d-.LC818). After compilation, these values are changed for sure.

Then I go check the code.

https://github.com/HexHive/retrowrite/blob/9e2e633e9ab165681733f3255e648a62b22e6368/librw/rw.py#L219

It seems that retrowrite uses heuristics to symbolize jump tables (which contains many label offsets). And in this case, the global const int t[2] satisfies the heuristics by chance, which confuses retrowrite. Unsoundness is still here and there, somehow.

With more study, I think the problem can be summarized as:

Although we do not need to distinguish numerical numbers and references in PIE binaries, we still need to distinguish numerical numbers and the offsets between labels. I feel these label offsets are not in the symbol/relocation table.

The aforementioned case provides an example, where the element in jump table is the offset between labels (the target and jump base). That is why we got confused here.

Case 2

To further study the root cause, I hand-crafted a program with inline-assembly. Hope it can help.

test rdi, rdi;
je A;

mov rbx, B-A;
push rbx;
lea r8, [rip + A];

A:
pop r9;
add r8, r9;
jmp r8;

B:
mov rax, 60;     # SYS_EXIT
mov rdi, 0;
syscall;
ret;

Let's go through the code.

test rdi, rdi;
je A;

The first test-je pattern is to let retrowrite know there is a basic block starting at A.

mov rbx, B-A;
push rbx;
lea r8, [rip + A];

The mov instruction is the key, which loads the offset between labels B and A into rbx. The following push instruction stores the offset into memory, and the lea instruction loads the address of label A into r8.

A:
pop r9;
add r8, r9;
jmp r8;

Later, we pop the offset into r9. Note that, here we use a simple push-pop pattern to simulate the complex behaviors (including the aliasing problems) in real-world binary. The add instruction adds r8 and r9, which denotes A + B - A = B. Then, an indirect jump leads the control flow to B.

B:
mov rax, 60;     # SYS_EXIT
mov rdi, 0;
syscall;
ret;

B is a simple exit(0).

My basic setup is:

$ cat poc.c
int main(int argc, char **argv) {
    asm volatile(
        ".intel_syntax noprefix\n"

        "\ttest rdi, rdi;\n"
        "\tje A;\n"

        "\tmov rbx, B-A;\n"
        "\tpush rbx;\n"
        "\tlea r8, [rip + A];\n"

        ".global A\n"
        "A:\n"
        "\tpop r9;\n"
        "\tadd r8, r9;\n"
        "\tjmp r8;\n"

        ".global B\n"
        "B:\n"
        "\tmov rax, 60;\n"
        "\tmov rdi, 0;\n"
        "\tsyscall;\n"

        "\tret;\n"

        ".att_syntax;\n"
        );
}

$ gcc poc.c

$ retrowrite -s a.out a.s
[*] Relocations for a section that's not loaded: .rela.dyn
[x] Could not replace value in .init_array
[x] Couldn't find valid section 200df8
[x] Couldn't find valid section 200fd8
[x] Couldn't find valid section 200fe0
[x] Couldn't find valid section 200fe8
[x] Couldn't find valid section 200ff0
[x] Couldn't find valid section 200ff8

$ AFL_AS_FORCE_INSTRUMENT=1 ~/AFLplusplus/afl-clang -no-pie a.s -o b.out

Let's first check the reassembly file

...

.LC605:
        testq %rdi, %rdi
.LC608:
        je .L619
.LC60a:
        movq $8, %rbx                   # Originally, it is the offset between labels B and A, misclassified as a numerical number 8
.LC611:
        pushq %rbx
.LC612:
        leaq (%rip), %r8
.L619:
.LC619:

...

We can see retrowrite left the label offset (B-A) as a constant number 8.

After instrumentation, the offset has changed, but the constant is still left here, which breaks the recompiled binary (the indirect jump target becomes invalid).

$ ./a.out

$ ./b.out
[1]    4754 segmentation fault (core dumped)  ./b.out

The solution is to infer the label offset (B - A). However, the traditional challenge, which is caused by sophisticated memory behaviors, is still there.

I have attached the above files. The directory structure is:

- case1    # the printf case
    - poc.c
    - a.out
    - a.s
    - b.out
- case2    # the inline-asm case
    - poc.c
    - a.out
    - a.s
    - b.out

It seems that an Usenix paper also mentions the small unsoundness issue in Section 7.1. It would be very appreciated if anyone can share some thinkings here. And also, I hope our discussion can help the development of retrowrite, which is, again, a great work for us to follow.

Thanks!

Hi, thank you for opening this issue. I appreciate how you included many detailed information and already spotted yourself the source of the problem on case 1.

I'm extremely sorry for taking such a long time to answer, but I was taking some time off and no one else was around to answer your question. You raise some interesting points, I'll try to address them all:

You are 100% right about the source of the jump table detection algorithm on case 1. The x86 jump table algorithm is missing some important steps and is currently not precise. We're in the process of fixing it, but thanks anyway for pointing this out.
Regarding your case 2: Retrowrite does not support binaries generated by non-well behaved compilers (such as a packed binary), and we don't support inline assembly too, like many other static rewriters. The example you provide is very on-point and shows perfectly the shortcomings of this approach. I'm sorry if that was not clear in the README: we're in the process of refactoring the entire codebase and I will remember to better explain this in the README.

the traditional challenge, which is caused by sophisticated memory behaviors, is still there.

What do you mean by "traditional challenge"? Are you talking about distinguishing between code and data? If yes, then I believe the point here it's that it's not solved in the general case, but only on binaries outputted by well-behaved compilers (gcc, clang).

It seems that an Usenix paper also mentions the small unsoundness issue in Section 7.1.

Thank you for linking this paper! It was an interesting read. So while we don't plan on supporting the ICC compiler in the future (as it's not very commonly used), the point they bring up about not being able to correctly symbolize jump tables is interesting to discuss. While it's true that there is no relocation data to specify which parts of the binary's memory are jump tables, we believe that using techniques such as data flow analysis to recognize patters used by compilers only when generating jump tables. The approach to detecting jump tables would be to, for every indirect jump on a register, check if that register comes from an indexed load from memory with an upper bounds check on the index.

Please keep in mind that I'm not the original author of Retrowrite, I'm just the guy who is adding support for aarch64, so take everything I've said with a pinch of salt as I may be wrong. The data flow approach I mentioned above was implemented in the aarch64 version of Retrowrite (because jump table detection with heuristics there is impossible), and it works on the hundreds of binaries we tried (including big binaries such as gcc); I believe that using the same approach on x86 would solve this issue. I can see how some people could think that data flow analysis is just another heuristic; but I believe that data flow analysis is exact, and does not need relying on scores/probability to be able to work - it needs to rely on a few assumptions though (such as, we're working with a well-behaved compiler).

Thank you again for raising those points above, and for your interest in the project. I'm keeping the issue open for further discussion and as long as x86 jump table detection does not get fixed.

Hi @cyanpencil ,

Thanks for your reply!

What do you mean by "traditional challenge"? Are you talking about distinguishing between code and data? If yes, then I believe the point here it's that it's not solved in the general case, but only on binaries outputted by well-behaved compilers (gcc, clang).

I mean the symbolization of numerical values, that is, to distinguish between scalars and reference offsets statically. For example, in a PIC binary, given a number 8 (see case 2), we want to decide whether it is a scalar or an offset between references (i.e., reference - reference). It is still a undecidable/unsolved problem even for PIC binaries, because such offsets (unlike reference itself) do not have symbols in most cases.

Jump table is just a common case of using reference offsets.

Like you said, to solve this challenge, we still need some data-flow analysis. But in general cases, data-flow analysis cannot be accurate due to many factors (both false positives and false negatives will influence reassembly). Note that the general data-flow analysis relies on point-to/alias analysis, which is another undecidable problem.

For general cases, to be honest, the only solution I can think of is to introduce heuristics (like current jump table identification for x64). Please kindly correct me if I am wrong.

it needs to rely on a few assumptions though (such as, we're working with a well-behaved compiler).

In the original paper, I see "For PIC, we adopt a principled symbolization strategy without heuristics". So I originally thought it works in general cases.

But yes, if we are shooting on well-behaved compilers, all the problems should have gone, lol.

I've thought for a while. I somehow feel it is still heuristical. That is, our heuristics are perfect fit for commonly-used compilers (e.g., gcc, icc, and clang). Based on my understanding, using commonly-used compilers means there are some fixed patterns or other pre-known constraints.

I believe that data flow analysis is exact, and does not need relying on scores/probability to be able to work

I think relying on scores/probability is just one kind of heuristic approach. Another is, like what we might do with well-behaved compilers, relying on some restricted code patterns (e.g., memory accesses for jump table are simple enough).

But anyway, I believe we can definitely do better within such a scope (for commonly-used compilers).

Thank you for clarifying some of the things I did not understand before.

I agree with you; I believe Retrowrite is not a breakthrough in static analysis; it is a simple, easy-to-hack rewriter that shows how we can rewrite COTS binaries in around 3k lines of python, without needing to use complex static analysis (because we restrict ourselves to COTS).

In the original paper, I see "For PIC, we adopt a principled symbolization strategy without heuristics". So I originally thought it works in general cases.

You are definitely not wrong, and I think both the paper and the README should be more specific and make this point more clear to avoid any kind of confusion about this. I don't think the claim "we symbolize PIC without heuristics" in the paper is inherently wrong, as the paper only talks about COTS binaries (so, well-behaved compilers); but I do believe that we should have explicitly said that all the techniques used would not work in the general case, and why. I like the way you put it: the approach is heuristical in nature, but it's a perfect fit for COTS binaries.

But anyway, I believe we can definitely do better within such a scope (for commonly-used compilers).

That's true; while I personally won't have time right now to fix this issue as I'm super busy with the aarch64 version of Retrowrite, but hopefully I will get to fix jump tables on x86 sooner or later

HexHive / retrowrite

Improve X64 jump table handling #20

Case 1

Case 2

More