`split_byte_interval` and `join_byte_intervals` break jump tables in instrumented binaries

avncharlie commented 5 months ago

Instrumentation I have been writing for x64 Windows binaries has been breaking on any complex program due to faulty indirect jumps, coming from broken switch jump tables or C++ jump tables.

I believe this is because the split_byte_interval function called before rewriting separates applies an alignment directive to each DataBlock before separating each DataBlock to its own ByteInterval: https://github.com/GrammaTech/gtirb-rewriting/blob/1e116e649ed87fc4f1beb2ba9dc341a046d856d6/gtirb_rewriting/intervalutils.py#L94-L98

After rewriting, join_byte_intervals may add null bytes to a DataBlock to implement this alignment: https://github.com/GrammaTech/gtirb-rewriting/blob/1e116e649ed87fc4f1beb2ba9dc341a046d856d6/gtirb_rewriting/intervalutils.py#L229-L231

This means that depending on how added instrumentation effects alignment, padding bytes may be added between DataBlocks in the instrumented binary. This breaks any kind of jump table or structure in memory that depends on items having a fixed relative offset from each other.

Below is a minimal example. I compiled the below program using x64 MSVC as such: cl /Zi .\main.c /Feswitch64.exe

#include <stdio.h>

void execute_case(int case_num) {
    switch(case_num) {
        case 0:
            printf("Case 0\n");
            break;
        case 1:
            printf("Case 1\n");
            break;
        case 2:
            printf("Case 2\n");
            break;
        case 3:
            printf("Case 3\n");
            break;
        case 4:
            printf("Case 4\n");
            break;
        case 5:
            printf("Case 5\n");
            break;
        case 6:
            printf("Case 6\n");
            break;
        case 7:
            printf("Case 7\n");
            break;
        case 8:
            printf("Case 8\n");
            break;
        case 9:
            printf("Case 9\n");
            break;
        default:
            printf("Default case\n");
            break;
    }
}

int main() {
    for (int i = 0; i < 11; i++) {
        execute_case(i);
    }
    return 0;
}

In the compiled binary, the generated switch table is positioned immediately after the execute_case function. I added the following instrumentation to last block of this function. The binary I compiled and used for this test is here: switch64.instrumented.exe.zip

class BreakSwitchPass(Pass):
    def begin_module(self, module, functions, rewriting_ctx):
        for x in module.code_blocks:
            if x.address == 0x14000721c:
                rewriting_ctx.register_insert(
                    SingleBlockScope(x, BlockPosition.ENTRY),
                    Patch.from_function(lambda _:f'''
                    nop
                    nop
                    nop
                    nop
                    ''', Constraints(x86_syntax=X86Syntax.INTEL))
                )

This is what the generated (broken) binary looks like: Compared to the original:

This is the switch table in the generated assembly, that contains the padding bytes breaking the jump table:

I have found a temporary workaround is commenting out the lines in split_byte_interval that apply alignment. As my binaries don't appear to have any alignment information in their IR to begin with, this doesn't cause any issues and fixes the problem.

A proper fix might involve modifying split_byte_interval to never split contiguous DataBlocks into different ByteIntervals. This documentation: https://grammatech.github.io/gtirb/python/gtirb.byteinterval.html states: "If two blocks are in two different ByteIntervals, then it should be considered safe (that is, preserving of program semantics) to move one block relative to the other in memory". I think it is a fair assumption that contiguous DataBlocks should maintain the same relative offsets to each other post instrumentation, and with the previous documentation in mind this would mean they shouldn't be split into seperate ByteIntervals.

avncharlie commented 5 months ago

Accidentally created empty issue, updated now with information.

avncharlie commented 5 months ago

I should note that even if join_byte_intervals doesn't add null bytes, the jump table will still be aligned in such a way that a gap will be inserted between jump table entries and still break the jump table

jranieri-grammatech commented 4 months ago

Thanks for filing this. The current behavior is definitely wrong in two ways:

gtirb-rewriting should stop generating padding bytes ever and let gtirb-pprinter do that, since gtirb-rewriting does not know what the final layout of the module will be
gtirb-rewriting's heuristics for inferring block alignment should be re-evaluated due to situations like this and the AVX issue you previously filed

I don't have any cycles to work on this currently, but a temporary workaround is to pre-process the IR before rewriting and add explicit alignment for those blocks in advance (aligned to a 4-byte boundary). That should prevent rewriting from inserting the problematic padding bytes.

GrammaTech / gtirb-rewriting

`split_byte_interval` and `join_byte_intervals` break jump tables in instrumented binaries #15