compiler-explorer / asm-parser

BSD 2-Clause "Simplified" License

Maps assembly back to source code #35

Closed Anarion-zuo closed 11 months ago

Anarion-zuo commented 11 months ago

The compiler puts debug info in the binary, e.g. with -g. I see the output JSON has a "source" attribute. Is that supposed to point to a position in the original source file? I only get "source": null, and I wonder how to make it print a reference to the source file here.

My C code:

#include <stdio.h>

int main() {
    printf("hihihi\n");
    return 0;
}

Compile command (with clang built from the LLVM master branch):

$llvm_home/bin/clang hello.c -v -o hello -g -O2

How I invoke asm-parser:

objdump -S -d hello -l --insn-width=16 > objdump.asm
cat objdump.asm | ../asm-parser -stdin -binary -whitespace -library_functions > asm_parsed.json

I'm using asm-parser released here: https://github.com/compiler-explorer/asm-parser/releases/tag/v0.9

json output:

{
    "asm": [
        {
            "labels": [],
            "source": null,
            "text": "puts@plt:"
        },
        {
            "labels": [],
            "opcodes": [
                "ff",
                "25",
                "e2",
                "2f",
                "00",
                "00"
            ],
            "address": 4144,
            "source": null,
            "text": " jmpq   *0x2fe2(%rip)        # 4018 <puts@GLIBC_2.2.5>"
        },
        {
            "labels": [],
            "opcodes": [
                "68",
                "00",
                "00",
                "00",
                "00"
            ],
            "address": 4150,
            "source": null,
            "text": " pushq  $0x0"
        },
        {
            "labels": [],
            "opcodes": [
                "e9",
                "e0",
                "ff",
                "ff",
                "ff"
            ],
            "address": 4155,
            "source": null,
            "text": " jmpq   1020 <.plt>"
        },
        {
            "labels": [],
            "source": null,
            "text": "main:"
        },
        {
            "labels": [],
            "opcodes": [
                "50"
            ],
            "address": 4416,
            "source": null,
            "text": " push   %rax"
        },
        {
            "labels": [],
            "opcodes": [
                "48",
                "8d",
                "3d",
                "bc",
                "0e",
                "00",
                "00"
            ],
            "address": 4417,
            "source": null,
            "text": " lea    0xebc(%rip),%rdi        # 2004 <_IO_stdin_used+0x4>"
        },
        {
            "labels": [
                {
                    "name": "puts@plt",
                    "range": {
                        "startCol": 15,
                        "endCol": 23
                    }
                }
            ],
            "opcodes": [
                "e8",
                "e3",
                "fe",
                "ff",
                "ff"
            ],
            "address": 4424,
            "source": null,
            "text": " callq  1030 <puts@plt>"
        },
        {
            "labels": [],
            "opcodes": [
                "31",
                "c0"
            ],
            "address": 4429,
            "source": null,
            "text": " xor    %eax,%eax"
        },
        {
            "labels": [],
            "opcodes": [
                "59"
            ],
            "address": 4431,
            "source": null,
            "text": " pop    %rcx"
        },
        {
            "labels": [],
            "opcodes": [
                "c3"
            ],
            "address": 4432,
            "source": null,
            "text": " retq   "
        },
        {
            "labels": [],
            "opcodes": [
                "66",
                "2e",
                "0f",
                "1f",
                "84",
                "00",
                "00",
                "00",
                "00",
                "00"
            ],
            "address": 4433,
            "source": null,
            "text": " nopw   %cs:0x0(%rax,%rax,1)"
        },
        {
            "labels": [],
            "opcodes": [
                "0f",
                "1f",
                "44",
                "00",
                "00"
            ],
            "address": 4443,
            "source": null,
            "text": " nopl   0x0(%rax,%rax,1)"
        }
    ],
    "labelDefinitions": {
        "puts@plt": 1,
        "main": 5
    },
    "parsingTime": 0
}

As shown in the JSON, every entry has "source": null, while the objdump output has source code interleaved with the assembly.
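A quick way to check whether any entry in the asm-parser output carries a source mapping is to count the non-null "source" fields. This is a minimal sketch: the helper name `summarize_sources` is hypothetical, the sample is trimmed from the JSON above, and the shape of a non-null "source" value is deliberately not assumed.

```python
import json


def summarize_sources(parsed: dict) -> dict:
    """Count asm entries with and without a source mapping."""
    entries = parsed.get("asm", [])
    # An entry counts as mapped when "source" is anything other than null;
    # the structure of a non-null value is not inspected here.
    mapped = sum(1 for entry in entries if entry.get("source") is not None)
    return {"total": len(entries), "mapped": mapped, "unmapped": len(entries) - mapped}


# Two entries trimmed from the output shown in this issue.
sample = json.loads("""
{
    "asm": [
        {"labels": [], "source": null, "text": "main:"},
        {"labels": [], "opcodes": ["50"], "address": 4416,
         "source": null, "text": " push   %rax"}
    ],
    "labelDefinitions": {"main": 5},
    "parsingTime": 0
}
""")

print(summarize_sources(sample))
```

To run it against a real file, load `asm_parsed.json` with `json.load` and pass the result to the same function.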

jeremy-rifkin commented 11 months ago

I don't think objdump -S -d hello -l --insn-width=16 has any debug info. It's just disassembly and hex. If you pass -S to clang and run the asm parser on that I'd expect it to work. Partouf will be the expert on all things asm parser though.

Anarion-zuo commented 11 months ago

So we can't get anything from objdump output, and must use the compiler's output? What I need is a mapping from assembly code to source code, along with the opcodes and so on. It would be better if the mapping pointed to a location in the source file, though I would be satisfied with just the source line, without its location.

I tried this, and got nothing from asm-parser.

$llvm_home/bin/clang -S -g -O2 -o hello.s hello.c
cat hello.s | ../asm-parser -stdin -whitespace > hello.s.json

What I got:

{"asm": [],"labelDefinitions": {}, "parsingTime": 1}

Then I tried this:

$llvm_home/bin/clang -S -g -O2 -o hello.s hello.c
cat hello.s | ../asm-parser -stdin -whitespace > hello.s.json

part of what I got:

        {
            "labels": [],
            "source": null,
            "text": "  .long 42 # DW_AT_type"
        },
        {
            "labels": [],
            "source": null,
            "text": "  .byte 0 # DW_AT_decl_file"
        },
        {
            "labels": [],
            "source": null,
            "text": "  .byte 4 # DW_AT_decl_line"
        },
        {
            "labels": [],
            "source": null,
            "text": "  .byte 3 # Abbrev [3] 0x2a:0xc DW_TAG_array_type"
        },
        {
            "labels": [],
            "source": null,
            "text": "  .long 54 # DW_AT_type"
        },
        {
            "labels": [],
            "source": null,
            "text": "  .byte 4 # Abbrev [4] 0x2f:0x6 DW_TAG_subrange_type"
        },

Perhaps @partouf would be so kind as to shed some light on this?

To the best of my knowledge, compiled binaries have debug info attached if the -g option is given at compile time. objdump can show it, and perhaps this project can parse it. So the problem is two-fold:

partouf commented 11 months ago

I think the issue here is probably the default clang -g, which probably uses DWARF 5, maybe?

Can you try compiling with -gdwarf-4 and then retry?
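The retry can be scripted as below. This is only a sketch: the flags are the ones used earlier in this issue with -g swapped for -gdwarf-4, while the function names and the default paths (`clang` on PATH, `../asm-parser`) are assumptions for illustration.

```python
import subprocess


def build_commands(source="hello.c", binary="hello",
                   clang="clang", asm_parser="../asm-parser"):
    """Return the compile, objdump and asm-parser command lines."""
    # -gdwarf-4 replaces the plain -g from the original compile command.
    compile_cmd = [clang, source, "-o", binary, "-gdwarf-4", "-O2"]
    objdump_cmd = ["objdump", "-S", "-d", binary, "-l", "--insn-width=16"]
    parser_cmd = [asm_parser, "-stdin", "-binary", "-whitespace",
                  "-library_functions"]
    return compile_cmd, objdump_cmd, parser_cmd


def run_pipeline(**kwargs) -> str:
    """Compile, disassemble, and pipe the disassembly into asm-parser."""
    compile_cmd, objdump_cmd, parser_cmd = build_commands(**kwargs)
    subprocess.run(compile_cmd, check=True)
    dump = subprocess.run(objdump_cmd, check=True,
                          capture_output=True, text=True).stdout
    # asm-parser reads the objdump text on stdin and writes JSON to stdout.
    result = subprocess.run(parser_cmd, input=dump, check=True,
                            capture_output=True, text=True)
    return result.stdout
```

Calling `run_pipeline(clang="/path/to/llvm/bin/clang")` returns the JSON text, which can then be checked for non-null "source" entries.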

partouf commented 11 months ago

> I don't think objdump -S -d hello -l --insn-width=16 has any debug info. It's just disassembly and hex. If you pass -S to clang and run the asm parser on that I'd expect it to work. Partouf will be the expert on all things asm parser though.

objdump always has debug info at the top of each section that contains code, as long as you compile with -g. It's just that not all objdump versions, compilers, and settings are compatible.

Anarion-zuo commented 11 months ago

It worked! You are my lifesaver!

While anxiously waiting for your guidance, I nonetheless implemented a rather crude objdump parser of my own.

Another quick question, if you care to answer: is there a similar standalone tool for other assembly-ish languages, e.g. LLVM IR?
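For illustration, a crude parser of this kind might look like the sketch below. It is not the parser mentioned above and not how asm-parser works; it assumes the usual `objdump -d -l` layout (a `file:line` line before the instructions it covers, and tab-separated address, bytes, and text on instruction lines). The name `parse_objdump` and the sample path `/tmp/hello.c` are hypothetical.

```python
import re

# A location line such as "/tmp/hello.c:4", optionally with a discriminator.
LOCATION_RE = re.compile(r"^(?P<file>\S.*):(?P<line>\d+)(?: \(discriminator \d+\))?$")
# The first tab-separated field of an instruction line, e.g. "    1140:".
ADDRESS_RE = re.compile(r"^\s*([0-9a-f]+):$")


def parse_objdump(text: str) -> list:
    """Attach the most recent file:line to every instruction that follows it."""
    current = None
    rows = []
    for raw in text.splitlines():
        parts = raw.split("\t")
        head = ADDRESS_RE.match(parts[0])
        if head and len(parts) >= 2:
            rows.append({
                "address": int(head.group(1), 16),
                "opcodes": parts[1].split(),
                "text": parts[2].strip() if len(parts) > 2 else "",
                "source": current,
            })
            continue
        loc = LOCATION_RE.match(raw)
        if loc:
            # Unindented source lines that end in ":<digits>" would also match;
            # that is one reason this stays a crude sketch.
            current = (loc.group("file"), int(loc.group("line")))
    return rows


# Hand-written sample in the assumed layout; path and line numbers are made up.
SAMPLE = (
    "0000000000001140 <main>:\n"
    "main():\n"
    "/tmp/hello.c:3\n"
    "    1140:\t50                   \tpush   %rax\n"
    "/tmp/hello.c:4\n"
    "    1141:\t48 8d 3d bc 0e 00 00 \tlea    0xebc(%rip),%rdi\n"
)

for row in parse_objdump(SAMPLE):
    print(row["address"], row["source"], row["text"])
```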

jeremy-rifkin commented 11 months ago

We have some LLVM parsing stuff in the main repo but no, not standalone.

jeremy-rifkin commented 11 months ago

I'll go ahead and close since this isn't an asm-parser issue, but feel free to ask any more questions you have.