SimonKagstrom / kcov

Code coverage tool for compiled programs, Python and Bash which uses debugging information to collect and report data without special compilation options
http://simonkagstrom.github.io/kcov/
GNU General Public License v2.0
710 stars 109 forks source link

Testing the new mach-engine with Go #405

Closed myzhan closed 11 months ago

myzhan commented 1 year ago
$ mkdir build && cd build
$ cmake -G Xcode -DOPENSSL_ROOT_DIR=/usr/local/opt/openssl -DOPENSSL_LIBRARIES=/usr/local/opt/openssl/lib ..
$ xcodebuild -target kcov -configuration Release
note: Run script build phase 'CMake PostBuild Rules' will be run during every build because the option to run the script phase "Based on dependency analysis" is unchecked. (in target 'kcov' from project 'kcov')
PhaseScriptExecution CMake\ PostBuild\ Rules /Users/zhanqp/github/kcov/build/build/kcov.build/Release/Script-8BEB10D058631033A81B2400.sh (in target 'kcov' from project 'kcov')
    cd /Users/zhanqp/github/kcov
    /bin/sh -c /Users/zhanqp/github/kcov/build/build/kcov.build/Release/Script-8BEB10D058631033A81B2400.sh
Signing the kcov binary with an ad-hoc identity
kcov: No such file or directory
Command PhaseScriptExecution failed with a nonzero exit code

note: Run script build phase 'Generate CMakeFiles/ZERO_CHECK' will be run during every build because the option to run the script phase "Based on dependency analysis" is unchecked. (in target 'ZERO_CHECK' from project 'kcov')
** BUILD FAILED **

The following build commands failed:
    PhaseScriptExecution CMake\ PostBuild\ Rules /Users/zhanqp/github/kcov/build/build/kcov.build/Release/Script-8BEB10D058631033A81B2400.sh (in target 'kcov' from project 'kcov')
(1 failure)

After manually signing the binary, It failed to collect coverage from a go binary.

$ ./src/Release/kcov $HOME/tmp/report $HOME/tmp/hello Picked parser: Mach-O warning: no debug symbols in executable (-arch x86_64)

SimonKagstrom commented 1 year ago

If there are no debug symbols, it will not work. Normally, you pass -g to the compiler, but I'm not sure how go does it?

The other issue seems to be with xcode. Perhaps it would be better to build it with make or ninja, i.e.,

cmake -G make -DOPENSS...

Does that work?

myzhan commented 1 year ago

For the building issue, I switch to make and it's ok. And I think the version of dsymutil does not support go executable, which I installed via brew. Do we need a new go-parser for go executables?

SimonKagstrom commented 1 year ago

Good, I'll update the build instructions for OSX.

If you comment out the "system("dsymutil...");" stuff in macho-parser.cc, does it work with Go then?

myzhan commented 1 year ago

No, it doesn't. Can I reopen the previous issue?

myzhan commented 1 year ago

OK, I manage to make it work with Go, and the new engine is as fast as ptrace in linux.

  1. remove dsymutil and let it parse the executable directly
  2. remove the check for hdr->filetype != MH_DSYM
  3. remove is_end_seq
  4. read and write the addr without adding m_imageBase
myzhan commented 1 year ago

I wrote a tool in Go to help me test the parsing procedure, and found that is_end_seq will make us missing the last address of each compile unit.

package main

import (
    "debug/dwarf"
    "debug/macho"
    "fmt"
    "io"
)

func panicIfErr(err error) {
    if err != nil {
        panic(err)
    }
}

func main() {
    executable := "/Users/zhanqp/tmp/hello"
    exe, err := macho.Open(executable)
    panicIfErr(err)
    dw, err := exe.DWARF()
    panicIfErr(err)
    reader := dw.Reader()
    for index := 0; ; index++ {
        entry, err := reader.Next()
        panicIfErr(err)
        if entry == nil {
            break
        }
        if entry.Tag != dwarf.TagCompileUnit {
            continue
        }
        lrd, err := dw.LineReader(entry)
        panicIfErr(err)
        for {
            var e dwarf.LineEntry
            err := lrd.Next(&e)
            if err == io.EOF {
                break
            }
            panicIfErr(err)
            fmt.Printf("%s:%d %x\n", e.File.Name, e.Line, e.Address)
        }
    }
}
SimonKagstrom commented 1 year ago

Good work!

is_end_seq actually needed for C code, since clang leaves some data for use in functions at the end of the function. Without is_end_seq, kcov sees that as a line of code (since it looks like that in the DWARF info), and tries to set a breakpoint there, thereby corrupting the data.

What is the filetype for the Go binaries? If one can identify them some way, it would be fairly easy to skip is_enq_seq and dsymutil etc for that case.

myzhan commented 1 year ago

BTW, I reverted my changes and test a simple C binary, it failed to parse debug symbols.

$ brew install dwarfutils
$ cat hello.c
#include <stdio.h>

int main() {
    printf("hello\n");
    return 0;
}
$ clang --version
Apple clang version 14.0.3 (clang-1403.0.22.14.1)
Target: x86_64-apple-darwin22.1.0
Thread model: posix
$ clang -g -O0 hello.c -o hc
$ dsymutil ./hc
warning: (x86_64) /var/folders/d4/8t_88ltn76l44lv65wnx3q9c0000gp/T/hello-19fac8.o unable to open object file: No such file or directory
warning: no debug symbols in executable (-arch x86_64)
myzhan commented 1 year ago

What is the filetype for the Go binaries?

The linker of Go stores build info in executable, can will search for this magic string?

https://github.com/golang/go/blob/e68c027204d410ebca5bf1a9660605f2bd737748/src/debug/buildinfo/buildinfo.go#L49C2-L49C16

SimonKagstrom commented 1 year ago

If I remember correctly, clang uses a special case when building in one step, i.e., without compile + link. Then I think it might store the debug symbols in the binary directly, otherwise dsymutils will gather everything from the object files.

Perhaps that's how Go does it as well?

Then I think there should be some special-case to catch this. Not quite sure how to determine it though.

SimonKagstrom commented 1 year ago

Can I look at your stashed changes? I tried to follow your steps (with a hello world compiled directly to an executable), and I just get a segfault when parsing it.

myzhan commented 1 year ago

See this commit https://github.com/myzhan/kcov/commit/89ddc6150d03a58ce979892642addc63ee9d0219

SimonKagstrom commented 1 year ago

Thanks! I did basically the same changes yesterday, but still with a crash during parsing. However, it looks to me that the hello world binary still doesn't have the DWARF info embedded, so maybe there is still some other way used compared to how Go does it.

So basically your hello world example, compiled in the same way. Is that parseable with kcov, with your commit?

myzhan commented 1 year ago

By default, go will compress dwarf sections, you need to disable it when building the executable. Please refer to my previous issue for the command.

Simon Kagstrom @.***>于2023年8月25日 周五00:29写道:

Thanks! I did basically the same changes yesterday, but still with a crash during parsing. However, it looks to me that the hello world binary still doesn't have the DWARF info embedded, so maybe there is still some other way used compared to how Go does it.

So basically your hello world example, compiled in the same way. Is that parseable with kcov, with your commit?

— Reply to this email directly, view it on GitHub https://github.com/SimonKagstrom/kcov/issues/405#issuecomment-1692030762, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAHI3NIAS2QP7QQ6YFG5QMDXW56PLANCNFSM6AAAAAA3ZW5XKI . You are receiving this because you authored the thread.Message ID: @.***>

SimonKagstrom commented 1 year ago

Yes, but I meant your hello world in c

myzhan commented 1 year ago

Ah, yes, I get the same segmentation fault when parsing c executable.

myzhan commented 1 year ago

The crash is in dwarf_errmsg(). The error is DW_DLV_NO_ENTRY.

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
  * frame #0: 0x00000001002b927f libdwarf.0.dylib`dwarf_errmsg + 9
    frame #1: 0x0000000100047a31 kcov`(anonymous namespace)::MachoParser::parse(this=0x00000001000c5278) at macho-parser.cc:135:13 [opt]
    frame #2: 0x00000001000063ff kcov`Collector::run(this=0x00000001004063d0, filename=Summary Unavailable) at collector.cc:46:16 [opt]
    frame #3: 0x000000010002db0a kcov`runKcov(runningMode=MODE_COLLECT_AND_REPORT) at main.cc:353:19 [opt]
    frame #4: 0x000000010002ae60 kcov`main(argc=3, argv=<unavailable>) at main.cc:611:9 [opt]
    frame #5: 0x00007ff812573310 dyld`start + 2432
myzhan commented 1 year ago

After some digging, I realize that clang stores dwarf info into .dSYM instread of executable, that's why we can't directly parse the executable generated by clang. Apparently, the Go team doesn't follow the design.

See also:

  1. https://wiki.dwarfstd.org/Apple%27s_%22Lazy%22_DWARF_Scheme.md
  2. https://stackoverflow.com/questions/10044697/where-how-does-apples-gcc-store-dwarf-inside-an-executable/12827463#12827463
SimonKagstrom commented 1 year ago

Yes, I saw the same backtrace myself.

The dSYM stuff is the reason for the dsymutil invocation. The go scheme is more like what e.g., Linux etc do it with ELF, but I'm not sure how the trivial case with the compile-direct-to-executable with clang actually works. lldb is able to debug it, so there is some way, but I'm not sure what...

myzhan commented 11 months ago

So far, I can use https://pkg.go.dev/golang.org/x/tools@v0.14.0/cmd/splitdwarf to generate dSYM from go executeble before running kcov. I think the macho parser can check the dSYM is valid, before running dsymutil, nor it will report that "no debug sysbols in executable" and overwrite previous file generate by splitdawrf.

And Go has a proposal to generate dSYM directly by linker, https://github.com/golang/go/issues/62577

myzhan commented 11 months ago

BTW, I still need to comment out "m_imageBase" when running go executable.

SimonKagstrom commented 11 months ago

OK, thanks for the progress!

I haven't looked at the mach-o stuff for a while, so I'm not don't know how to check the validity of the dSYM info. I guess there should be some checksum to match against the binary, but I'm not sure.

I guess we can check for the buildinfo string to identify go binaries, and then special case them. Alternatively, perhaps it's easier to just add a command-line option which does it for now?

E.g., with --parse-go-binary (or finding it directly) it would

I guess that would work both for the splitdwarf variant and the produced-by-linker case?

myzhan commented 11 months ago

I prefer to find it directly by search the binary for the buildinfo magic string, which is "\xff Go buildinf:". If you are ok with that, I can submit a PR later.

SimonKagstrom commented 11 months ago

Yes, absolutely! Must the entire binary be parsed, or is it placed in a known place?

myzhan commented 11 months ago

Only for the macho engine, I will try to use the dwarf parser to locate the "__go_buildinfo" section, and search for the magic string in it, just like the Go std.

myzhan commented 11 months ago

I have submitted a PR. And I can't reproduce the problem of is_seq_end, so I keep it.