SimonKagstrom / kcov

Code coverage tool for compiled programs, Python and Bash which uses debugging information to collect and report data without special compilation options
http://simonkagstrom.github.io/kcov/
GNU General Public License v2.0

Program accesses an invalid memory address #340

Closed by czlhs 3 years ago

czlhs commented 3 years ago

I tried the --verify flag, but it didn't help. My program receives SIGSEGV:

*** Aborted at 1605320405 (unix time) try "date -d @1605320405" if you are using GNU date ***
PC: @          0x1eeaf10 FixItemPosition()
*** SIGSEGV (@0x0) received by PID 51413 (TID 0x7fff99f71700) from PID 0; stack trace: ***
    @     0x7ffff53a65e0 (unknown)
    @          0x1eeaf10 FixItemPosition()
    @          0x1eee81c  Fetch()
    @          0x380ef3f execute_native_thread_routine
    @     0x7ffff539ee25 start_thread
    @     0x7ffff50cc34d __clone

I used gdb to debug the core file:

....
[New LWP 192169]
[New LWP 53013]
[New LWP 98224]
[New LWP 53010]
Cannot access memory at address 0x7ffff7ffe128
Cannot access memory at address 0x7ffff7ffe120
Failed to read a valid object file image from memory.
Core was generated by `./a.out '.
Program terminated with signal 11, Segmentation fault.
#0  0x0000000001eeaf10 in FixItemPosition () at ****
(gdb) bt
Python Exception <class 'gdb.MemoryError'> Cannot access memory at address 0x7fff99f3cef8:

Disassembly around address 0x0000000001eeaf10:

   0x0000000001eeaeea <+74>:    lea    0x38(%rsp),%rax
   0x0000000001eeaeef <+79>:    mov    %rax,0x20(%rsp)
   0x0000000001eeaef4 <+84>:    lea    0x40(%rsp),%rax
   0x0000000001eeaef9 <+89>:    mov    %rax,0x18(%rsp)
   0x0000000001eeaefe <+94>:    xchg   %ax,%ax
   0x0000000001eeaf00 <+96>:    mov    0x0(%r13),%rdi
   0x0000000001eeaf04 <+100>:   test   %rdi,%rdi
   0x0000000001eeaf07 <+103>:   je     0x1eeb2dd
   0x0000000001eeaf0d <+109>:   mov    (%rdi),%rax
=> 0x0000000001eeaf10 <+112>:   callq  *0x30(%rax)
   0x0000000001eeaf13 <+115>:   mov    %rax,%r12
   0x0000000001eeaf16 <+118>:   data32 data32 data32 mov %fs:0x0,%rax
   0x0000000001eeaf22 <+130>:   xor    %edx,%edx

The code at this position is a static thread_local initialization. I ran my program several times; it crashes at a different place each time, but every crash site is a static thread_local initialization. I don't know how to find the cause. Could someone give me a hint?

SimonKagstrom commented 3 years ago

Well, I'm not aware of any case where kcov itself causes a crash in the covered program. However, it changes timing immensely, so if there are race conditions in a threaded program, it can be much more likely to "win" the race when run under kcov. Perhaps it's something like that in your program?

czlhs commented 3 years ago

Maybe the problem is caused by the program running slowly, such as a request timing out, but I haven't found anything like that yet. I'll follow up on the problem.

czlhs commented 3 years ago

I found the cause of the core dump. Assume there are two compilation units, bar.cc and foo.cc, that both contain the same inline function funcA. bar.cc was compiled with gcc5 and foo.cc with gcc8, so funcA's instructions and DWARF info differ between bar.o and foo.o. After linking, only one copy of funcA's instructions remains; assume it is the one from bar.o. bar.o's DWARF info matches that code, but foo.o's DWARF info points to addresses that are invalid for it. If a breakpoint is set on such an address, the program crashes.

Here is an example. The normal asm code:

   0x00000000018b63c1 <+49>:    mov    $0x1,%eax
   0x00000000018b63c6 <+54>:    retq
   0x00000000018b63c7 <+55>:    nopw   0x0(%rax,%rax,1)
   0x00000000018b63d0 <+64>:    xor    %esi,%esi
   0x00000000018b63d2 <+66>:    jmp    0x18b63a5 <google::protobuf::io::CodedInputStream::ReadVarint32(unsigned int*)+21>

The corrupted asm code after the breakpoints are set (the invalid DWARF info leads to invalid breakpoint addresses):

   0x00000000018b63c1 <+49>:    mov    $0xcc0001,%eax  # invalid address here
=> 0x00000000018b63c6 <+54>:    retq
   0x00000000018b63c7 <+55>:    int3
   0x00000000018b63c8 <+56>:    nop    %esp
   0x00000000018b63cb <+59>:    add    %al,(%rax)
   0x00000000018b63cd <+61>:    add    %al,(%rax)
   0x00000000018b63cf <+63>:    int3
   0x00000000018b63d0 <+64>:    int3
   0x00000000018b63d1 <+65>:    imul   %bl
   0x00000000018b63d3 <+67>:    ror    %esp
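The change from `mov $0x1,%eax` to `mov $0xcc0001,%eax` in the listing above is consistent with a single breakpoint byte (INT3, 0xCC) landing inside the instruction's immediate operand. The following is a minimal Python sketch of that effect, not kcov code. The instruction bytes are the standard x86-64 encoding of `mov $0x1,%eax`; the breakpoint address 0x18b63c4 is an assumption inferred from the corrupted immediate, and the helper names are hypothetical.

```python
# Toy model of a software breakpoint: overwrite one byte with INT3 (0xCC).
INT3 = 0xCC
BASE = 0x18B63C1  # address of `mov $0x1,%eax` in the listing above

# Standard encoding of `mov $0x1,%eax`: opcode B8 + 32-bit little-endian immediate.
original = bytes([0xB8, 0x01, 0x00, 0x00, 0x00])


def set_breakpoint(code: bytes, base: int, address: int) -> bytes:
    """Return a copy of `code` with the byte at `address` replaced by INT3."""
    patched = bytearray(code)
    patched[address - base] = INT3
    return bytes(patched)


def mov_eax_immediate(code: bytes) -> int:
    """Decode the immediate operand of a `mov imm32,%eax` instruction."""
    assert code[0] == 0xB8, "not a mov imm32,%eax"
    return int.from_bytes(code[1:5], "little")


# Assumption: the stale line table pointed three bytes into the mov.
patched = set_breakpoint(original, BASE, 0x18B63C4)

print(hex(mov_eax_immediate(original)))  # 0x1
print(hex(mov_eax_immediate(patched)))   # 0xcc0001
```

Because the opcode byte is untouched, the CPU never executes an INT3 here; it executes a valid instruction with a wrong operand, so the tracer never gets the chance to restore the original byte.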

In the end, I used the --include-pattern option to include only the specified files, and it works fine.

I'm sorry if I didn't make that clear!

SimonKagstrom commented 3 years ago

OK, good catch!

Typically that should be found with --verify, which disassembles the code and tries to ensure that breakpoints are only set on the instruction start, but the process isn't quite flawless.
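The idea behind such a check can be sketched in a few lines of Python. This is only an illustration of filtering breakpoints against instruction boundaries, not how kcov implements --verify. The instruction lengths are read off the uncorrupted listing earlier in this thread (mov, retq, nopw, xor, jmp); the candidate list is hypothetical, with 0x18b63c4 inferred from the corrupted immediate `0xcc0001`.

```python
def instruction_starts(base, lengths):
    """Walk instruction lengths from `base` and collect each start address."""
    starts = set()
    address = base
    for length in lengths:
        starts.add(address)
        address += length
    return starts


def filter_breakpoints(candidates, starts):
    """Split candidate addresses into those on and off an instruction start."""
    safe = [a for a in candidates if a in starts]
    rejected = [a for a in candidates if a not in starts]
    return safe, rejected


# Lengths taken from the normal listing: mov(5), retq(1), nopw(9), xor(2), jmp(2).
starts = instruction_starts(0x18B63C1, [5, 1, 9, 2, 2])

# Hypothetical candidates as a stale line table might supply them.
candidates = [0x18B63C4, 0x18B63C6, 0x18B63D0]
safe, rejected = filter_breakpoints(candidates, starts)

print([hex(a) for a in safe])      # ['0x18b63c6', '0x18b63d0']
print([hex(a) for a in rejected])  # ['0x18b63c4']
```

A real implementation needs a disassembler to obtain the instruction lengths, and the result is only as reliable as that disassembly.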

Anyway, very good that you debugged the issue and found a workaround!