Closed by czlhs 3 years ago
Well, I'm not aware of any case where kcov itself causes a crash in the covered program. However, it changes timing immensely, so if there are race conditions in a threaded program, it can be much more likely to "win" the race when run under kcov. Perhaps it's something like that in your program?
Maybe the problem comes from the program running slowly, for example a request timeout, but I haven't found any such case yet. I'll follow up on the problem.
I found the cause of the core dump. Assume there are two compilation units, bar.cc and foo.cc, and both contain the same inline function funcA. bar.cc was compiled with gcc5 and foo.cc with gcc8, so funcA's instructions and DWARF info differ between bar.o and foo.o.
After linking, only one copy of funcA's instructions remains; assume it is the one from bar.o. bar.o's DWARF info then matches the code, but foo.o's DWARF info points to invalid addresses, so setting a breakpoint on one of those addresses causes a core dump.
Here is an example. The normal asm code:
0x00000000018b63c1 <+49>: mov $0x1,%eax
0x00000000018b63c6 <+54>: retq
0x00000000018b63c7 <+55>: nopw 0x0(%rax,%rax,1)
0x00000000018b63d0 <+64>: xor %esi,%esi
0x00000000018b63d2 <+66>: jmp 0x18b63a5 <google::protobuf::io::CodedInputStream::ReadVarint32(unsigned int*)+21>
The invalid asm code after the breakpoints are set. The invalid DWARF info leads to invalid breakpoint addresses:
0x00000000018b63c1 <+49>: mov $0xcc0001,%eax # invalid address here
=> 0x00000000018b63c6 <+54>: retq
0x00000000018b63c7 <+55>: int3
0x00000000018b63c8 <+56>: nop %esp
0x00000000018b63cb <+59>: add %al,(%rax)
0x00000000018b63cd <+61>: add %al,(%rax)
0x00000000018b63cf <+63>: int3
0x00000000018b63d0 <+64>: int3
0x00000000018b63d1 <+65>: imul %bl
0x00000000018b63d3 <+67>: ror %esp
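To make the corruption above concrete, here is a minimal sketch of what happens when a one-byte int3 (0xCC) is written into the middle of an instruction. It assumes the standard 5-byte encoding of mov $0x1,%eax (B8 01 00 00 00); the helper names plant_breakpoint, mov_eax_immediate and sample_code are hypothetical and are not kcov code. Writing 0xCC at byte 3 of the instruction (address 0x18b63c4 in the listing) turns the immediate into 0xcc0001, which matches the dump.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// x86 software breakpoint opcode (int3).
constexpr std::uint8_t kInt3 = 0xCC;

// Overwrite one byte with int3, the way a breakpoint-based tool patches code.
// Returns the original byte so it could be restored later.
std::uint8_t plant_breakpoint(std::vector<std::uint8_t>& code, std::size_t offset) {
    assert(offset < code.size());
    const std::uint8_t saved = code[offset];
    code[offset] = kInt3;
    return saved;
}

// Decode the 32-bit little-endian immediate of `mov $imm32,%eax` (opcode 0xB8)
// that starts at insn_start.
std::uint32_t mov_eax_immediate(const std::vector<std::uint8_t>& code,
                                std::size_t insn_start) {
    assert(insn_start + 5 <= code.size());
    assert(code[insn_start] == 0xB8);
    std::uint32_t imm = 0;
    for (int i = 3; i >= 0; --i) {
        imm = (imm << 8) | code[insn_start + 1 + static_cast<std::size_t>(i)];
    }
    return imm;
}

// Bytes of `mov $0x1,%eax; retq` (assumed encoding, for illustration only).
std::vector<std::uint8_t> sample_code() {
    return {0xB8, 0x01, 0x00, 0x00, 0x00, 0xC3};
}
```

Planting the breakpoint at offset 0 would replace the opcode itself, which is the safe case because the original byte is restored before the instruction runs. Planting it at offset 3 leaves the opcode intact and silently changes the operand, so the program keeps running with wrong data or wrong code.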
In the end, I used the --include-pattern option to include only the specific files, and it works fine.
I'm sorry if I didn't make that clear!
OK, good catch!
Typically that should be found with --verify, which disassembles the code and tries to ensure that breakpoints are only set on the instruction start, but the process isn't quite flawless.
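The idea behind that kind of check can be sketched as follows. This is only an illustration under stated assumptions, not kcov's actual implementation: the function name filter_breakpoints and the Address alias are hypothetical, and the list of instruction starts is assumed to come from a disassembler pass over the final binary.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using Address = std::uint64_t;

// Keep only the candidate addresses that coincide with a known instruction start.
// instruction_starts must be sorted ascending, as a linear disassembly yields them.
std::vector<Address> filter_breakpoints(const std::vector<Address>& instruction_starts,
                                        const std::vector<Address>& candidates) {
    assert(std::is_sorted(instruction_starts.begin(), instruction_starts.end()));
    std::vector<Address> accepted;
    for (Address addr : candidates) {
        if (std::binary_search(instruction_starts.begin(),
                               instruction_starts.end(), addr)) {
            accepted.push_back(addr);
        }
    }
    return accepted;
}
```

With the instruction starts from the listing earlier in this thread (0x18b63c1, 0x18b63c6, 0x18b63c7, 0x18b63d0, 0x18b63d2), a candidate such as 0x18b63c4 would be rejected because it falls inside the mov instruction.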
Anyway, very good that you debugged the issue and found a workaround!
I have tried the --verify flag, but it didn't work. My program receives SIGSEGV:
I used gdb to debug the core file:
Disassembly at address 0x0000000001eeaf10:
The code at that position is a static thread_local initialization. I ran my program several times. It crashes at different places, but every one of them is a static thread_local initialization. I don't know how to find the cause. Could someone give me a hint?