CESNET / netopeer2

NETCONF toolset
BSD 3-Clause "New" or "Revised" License
296 stars 187 forks source link

tests: `SIGEV_THREAD` doesn't work under TSAN #1420

Closed jktjkt closed 1 year ago

jktjkt commented 1 year ago

As per the docs, the SIGEV_THREAD sigevent option instructs the kernel to create a thread in this process when that timer expires (in fact this is done by the C library, and the kernel "just" wakes up that thread AFAICT). However, this newly created thread appears to have completely bypassed TSAN, and when the just-created thread hits any TSAN interceptor which assumes that some per-thread TSAN structures have been already set up, the application segfaults (for example like this):

 #0  0x00005555556192d4 in __tsan::user_alloc_internal(__tsan::ThreadState*, unsigned long, unsigned long, unsigned long, bool) ()
 #1  0x00005555556196a4 in __tsan::user_alloc(__tsan::ThreadState*, unsigned long, unsigned long) ()
 #2  0x00005555555d3db5 in malloc ()
 #3  0x00007ffff7889470 in timer_helper_thread (arg=<optimized out>) at ../sysdeps/unix/sysv/linux/timer_routines.c:88
 #4  0x00007ffff787ddd4 in start_thread (arg=<optimized out>) at pthread_create.c:444
 #5  0x00007ffff78ff9b0 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

That's with the latest clang, or, in another case, with GCC:

 #0  0x00007ffff4e88fc6 in __sanitizer::CombinedAllocator<__sanitizer::SizeClassAllocator64<__tsan::AP64>, __sanitizer::LargeMmapAllocatorPtrArrayDynamic>::Allocate(__sanitizer::SizeClassAllocator64LocalCache<__sanitizer::SizeClassAllocator64<__tsan::AP64> >*, unsigned long, unsigned long) () from /nix/store/ds46sf6mb9xydn0gy97131y1w5hag5s8-gcc-12.2.0-lib/lib/libtsan.so.2
 #1  0x00007ffff4e8686a in __tsan::user_alloc_internal(__tsan::ThreadState*, unsigned long, unsigned long, unsigned long, bool) ()
    from /nix/store/ds46sf6mb9xydn0gy97131y1w5hag5s8-gcc-12.2.0-lib/lib/libtsan.so.2
 #2  0x00007ffff4e869d8 in __tsan::user_alloc(__tsan::ThreadState*, unsigned long, unsigned long) () from /nix/store/ds46sf6mb9xydn0gy97131y1w5hag5s8-gcc-12.2.0-lib/lib/libtsan.so.2
 #3  0x00007ffff4e4331e in malloc () from /nix/store/ds46sf6mb9xydn0gy97131y1w5hag5s8-gcc-12.2.0-lib/lib/libtsan.so.2
 #4  0x00007ffff4cab4c0 in timer_helper_thread (arg=<optimized out>) at ../sysdeps/unix/sysv/linux/timer_routines.c:88
 #5  0x00007ffff4c9fe24 in start_thread (arg=<optimized out>) at pthread_create.c:444
 #6  0x00007ffff4d219b0 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

I have no clue on how to fix this properly within TSAN.

Bug: https://github.com/google/sanitizers/issues/1612

michalvasko commented 1 year ago

How come you have discovered it only now? Also, adding an option just cause a diagnostic tool is faulty does not seem right, I have always been against over-reliance on these tools. But I suppose I would let this slide. Perhaps the C flags could be examined for the TSAN flag instead of the option.

jktjkt commented 1 year ago

How come you have discovered it only now?

Recently TSAN changed their internal implementation from runtime v2 to runtime v3. Or it could be a glibc difference, I don't know.

Also, adding an option just cause a diagnostic tool is faulty does not seem right, I have always been against over-reliance on these tools.

We've reported quite a few real bugs with these, both in the libyang/sysrepo/libnetconf2/netopeer2 stack, in external libraries and in our code. I would be very unhappy if we lost access to a tool that is very useful.

But I suppose I would let this slide. Perhaps the C flags could be examined for the TSAN flag instead of the option.

Sure, I can do it that way if that's what you prefer. I was thinking that explicit is better than implicit here, but it's your call.

michalvasko commented 1 year ago

Sure, I can do it that way if that's what you prefer. I was thinking that explicit is better than implicit here, but it's your call.

Please do, there is enough options as it is.

jktjkt commented 1 year ago

Sure, I can do it that way if that's what you prefer. I was thinking that explicit is better than implicit here, but it's your call.

Please do, there is enough options as it is.

Done.