PlatformLab / NanoLog

Nanolog is an extremely performant nanosecond scale logging system for C++ that exposes a simple printf-like API.
Other
3k stars 343 forks source link

gcc-8 incompatibility - One of the threads gets stuck #8

Closed eucristian closed 5 years ago

eucristian commented 5 years ago

Hello,

I might be doing something wrong but i have tried this on 3 different machines, 2 of them freshly installed.

Running the benchmark on a fresh installed Red Hat Enterprise Linux Server release 7.6 (Maipo) with devtoolset-8 and one of the threads gets stuck on RuntimeLogger::StagingBuffer::reserveSpaceInternal. Have the same problem also on self-compiled gcc 8.2 install with ABI=1. (bin-utils =the ones from the devtoolset-8)

I also think that one of the threads crashes as there are only 2 live threads. Below the gdb bt and thread info. This is also a bit peculiar as it doesn't always happen, but it happens consistently enough (one every 4 runs let's say)

(gdb) bt
#0  NanoLogInternal::RuntimeLogger::StagingBuffer::reserveSpaceInternal (this=0x7ffb358ba010, nbytes=nbytes@entry=16, blocking=blocking@entry=true) at ../runtime/RuntimeLogger.cc:760
#1  0x000000000040513f in NanoLogInternal::RuntimeLogger::StagingBuffer::reserveProducerSpace (nbytes=16, this=<optimized out>) at ../runtime/RuntimeLogger.h:330
#2  NanoLogInternal::RuntimeLogger::reserveAlloc (nbytes=16) at ../runtime/RuntimeLogger.h:109
#3  __syang0__fl__Simple32log32message32with32032parameters__Benchmark46cc__74__ (level=NanoLog::LogLevels::NOTICE, fmtStr=<synthetic pointer>) at generatedCode.h:11
#4  runBenchmark (id=0, barrier=<optimized out>) at Benchmark.cc:74
#5  0x0000000000404b2b in main (argc=<optimized out>, argv=<optimized out>) at Benchmark.cc:119

(gdb) info threads
  Id   Target Id                                     Frame
* 1    Thread 0x7ffb359fe780 (LWP 47837) "benchmark" NanoLogInternal::RuntimeLogger::StagingBuffer::reserveSpaceInternal (this=0x7ffb358ba010, nbytes=nbytes@entry=16, blocking=blocking@entry=true) at ../runtime/RuntimeLogger.cc:760
  2    Thread 0x7ffb2c7d8700 (LWP 47840) "benchmark" 0x00007ffb34bb3d12 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0

Thank you very much. will also keep an eye to provide more info if needed.

syang0 commented 5 years ago

Hello eucristian,

I don't have access to a RHEL machine, but I was able to reproduce the problem in CentOS and Debian. It appears that there's some bad interaction between gcc-8 and POSIX AIO. In particular, the thread that you see exit under gdb is the POSIX AIO thread, not one of the NanoLog threads. I'm not sure what's causing the bad interaction, but the problem seems to go away when I use gcc-7 and earlier.

Could you try compiling with an earlier version of gcc and see if the problem occurs? Internally, we develop on gcc 6.4.0.

eucristian commented 5 years ago

Unfortunately we need to use gcc-8 . I have looked on their website and could not find a bug regarding POSIX AIO. I have also tested with gcc-7 and it seems that the problem is gone but this is not somehting i can use in the production environment.

syang0 commented 5 years ago

I think I've found the root cause. It appears gcc8 is optimizing out a variable read within the StagingBuffer class, causing it to misreport free space. I'll try to have a fix up in the next day or so.

Also, it appears that the POSIX AIO thread was exiting due to inactivity through the aio_* api and was an indicator of inactivity with my background thread rather than gcc8+aio incompatibility. It was a red herring; sorry about that.

syang0 commented 5 years ago

This problem should be fixed in commit e969124. Give it a try and let me know if you encounter any more issues.

eucristian commented 5 years ago

Hello, will do this tonight and come back with my results. Thank you for this!