TheQuantumPhysicist closed this issue 3 years ago.
I tried many things in mpmc_blocking_queue:
None of that helped. I should also mention that ThreadSanitizer was reporting many different issues the whole time. I thought these were false positives, but it seems there's a serious synchronization issue there.
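For context on the kind of structure being discussed: a bounded multi-producer/multi-consumer blocking queue is commonly built from one mutex and two condition variables. The sketch below is a generic illustration under that assumption. The class name bounded_blocking_queue is hypothetical, and this is not spdlog's mpmc_blocking_queue.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

// Hypothetical bounded MPMC blocking queue built from one mutex and two
// condition variables. Generic illustration only; not spdlog's code.
template <typename T>
class bounded_blocking_queue {
public:
    explicit bounded_blocking_queue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity_ > 0);  // a zero-capacity queue would block every producer
    }

    // Waits while the queue is full, then appends the item and wakes a consumer.
    void enqueue(T item) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return items_.size() < capacity_; });
        items_.push_back(std::move(item));
        lock.unlock();
        not_empty_.notify_one();
    }

    // Waits while the queue is empty, then removes the oldest item and wakes a producer.
    T dequeue() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        T item = std::move(items_.front());
        items_.pop_front();
        lock.unlock();
        not_full_.notify_one();
        return item;
    }

private:
    const std::size_t capacity_;
    std::deque<T> items_;
    std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
};
```

In this design, a producer can stay blocked inside enqueue only while the queue is full and no consumer is draining it, or while another thread is holding the mutex.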
How can this be reproduced? I have never encountered this before. Also, the link you provided points to the master branch, which is obsolete. Please use the v1.x branch.
I'm using the latest release, not master; I only linked to master here for convenience. As you can see in the neblio repo I linked (the program that uses spdlog), I included the latest spdlog release and use it header-only.
How to reproduce it? That's really difficult, because it only happens in certain environments (my public server, plus a few customers who have reported it). I may try to isolate everything into a minimal program, but I'm traveling in two days, so my availability will be limited.
The good news is that in neblio, logging is isolated in one file (DefaultLogger.h) plus the initialization (the InitLogging function).
In neblio, InitLogging runs first, and the very first logging call right after it freezes everything. I'm on Debian Buster (10.8), and the compiler is gcc (Debian 8.3.0-6) 8.3.0.
One important thing to keep in mind is that ThreadSanitizer reported many issues even before I ran into this, and I think they should all be addressed. Today was the first time I looked into spdlog's code, and besides the fact that I'm leaving soon, I don't think I have the experience to fix all of these issues.
If you want to try to reproduce the issue, clone the neblio repo, build it with CMake, and run nebliod. Then check the log file it produces. If it stops after

[2021-03-16 20:09:30.532] [info] [init.cpp:194] [void InitLogging()]: Initialized logging successfully!

and nothing else happens, the issue has occurred.
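Because the symptom is simply that the log ends at that line, the check can be automated when restarting nebliod repeatedly on the affected server. A minimal sketch: log_stalled_after is a hypothetical helper, not part of neblio or spdlog, and a log that ends there only indicates the hang if the process is still running rather than having exited.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper: returns true if the last non-empty line of the log file
// at `path` ends with `marker`, i.e. nothing was logged after that line.
// Returns false if the file cannot be opened or contains no non-empty lines.
bool log_stalled_after(const std::string& path, const std::string& marker) {
    std::ifstream in(path);
    std::string line;
    std::string last;
    while (std::getline(in, line)) {
        if (!line.empty()) {
            last = line;  // remember the most recent non-empty line
        }
    }
    return !last.empty() && last.size() >= marker.size() &&
           last.compare(last.size() - marker.size(), marker.size(), marker) == 0;
}
```

Usage: call log_stalled_after with the log path and "Initialized logging successfully!" a few seconds after startup.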
Again, it works fine on my workstation; it only happens on my public server. Please let me know if you need more info.
> One important thing to keep in mind is that thread sanitizer showed many issues even before I faced this,
It is a known issue that TSan produces false positives with C++11 code (e.g. https://github.com/gabime/spdlog/pull/789 and https://stackoverflow.com/questions/37552866/why-does-threadsanitizer-report-a-race-with-this-lock-free-example).
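As one illustration of the kind of false positive meant here (an assumption; the linked discussions may involve other patterns): ThreadSanitizer does not model standalone std::atomic_thread_fence, and newer GCC releases warn about this under -Wtsan. The handoff below is correct under the C++11 memory model, because the release fence before the relaxed store synchronizes with the acquire fence after the relaxed load that observes it, yet TSan may still report a race on payload. All names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Fence-based publication: valid C++11, but TSan does not understand the
// standalone fences and may flag the accesses to `payload` as a data race.
int payload = 0;
std::atomic<bool> ready{false};

void producer() {
    payload = 42;                                         // plain (non-atomic) write
    std::atomic_thread_fence(std::memory_order_release);  // orders the write above
    ready.store(true, std::memory_order_relaxed);         // publish the flag
}

int consumer() {
    while (!ready.load(std::memory_order_relaxed)) {
        // spin until the flag is published
    }
    std::atomic_thread_fence(std::memory_order_acquire);  // pairs with the release fence
    return payload;                                       // guaranteed to read 42
}

int run_fence_handoff() {
    payload = 0;                                   // reset shared state; no threads running yet
    ready.store(false, std::memory_order_relaxed);
    int seen = 0;
    std::thread reader([&seen] { seen = consumer(); });
    std::thread writer(producer);
    writer.join();
    reader.join();
    return seen;
}
```

Rewriting the flag operations as ready.store(true, std::memory_order_release) and ready.load(std::memory_order_acquire), without the separate fences, expresses the same ordering in a form TSan does track.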
Regarding the problem, you could try replacing std::thread::hardware_concurrency() with 1 and see if it helps (a long shot, but worth a try).
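One way to make that experiment switchable is to compute the worker count in one place. A minimal sketch, assuming the value ends up as the thread count of spdlog's thread pool; logging_worker_count is a hypothetical helper. Note that std::thread::hardware_concurrency() may return 0 when the value cannot be determined, so it is clamped to at least 1.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical helper: number of logging worker threads to use.
// force_single pins the count to 1 for the experiment suggested above;
// otherwise use the hardware hint, falling back to 1 if it is unknown (0).
std::size_t logging_worker_count(bool force_single) {
    if (force_single) {
        return 1;
    }
    unsigned hw = std::thread::hardware_concurrency();
    return hw == 0 ? 1 : static_cast<std::size_t>(hw);
}
```

If the count is passed to spdlog::init_thread_pool(queue_size, thread_count) in v1.x, the call would look like spdlog::init_thread_pool(8192, logging_worker_count(true)); the queue size 8192 here is only an example value.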
Otherwise, if there is no way to reproduce it, I will close this issue.
There seems to be a deadlock bug in spdlog that happens randomly on some machines. On my workstation (Ryzen 3900X), it's consistently fine, but on my public root server (AMD Opteron Processor 3365), it sometimes blocks and sometimes doesn't. I attached gdb to the running process, and it seems to be blocking in the thread pool's enqueue function. The following is the stack trace when it's blocked:
This is the code that makes the call (the logging abstraction layer), in case I'm doing something wrong, though the usage is very simple: https://github.com/NeblioTeam/neblio/blob/VIUTests/wallet/logging/defaultlogger.h
That is the branch where I'm calling this code, so you can see exactly what's happening.
If you need any additional info, please let me know.
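Since the hang is in the thread pool's enqueue path, it may help to recall that spdlog v1.x async loggers take an overflow policy: spdlog::async_overflow_policy::block (used by spdlog::async_factory) makes the producer wait for free space, while overrun_oldest (used by spdlog::async_factory_nonblock) discards the oldest queued message instead. The sketch below models only the overrun-oldest idea in plain C++; overrun_oldest_queue is hypothetical and is not spdlog's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

// Hypothetical "overrun oldest" queue: when full, enqueue drops the oldest
// item instead of waiting, so a producer never waits for free space.
// It still takes a mutex; only the wait-for-space path is removed.
template <typename T>
class overrun_oldest_queue {
public:
    explicit overrun_oldest_queue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity_ > 0);
    }

    // Returns true if an older item was discarded to make room.
    bool enqueue(T item) {
        std::lock_guard<std::mutex> lock(mutex_);
        bool dropped = false;
        if (items_.size() == capacity_) {
            items_.pop_front();  // overrun: discard the oldest entry
            dropped = true;
        }
        items_.push_back(std::move(item));
        return dropped;
    }

    // Non-blocking pop; returns false if the queue is empty.
    bool try_dequeue(T& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) {
            return false;
        }
        out = std::move(items_.front());
        items_.pop_front();
        return true;
    }

private:
    const std::size_t capacity_;
    std::deque<T> items_;
    std::mutex mutex_;
};
```

Switching policies would not fix a real deadlock, but it can help show whether the worker thread has stopped draining the queue: with overrun the process keeps running while messages go missing.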