Open hebasto opened 1 year ago
I've managed to track this down.
The warning box call holds g_timeoffset_mutex
(with msghand
thread) but waits on the main/gui
thread. Another thread holds cs_main
and waits on g_timeoffset_mutex
. The main/gui
thread then calls a scheduled polling event which waits for cs_main
. This prevents the user from releasing the g_timeoffset_mutex
, since they can no longer press "OK" on the warning box.
The warning box gets called from AddTimeData
in the msghand
thread while it holds the g_timeoffset_mutex
lock. The call is blocking, so it doesn't get released until the user presses "OK". That means it has to wait on the main/gui
thread.
While that's happening, the scheduler
thread locks cs_main
and eventually tries to lock g_timeoffset_mutex
(via GetAdjustedTime
):
It can't, so it waits.
Meanwhile, WalletModel::pollBalanceChanged
gets called on the main/gui
thread since it's on a scheduled timer. It checks to see if the confirmation numbers in the wallet need to be updated (this is why you need a non-empty wallet). Eventually, this line is called (similar to the now-outdated version in the OP):
In tryGetTxStatus
, the findBlock
method waits on cs_main
(which is held by the scheduler
thread, which is waiting on g_timeoffset_mutex
). Now we can't release the g_timeoffset_mutex
by pressing "OK" since the main/gui
thread is blocked.
The fundamental issue is that the uiInterface.ThreadSafeMessageBox
(and related ThreadSafeQuestion
, InitError
, and InitWarning
) calls are blocking but wait on the GUI thread. If ThreadSafeMessageBox
is called while any locks are held, and the main/gui
thread subsequently waits on those same locks, it'll freeze. Here's a dumb example -- add this to walletcontroller.cpp
:
void LoadWalletsActivity::load()
{
std::thread ([] {
LOCK(cs_main);
InitWarning(_("Everything is now frozen"));
}).detach();
But what's happening in this issue is a bit more insidious, as the main/gui
is waiting on cs_main
, yet only g_timeoffset_mutex
is directly locked by the ThreadSafeMessageBox
call.
I can't think of a simple but comprehensive fix. There are a few potential fixes to this specific case. For instance, in AddTimeData
, we could temporarily release the lock before calling ThreadSafeMessageBox
. Alternatively, we could call ThreadSafeMessageBox
in its own thread. Or have a non-blocking version that avoids using the deadlock-prone Qt::BlockingQueuedConnection
. I've tried all these and they do fix the issue, but I may be overlooking a better solution.
It looks like @ariard and @ryanofsky had a discussion somewhat related to this, so maybe can give better ideas than mine.
I vaguely remember finding a solution to this... I will try to find it - buried in a comment somewhere in bitcoin/bitcoin issues.
If I recall - the fix had to do with (re)casting the time back to int when passing it to the gui. The gui wasnt handling the chronos? variable correctly - it was likely a bug in the QT gui framework itself. I wasnt able to "prove" it because the crash didnt produce a stack error (for the gui) - but it worked. :)
I will try repost the fix with some tests to prove this...
I have encountered a deadlock in bitcoin-qt on Manjaro Linux IF you have a wrong system date (enough to trigger this message - "Please check that your computer's date and time are correct!") AND if your wallet is not empty. It seems it tries to read transactions in:
while displaying the warning message box and somehow deadlocks cs_nTimeOffset. The effect is that the message box pop-ups but you cannot click it and program does not react in any way.
Moved from bitcoin/bitcoin#15598.