bitcoin-core / gui

Bitcoin Core GUI staging repository
https://github.com/bitcoin/bitcoin
MIT License
572 stars 257 forks source link

deadlock with wrong system date/time and non-empty wallet #643

Open hebasto opened 1 year ago

hebasto commented 1 year ago

I have encountered a deadlock in bitcoin-qt on Manjaro Linux IF you have a wrong system date (enough to trigger this message - "Please check that your computer's date and time are correct!") AND if your wallet is not empty. It seems it tries to read transactions in:

TransactionRecord *index(interfaces::Wallet& wallet, int idx)
{
...
            if (wallet.tryGetTxStatus(rec->hash, wtx, numBlocks, adjustedTime) && rec->statusUpdateNeeded(numBlocks)) {
                rec->updateStatus(wtx, numBlocks, adjustedTime);
}

while displaying the warning message box and somehow deadlocks cs_nTimeOffset. The effect is that the message box pop-ups but you cannot click it and program does not react in any way.

Moved from bitcoin/bitcoin#15598.

john-moffett commented 1 year ago

I've managed to track this down.

Summary:

The warning box call holds g_timeoffset_mutex (with msghand thread) but waits on the main/gui thread. Another thread holds cs_main and waits on g_timeoffset_mutex. The main/gui thread then calls a scheduled polling event which waits for cs_main. This prevents the user from releasing the g_timeoffset_mutex, since they can no longer press "OK" on the warning box.

Details:

The warning box gets called from AddTimeData in the msghand thread while it holds the g_timeoffset_mutex lock. The call is blocking, so it doesn't get released until the user presses "OK". That means it has to wait on the main/gui thread.

While that's happening, the scheduler thread locks cs_main and eventually tries to lock g_timeoffset_mutex (via GetAdjustedTime):

https://github.com/bitcoin-core/gui/blob/678889e6c6231cf461de59eefe6fb8eb07468848/src/net_processing.cpp#L1239-L1242

It can't, so it waits.

Meanwhile, WalletModel::pollBalanceChanged gets called on the main/gui thread since it's on a scheduled timer. It checks to see if the confirmation numbers in the wallet need to be updated (this is why you need a non-empty wallet). Eventually, this line is called (similar to the now-outdated version in the OP):

https://github.com/bitcoin-core/gui/blob/678889e6c6231cf461de59eefe6fb8eb07468848/src/qt/transactiontablemodel.cpp#L228

In tryGetTxStatus, the findBlock method waits on cs_main (which is held by the scheduler thread, which is waiting on g_timeoffset_mutex). Now we can't release the g_timeoffset_mutex by pressing "OK" since the main/gui thread is blocked.

Root Issue:

The fundamental issue is that the uiInterface.ThreadSafeMessageBox (and related ThreadSafeQuestion, InitError, and InitWarning) calls are blocking but wait on the GUI thread. If ThreadSafeMessageBox is called while any locks are held, and the main/gui thread subsequently waits on those same locks, it'll freeze. Here's a dumb example -- add this to walletcontroller.cpp:

void LoadWalletsActivity::load()
{
    std::thread ([] {
        LOCK(cs_main);
        InitWarning(_("Everything is now frozen"));
    }).detach();

But what's happening in this issue is a bit more insidious, as the main/gui is waiting on cs_main, yet only g_timeoffset_mutex is directly locked by the ThreadSafeMessageBox call.

Potential fixes:

I can't think of a simple but comprehensive fix. There are a few potential fixes to this specific case. For instance, in AddTimeData, we could temporarily release the lock before calling ThreadSafeMessageBox. Alternatively, we could call ThreadSafeMessageBox in its own thread. Or have a non-blocking version that avoids using the deadlock-prone Qt::BlockingQueuedConnection. I've tried all these and they do fix the issue, but I may be overlooking a better solution.

It looks like @ariard and @ryanofsky had a discussion somewhat related to this, so maybe can give better ideas than mine.

RandyMcMillan commented 1 year ago

I vaguely remember finding a solution to this... I will try to find it - buried in a comment somewhere in bitcoin/bitcoin issues.

If I recall - the fix had to do with (re)casting the time back to int when passing it to the gui. The gui wasnt handling the chronos? variable correctly - it was likely a bug in the QT gui framework itself. I wasnt able to "prove" it because the crash didnt produce a stack error (for the gui) - but it worked. :)

I will try repost the fix with some tests to prove this...