Mudlet / Mudlet

⚔️ A cross-platform, open source, and super fast MUD client with scripting in Lua
https://mudlet.org
GNU General Public License v2.0
738 stars 268 forks source link

Mudlet freezes when closing secure profile with self-signed certificates on Linux #2288

Closed vadi2 closed 5 years ago

vadi2 commented 5 years ago

Brief summary of issue / Description of requested feature:

Mudlet freezes when closing secure profile with self-signed certificates on Linux.

Steps to reproduce the issue / Reasons for adding feature:

  1. Connect to moo.neon-moo.space:7778 and enable Accept self-signed certificates in order for the connection to work
  2. Open and connect on another, non-secure profile (say achaea.com:23)
  3. Close Neon MOO profile
  4. Mudlet will freeze on mPoolReadWriteLock.lockForRead(); in HostManager::getHost(QString hostname)
1   futex_wait_cancelable                                                                                                                            futex-internal.h    88   0x7ffff42169f3 
2   __pthread_cond_wait_common                                                                                                                       pthread_cond_wait.c 502  0x7ffff42169f3 
3   __pthread_cond_wait                                                                                                                              pthread_cond_wait.c 655  0x7ffff42169f3 
4   QWaitCondition::wait(QMutex *, QDeadlineTimer)                                                                                                                            0x7ffff4761033 
5   QWaitCondition::wait(QMutex *, unsigned long)                                                                                                                             0x7ffff4761362 
6   QReadWriteLock::tryLockForRead(int)                                                                                                                                       0x7ffff475b994 
7   QReadWriteLock::lockForRead()                                                                                                                                             0x7ffff475baca 
8   HostManager::getHost                                                                                                                             HostManager.cpp     175  0x5555558b9464 
9   mudlet::slot_newDataOnHost                                                                                                                       mudlet.cpp          4109 0x5555558fc10c 
10  QtPrivate::FunctorCall<QtPrivate::IndexesList<0, 1>, QtPrivate::List<QString const&, bool>, void, void (mudlet:: *)(QString const&, bool)>::call qobjectdefs_impl.h  152  0x5555559a062b 
11  QtPrivate::FunctionPointer<void (mudlet:: *)(QString const&, bool)>::call<QtPrivate::List<QString const&, bool>, void>                           qobjectdefs_impl.h  185  0x55555599f498 
12  QtPrivate::QSlotObject<void (mudlet:: *)(QString const&, bool), QtPrivate::List<QString const&, bool>, void>::impl                               qobjectdefs_impl.h  414  0x55555599ddf6 
13  QMetaObject::activate(QObject *, int, int, void * *)                                                                                                                      0x7ffff494d956 
14  TConsole::signal_newDataAlert                                                                                                                    moc_TConsole.cpp    179  0x555555ac7d5b 
15  TConsole::printOnDisplay                                                                                                                         TConsole.cpp        1237 0x555555990d1f 
16  cTelnet::postData                                                                                                                                ctelnet.cpp         1949 0x55555572dceb 
17  cTelnet::handle_socket_signal_disconnected                                                                                                       ctelnet.cpp         441  0x555555721e0b 
18  QtPrivate::FunctorCall<QtPrivate::IndexesList<>, QtPrivate::List<>, void, void (cTelnet:: *)()>::call(void (cTelnet:: *)(), cTelnet *, void * *) qobjectdefs_impl.h  152  0x555555734b17 
19  QtPrivate::FunctionPointer<void (cTelnet:: *)()>::call<QtPrivate::List<>, void>(void (cTelnet:: *)(), cTelnet *, void * *)                       qobjectdefs_impl.h  185  0x5555557343f5 
20  QtPrivate::QSlotObject<void (cTelnet:: *)(), QtPrivate::List<>, void>::impl(int, QtPrivate::QSlotObjectBase *, QObject *, void * *, bool *)      qobjectdefs_impl.h  414  0x55555573362c 
... <More>                                                                                                                                                                                   

Error output / Expected result of feature

Mudlet should not freeze.

Extra information, such as Mudlet version, operating system and ideas for how to solve / implement:

Mudlet 3.17.0 on Linux.

vadi2 commented 5 years ago

@SlySven handing this over to you as you're across this locking code and how it works, could you have a look?

SlySven commented 5 years ago

Oh, that problem is because something - in this case the slot cTelnet::handle_socket_signal_disconnected has been called (asynchronously) from the Qt (network?) library code and has (indirectly) called the slot mudlet::slot_newDataOnHost which needs to get the calling Host instance given that it does know the name it is using. This is a problem because it is trying to look it up (by using (Host*) getHost(const QString& hostname) to read (QMap<QString, QSharedPointer<Host>>) HostManager::mHostPool {thereby taking a shared read-lock on that) at the same time as another thread (probably our main Mudlet application one) is changing that QMap and that has a (higher priority, exclusive) write-lock on it.

This is probably the process that is closing the Neon MOO - and I guess the contending thread is probably one associated with the Achaea profile (either our main Mudlet one or one created by another part of the Qt libraries) - that being the case we need to where that is changing HostManager::mHostPool and try to shorten the time it holds the write lock...

BTW In reviewing the code that might be causing this I see that (void) mudlet::slot_close_profile_requested(int) is used to close a profile when multi-playing and the close button is used on the tab-bar; however (void) mudlet::slot_close_profile() is never used - I am guessing that closure of a profile otherwise (when only one user profile is open) is handled by other means - I haven't yet pinned down where that happens but I will - even if it is just to clarify things for myself...

SlySven commented 5 years ago

Oh, that is not nice - that mPoolReadWriteLock.lockForRead(); in HostManager::getHost(QString hostname) is being hit whilst the same profile has already got a write lock in HostManager::deleteHost(const QString& hostname) - there is around 32 stack-frames in between:

screenshot_20190219_050750

This is going to take a good think...

SlySven commented 5 years ago

Okay found a workaround for that point, but then got another one from the Host::raiseEvent()... call - more thinking in progress... screenshot_20190220_155352

SlySven commented 5 years ago

I think the problem is partly cause by a gratuitous use of mTelnet.disconnect(); in the Host destructor - it is far too late to be doing that there as it triggers signals and slots - such as cTelnet::handle_socket_signal_disconnected() and cTelnet::signal_disconnected(Host*) which is a really bad idea by then... :boom:

I am looking at the profile closing code and I can see some clean up steps (including the elimination of a goto in TConsole::closeEvent(QCloseEvent*)) that can be done.

vadi2 commented 5 years ago

Just be careful and test that sysDisconnectionEvent is still raised in the same behaviour and Lua code doesn't crash when calling Mudlet API (because some things like Host are already destroyed) (https://github.com/Mudlet/Mudlet/issues/512)

SlySven commented 5 years ago

Just to confirm but it is not surprising. FreeBSD locks up in exactly the same manner in the same place; though. given the code structure, I'd expect this to be on all platforms.

I think I can fix this. I just need to establish a uniform sequence of steps in the closing of one profile/more profiles/the whole application and ensure that the slot cTelnet::handle_socket_signal_disconnected() or the badly named cTelnet::disconnect()+ are not called asynchronously by the signal QAbstractSocket::disconnected() too late in the profile closing process to be handled correctly.

As it happens, putting a mTelnet.disconnect(); in the Hostdestructor does not seem like a good idea in this context!

+ - badly named because Ì think it erroneously overloads(?) the non-static QObject::disconnect() overload with all three default argument which disconnects all slots connected to the object concerned - the cTelnet instance in this case.