Ulm-IQO / qudi-core

A framework for modular measurement applications.
GNU General Public License v3.0
38 stars 21 forks source link

[Bug] Exceptions when calling functions in different threads are not correctly logged nor printed to console. #95

Open TobiasSpohn opened 6 months ago

TobiasSpohn commented 6 months ago

Version

Development

What is affected by the bug?

Any module that uses QtCore.Qt.QueuedConnection to call a method from any other thread (module).

When does the bug occur?

When using Qt's QueuedConnection to call a method of a different thread. If the called method raises an exception that exception is never logged nor printed to console. This can be very confusing when testing new modules as no exceptions are thrown when some call fails.

How do we replicate the issue?

I provided a minimal working example in branch no_exception_logging

  1. Checkout branch no_exception_logging
  2. Start Qudi and load config file located at src/exception.cfg
  3. In IPython console call problem_logic.connect_signal(1) problem_logic.emit_signal() or problem_logic.connect_signal(3) problem_logic.emit_signal()
  4. No exception is printed in the console nor in the logger

Expected behavior

The logger should log the exception Exception in exception_method. Maybe even the traceback for easier identification where the exception occured.

If calling problem_logic.connect_signal(2) problem_logic.emit_signal() or problem_logic.connect_signal(4) problem_logic.emit_signal() the exception is correctly logged.

Relevant log output

Calling all 4 signal slots after each other.
2024-05-06 15:24:32 debug qudi.logic.problem_logic.ProblemLogic Emitting signal.
2024-05-06 15:24:32 debug qudi.logic.exception_logic.ExceptionLogic In exception_method.

2024-05-06 15:24:46 debug qudi.logic.problem_logic.ProblemLogic Emitting signal.
2024-05-06 15:24:46 debug qudi.logic.exception_logic.ExceptionLogic In exception_method.
2024-05-06 15:24:46 error qudi.logic.exception_logic.ExceptionLogic Exception in exception_method

2024-05-06 15:24:54 debug qudi.logic.problem_logic.ProblemLogic Emitting signal.
2024-05-06 15:24:54 debug qudi.logic.problem_logic.ProblemLogic Testing exception_method.
2024-05-06 15:24:54 debug qudi.logic.exception_logic.ExceptionLogic In exception_method.

2024-05-06 15:25:43 debug qudi.logic.problem_logic.ProblemLogic Emitting signal.
2024-05-06 15:25:43 debug qudi.logic.problem_logic.ProblemLogic Testing exception_method by wrapping in try,except statement.
2024-05-06 15:25:43 debug qudi.logic.exception_logic.ExceptionLogic In exception_method.
2024-05-06 15:25:43 error qudi.logic.problem_logic.ProblemLogic Exception in exception_method
2024-05-06 15:25:43 debug qudi.logic.problem_logic.ProblemLogic Code after exception throw is reached and exception is logged.

Additional Comments

The resolution to this problem is to wrap the whole method body of the called method into a try, except statement. This is done by using ExceptionLogic.try_except_method as slot method. Or assign a slot method that calls the function in the other thread within a try, except statement. This is done by utilizing ProblemLogic.test_try_except_exception_method as slot method.

Maybe it makes sense to write a wrapper in some Base class that does this automatically.

Furthermore, is it possible to not only log the exception message Exception in exception_method but also the full traceback to simplify the search for what caused the exception?

Contact Details

No response

Neverhorst commented 6 months ago

Well as long as we are not providing "manual" logging by wrapping everything in try-except, this is a problem in PySide2. The PySide2 excepthook does not register exceptions raised in threaded slots.

Regarding your examples... the proper way to "manually" log exceptions in qudi would be:

try:
    ...
except:
    self.log.exception('Something went wrong')
    raise

This will log the full exception traceback and re-raise any exception without alteration.

TobiasSpohn commented 6 months ago

Yes, this is the better way to log the exception. Thanks, I changed the files.

Do you think it is possible and does it make sense to modify the base class so that every method does this wrapping by default? Then at least all exceptions will be logged so the user can deal with it and the exceptions not just "vanish".

Neverhorst commented 6 months ago

Possible, yes. But it comes with quite a large overhead that impacts performance of almost anything in qudi. Each call to a logic method and property will require a check if the caller is the main thread and only wrap the call in try-except if this is not the case.

Before we commit to such drastic workarounds, I would like to check first if this issue has been resolved in PySide6.

TobiasSpohn commented 2 months ago

Hey, I finally got time to test it with PySide6. I used the pyside6-dev branch and did some additional refactoring to getting it working properly. I then merged it with branch no_exception_logging and tested it with the above instructions.

Now, all signal emissions throw an exception, which are correclty logged. So indeed, PySide6 fixes this problem. I also did some initial refactoring work in iqo-modules you can check it out in branch pyside6. With both pyside6 branches, qudi can be installed with python 3.12.5, removing the <3.11 version restriction. Loading all modules from the iqo-modules default.cfg and deactivating them again causes no Problems on Linux with Kernel 6.10.9 and Wayland. Only reloading them without restarting Qudi causes this error:

Exception during activation:
Traceback (most recent call last):
  File "/home/tobi/qudi/qudi-core/src/qudi/core/module.py", line 301, in __activation_callback
    self.on_activate()
  File "/home/tobi/qudi/qudi-iqo-modules/src/qudi/gui/pulsed/pulsed_maingui.py", line 206, in on_activate
    self._activate_generator_settings_ui()
  File "/home/tobi/qudi/qudi-iqo-modules/src/qudi/gui/pulsed/pulsed_maingui.py", line 1020, in _activate_generator_settings_ui
    self.pulse_generator_settings_updated(self.pulsedmasterlogic().pulse_generator_settings)
  File "/home/tobi/qudi/qudi-iqo-modules/src/qudi/gui/pulsed/pulsed_maingui.py", line 1461, in pulse_generator_settings_updated
    widget.blockSignals(True)
RuntimeError: Internal C++ object (PySide6.QtWidgets.QComboBox) already deleted.

I had a similar error when reworking the optimize logic and it was just due to some bad assert checking before calling functions from QtWidgets. Therefore, I don't think it originates from the transition to PySide6. To check this, I tried it with the current main of both projects.

However, doing the same in the current mains of both projects causes this issue during deactivation of the taskrunner gui or logic:

RuntimeError: Failed to disconnect signal sigStateChanged(PyObject).
Traceback (most recent call last):

  File "/home/tobi/qudi/qudi-core/src/qudi/core/modulemanager.py", line 629, in deactivate
    self.__poll_timer.stop()

AttributeError: 'NoneType' object has no attribute 'stop'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

  File "/home/tobi/qudi/qudi-core/src/qudi/gui/taskrunner/task_runner_gui.py", line 75, in _deactivate_self
    self._qudi_main.module_manager.deactivate_module(self._meta['name'])

  File "/home/tobi/qudi/qudi-core/src/qudi/core/modulemanager.py", line 222, in deactivate_module
    self._modules[module_name].deactivate()

  File "/home/tobi/qudi/qudi-core/src/qudi/core/modulemanager.py", line 632, in deactivate
    self._instance.module_state.sigStateChanged.disconnect(self._state_change_callback)

Reloading all modules without restarting causes Qudi to completely crash when trying to load POI Manager. The last message I get before crashing:

Couldn't restore roi from dict, defaulting to empty roi: Parameter to generate ScanImageMeta instance from must be of type dict.
Fatal Python error: Segmentation fault

Current thread 0x00007fc0fcf7d580 (most recent call first):
  File "/home/tobi/qudi/qudi-iqo-modules/src/qudi/gui/poimanager/poimanagergui.py", line 451 in __connect_control_signals_to_logic
  File "/home/tobi/qudi/qudi-iqo-modules/src/qudi/gui/poimanager/poimanagergui.py", line 327 in on_activate
  File "/home/tobi/qudi/qudi-core/src/qudi/core/module.py", line 301 in __activation_callback
  File "/home/tobi/qudi/.qudi-env3.10/lib64/python3.10/site-packages/fysom/__init__.py", line 99 in _callback
  File "/home/tobi/qudi/.qudi-env3.10/lib64/python3.10/site-packages/fysom/__init__.py", line 333 in _before_event
  File "/home/tobi/qudi/.qudi-env3.10/lib64/python3.10/site-packages/fysom/__init__.py", line 298 in fn
  File "/home/tobi/qudi/qudi-core/src/qudi/core/modulemanager.py", line 556 in activate
  File "/home/tobi/qudi/qudi-core/src/qudi/core/modulemanager.py", line 248 in start_all_modules
  File "/home/tobi/qudi/qudi-core/src/qudi/core/application.py", line 335 in run
  File "/home/tobi/qudi/qudi-core/src/qudi/core/__main__.py", line 52 in <module>
  File "/usr/lib64/python3.10/runpy.py", line 86 in _run_code
  File "/usr/lib64/python3.10/runpy.py", line 196 in _run_module_as_main

The issues are the same on PySide2 for Windows.

However, trying to run the pyside6 branches on Windows causes a different error when starting the rpyc server on qudi launch.

 Error during start of RPyC Server "remote-modules-server":

OSError: [WinError 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted
Traceback (most recent call last):

  File "C:\Users\Tobias Spohn\qudi\qudi-core\src\qudi\core\servers.py", line 111, in run
    self.server = rpyc.ThreadedServer(self.service,
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "C:\Users\Tobias Spohn\qudi\.qudi-env3.12\Lib\site-packages\rpyc\utils\server.py", line 91, in __init__
    self.listener.bind(address)

I will look into this to fix it for Windows as well.

Generally I think switching to PySide6 is beneficial, especially due to the fix for throwing exceptions at all times and being able to use higher python versions that support better exceptions in general. The refactoring felt so much easier by just getting easier to grasp exceptions and I think this will be helpful for all future development efforts.

TobiasSpohn commented 2 months ago

Ok, PySide6 branches work on Windows as well. The error above is caused by an instance of qudi somehow not correctly shutting down and still using the same rpyc address. After using TaskManager this problem was resolved. I forgot to install the kernel and it was then using rpyc 5.3.1 server trying to connect with rpyc 6.0.0 to it which causes this error. But by running the install kernel script again this is also resolved.

Now on startup I only have a Qt Warning

The requested buffer size is too big, ignoring.

I don't know, where it originates from. However it is thrown whenever a GUI module is loaded, so this is thrown quite often when loading all modules. Sometimes The requested filter is too big, ignoring is thrown in there as well. However, I can now do load all, deactivate all and load all again and qudi will not crash but throw the above mentioned QComboBox exception.