fix: race condition when initializing multiprocessing manager

WinPlay02 commented 10 months ago

Closes #18

Summary of Changes

Use `spawn` instead of `fork` to avoid deadlocks when running tests.
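A minimal sketch of what "use spawn instead of fork" can look like, assuming the manager and the pipeline process are both created from one explicit start-method context. `pipeline_worker`, `get_spawn_context`, `run_pipeline`, and the 30-second timeout are hypothetical names and values, not the runner's actual code:

```python
import multiprocessing
from multiprocessing.context import SpawnContext


def pipeline_worker(queue) -> None:
    """Hypothetical pipeline entry point; reports back through the shared queue."""
    queue.put("pipeline finished")


def get_spawn_context() -> SpawnContext:
    # Request "spawn" explicitly instead of relying on the platform default
    # (which was "fork" on Linux at the time). A spawned child starts a fresh
    # interpreter, so it does not inherit the parent's threads or held locks.
    return multiprocessing.get_context("spawn")


def run_pipeline() -> str:
    ctx = get_spawn_context()
    # Create the manager from the same context as the worker process,
    # so the manager's server process is spawned too.
    with ctx.Manager() as manager:
        queue = manager.Queue()
        process = ctx.Process(target=pipeline_worker, args=(queue,))
        process.start()
        # Bounded wait (hypothetical value) so a lost message fails loudly.
        result = queue.get(timeout=30)
        process.join()
    return result
```

With `spawn`, call `run_pipeline()` only from code guarded by `if __name__ == "__main__":` (or from inside a test function), because each spawned child re-imports the main module.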

codecov[bot] commented 10 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Comparison is base (20e9aac) 100.00% compared to head (6b2d987) 100.00%.

Additional details and impacted files

```diff
@@            Coverage Diff            @@
##              main       #26   +/-   ##
=========================================
  Coverage   100.00%   100.00%
=========================================
  Files            9         9
  Lines          334       337    +3
=========================================
+ Hits           334       337    +3
```


github-actions[bot] commented 10 months ago

🦙 MegaLinter status: ✅ SUCCESS

| Descriptor | Linter | Files | Fixed | Errors | Elapsed time |
|---|---|---|---|---|---|
| ✅ PYTHON | black | 3 | 0 | 0 | 0.56s |
| ✅ PYTHON | mypy | 3 | | 0 | 1.62s |
| ✅ PYTHON | ruff | 3 | 0 | 0 | 0.02s |
| ✅ REPOSITORY | git_diff | yes | | no | 0.02s |


lars-reimann commented 10 months ago

The process seems to get stuck inside `wait_for_messages` when running the test `test_should_execute_pipeline_return_valid_placeholder`.

WinPlay02 commented 10 months ago

That's true, but this does not seem to be the (main) problem. The main process gets stuck there because it doesn't receive any messages from the pipeline process.

This seems to be related to a race condition when initializing the message queue. The problem occurred because the fields were changed to cached properties (https://docs.python.org/3.12/library/functools.html#functools.cached_property). Before Python 3.12, cached properties were guarded by a lock that ensured the getter ran only once; in Python 3.12, this lock was removed.

Note:

Changed in version 3.12: Prior to Python 3.12, cached_property included an undocumented lock to ensure that in multi-threaded usage the getter function was guaranteed to run only once per instance. However, the lock was per-property, not per-instance, which could result in unacceptably high lock contention. In Python 3.12+ this locking is removed.
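Since `functools.cached_property` no longer guarantees a single execution of the getter under concurrent access, one way to restore that guarantee is an explicit per-instance lock. This is a minimal sketch under that assumption; `LazyOnce` is a hypothetical helper, not the approach taken in this PR:

```python
import threading
from typing import Callable, Generic, Optional, TypeVar, cast

T = TypeVar("T")


class LazyOnce(Generic[T]):
    """Hypothetical helper: build a value on first use, at most once per instance."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        # One lock per instance, so unrelated instances never contend with each
        # other (the pre-3.12 cached_property lock was shared per property instead).
        self._lock = threading.Lock()
        self._initialized = False
        self._value: Optional[T] = None

    def get(self) -> T:
        # Check and build while holding the lock: a thread that loses the race
        # blocks until the winner has stored the value, then reuses it.
        with self._lock:
            if not self._initialized:
                self._value = self._factory()
                self._initialized = True
            return cast(T, self._value)
```

In the runner, `factory` would presumably be whatever creates the multiprocessing manager and its queue (an assumption about the wiring). Taking the lock on every call keeps the logic obviously correct; if the value is cheap to create, initializing it eagerly in `__init__` avoids the question entirely.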

As a result, the multiprocessing manager was sometimes initialized twice instead of once, and the main process could end up with a different queue than the pipeline process.

Because the pipeline process sent its messages to a different queue than the one the main process was reading, the main process waited indefinitely for results.
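The failure mode can be reproduced in a single process with `queue.Queue` standing in for the manager-backed queue. This is a simulation under that assumption, not the runner's multiprocessing code; `make_channel`, `receive_with_timeout`, and `simulate` are hypothetical names, and the timeout is only a diagnostic aid (the runner's `wait_for_messages` may behave differently):

```python
import queue
from typing import Optional


def make_channel() -> "queue.Queue[str]":
    # Stand-in for the manager-backed queue: each call returns a NEW queue,
    # like initializing the multiprocessing manager a second time.
    return queue.Queue()


def receive_with_timeout(channel: "queue.Queue[str]", timeout: float) -> Optional[str]:
    # A bounded wait reports "nothing arrived" instead of blocking forever.
    try:
        return channel.get(timeout=timeout)
    except queue.Empty:
        return None


def simulate(initialized_once: bool) -> Optional[str]:
    main_side = make_channel()
    # Correct case: both sides share the single queue.
    # Racy case: the pipeline side ran the initializer again and got its own queue.
    pipeline_side = main_side if initialized_once else make_channel()
    pipeline_side.put("pipeline finished")
    return receive_with_timeout(main_side, timeout=0.1)
```

`simulate(True)` returns `"pipeline finished"`, while `simulate(False)` returns `None` after the timeout: the message exists, but in a queue the main side never reads. Without a timeout, that second case would block forever.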

I confirmed that this works under WSL2 with Python 3.12 (or I was incredibly lucky, and the problem has not occurred since I made the change).

So in the end, there was no deadlock (according to these debugging results), and the `fork` deprecation message turned out to be a red herring. Still, forking in a multi-threaded program is bad practice and should be avoided, so this PR removes it as well.

WinPlay02 commented 10 months ago

@SmiteDeluxe This should also fix the issue where the extension is sometimes not notified of the current pipeline execution progress.

lars-reimann commented 10 months ago

:tada: This PR is included in version 0.4.0 :tada:

The release is available on:

- v0.4.0
- GitHub release

Your semantic-release bot :package::rocket:
