Closed tcompa closed 2 years ago
Cross-referencing issue #2628 and PR #2629, which deal with a different but related signal handling problem (PR #2629 applies a fix similar to the one described in this issue: ensuring that the SIGTERM handler is not inherited from the parent process). This came from a situation where parsl was used inside Globus Compute, and the test suite there was (in other situations) installing a signal handler that was unexpectedly inherited by the interchange.
In general, the interchange should expect that the parent process could have set any signal handlers it wants, because it is the user's main process.
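The inheritance described here can be observed with the standard library alone. The following is a minimal sketch, not parsl code: it assumes a POSIX system where the `fork` start method is available, and `parent_sigterm_handler`, `describe_sigterm` and `sigterm_handler_seen_by_child` are hypothetical names introduced only for illustration. The parent installs a SIGTERM handler, forks a child, and the child reports which handler it sees.

```python
import multiprocessing
import signal


def parent_sigterm_handler(signum, frame):
    """Hypothetical handler, standing in for whatever the main process installs."""
    print(f"handling signal {signum}")


def describe_sigterm(conn):
    """Child entry point: send back a description of the SIGTERM handler it sees."""
    current = signal.getsignal(signal.SIGTERM)
    conn.send(getattr(current, "__name__", repr(current)))
    conn.close()


def sigterm_handler_seen_by_child():
    """Install a handler in the parent, fork a child, return what the child reports."""
    ctx = multiprocessing.get_context("fork")  # assumption: POSIX with fork available
    receiver, sender = ctx.Pipe(duplex=False)
    previous = signal.signal(signal.SIGTERM, parent_sigterm_handler)
    try:
        child = ctx.Process(target=describe_sigterm, args=(sender,))
        child.start()
        seen = receiver.recv()
        child.join()
    finally:
        # Restore whatever was installed before this demonstration.
        signal.signal(
            signal.SIGTERM, previous if previous is not None else signal.SIG_DFL
        )
    return seen


if __name__ == "__main__":
    print(sigterm_handler_seen_by_child())
```

Under those assumptions the child should report the parent's handler by name, which illustrates why a forked process such as the interchange cannot assume default signal handling.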
Context and brief summary
While working on fractal-server, we recently noticed an unexpected behavior: when running parsl (with the HTEX executor) within a uvicorn web server, some part of the `DataFlowKernel.cleanup` method would shut down our active server. We discovered that the error is not related to parsl, but rather to the use of multiprocessing from within a uvicorn server. Below you can find a minimal reproducible example, and our fix based on a small change to `parsl/executors/high_throughput/interchange.py`. Since this "bug" is not related to parsl, we are not asking for any change here. This issue is to document our findings/tests, and can be closed right away.

This is mostly work done with @mfranzon, and the corresponding issue in our repo is here.
Minimal reproducible example
We set up a fresh python 3.8.13 environment with fastapi 0.85.0, uvicorn 0.18.3 and parsl 2022.10.10. We use the following `main.py` script

which we run from the command line via

Now our web server is active, and any POST request will trigger one execution of `root()`. When we call this endpoint via

the uvicorn server receives some signal that shuts it down, and the logs (in the terminal) show

The server is not reachable any more, which is a critical problem for our use case (`fractal-server` should open and close several DFKs with HTEX executors, and obviously it should remain active).

Source of the problem
We realized that the problem comes from using multiprocessing from within a uvicorn server. This is explained by @Mixser in
Quoting from those comments:
The reason why this appears in our example is that the parsl HTEX executor does use a `ForkProcess`, in https://github.com/Parsl/parsl/blob/dfcbff8f633f33aa2aeca68ba5b13bfd66bc223a/parsl/executors/high_throughput/executor.py#L477 where `ForkProcess` is simply https://github.com/Parsl/parsl/blob/dfcbff8f633f33aa2aeca68ba5b13bfd66bc223a/parsl/multiprocessing.py#L15

Workaround (by patching parsl)
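The general idea of restoring default signal handling at the very start of a child's entry point can be tried without parsl or uvicorn. The sketch below is not the patch applied to `interchange.py`; it assumes a POSIX system with the `fork` start method, resets only SIGTERM, and uses hypothetical names (`inherited_handler`, `child_entry`, `exit_code_after_terminate`) and an arbitrary exit code of 42 to make the inherited handler recognisable. Whether a reset of this kind is sufficient under uvicorn is not verified here.

```python
import multiprocessing
import os
import signal
import time


def inherited_handler(signum, frame):
    """Hypothetical parent handler; exits with a recognisable code when it runs."""
    os._exit(42)


def child_entry(ready, reset_handlers):
    """Child entry point; optionally restores the default SIGTERM disposition first."""
    if reset_handlers:
        signal.signal(signal.SIGTERM, signal.SIG_DFL)
    ready.set()  # tell the parent that setup is complete
    time.sleep(30)  # idle until the parent terminates this process


def exit_code_after_terminate(reset_handlers):
    """Fork a child under a custom SIGTERM handler, terminate it, return its exit code."""
    ctx = multiprocessing.get_context("fork")  # assumption: POSIX with fork available
    ready = ctx.Event()
    previous = signal.signal(signal.SIGTERM, inherited_handler)
    try:
        child = ctx.Process(target=child_entry, args=(ready, reset_handlers))
        child.start()
    finally:
        # The fork has already happened; put the parent's handler back.
        signal.signal(
            signal.SIGTERM, previous if previous is not None else signal.SIG_DFL
        )
    ready.wait(timeout=10)
    child.terminate()  # sends SIGTERM to the child
    child.join()
    return child.exitcode


if __name__ == "__main__":
    print("without reset:", exit_code_after_terminate(reset_handlers=False))
    print("with reset:", exit_code_after_terminate(reset_handlers=True))
```

Without the reset, the handler copied from the parent runs inside the child when it is terminated; with the reset, the child is killed by SIGTERM in the usual way and its exit code is the negative signal number.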
Based on the fastapi/uvicorn issues above, we modified the parsl HTEX interchange by adding the lines

at the beginning of the `starter` function. This is visible in our fork: https://github.com/fractal-analytics-platform/parsl/blob/1.3.1-dev/parsl/executors/high_throughput/interchange.py. So far we have not noticed any side effects of this change in the use of parsl, but we cannot say whether this change will affect parsl in some way.

Tests with recent parsl functionalities
As of https://github.com/Parsl/parsl/pull/2433, we can specify the `start_method` parameter for an HTEX. This concerns the workers, rather than the interchange, and does not affect our problem here. Just for completeness, we also tested our same example by adding the `start_method` argument in `HighThroughputExecutor`: independently of our choice (`fork`, `spawn`, `thread`), the problem remained. This is not surprising, since that PR only modifies what takes place in `process_worker_pool.py`.

One could try replacing `ForkProcess` with `SpawnProcess` or `Thread`, in https://github.com/Parsl/parsl/blob/dfcbff8f633f33aa2aeca68ba5b13bfd66bc223a/parsl/executors/high_throughput/executor.py#L477. Preliminary tests led to a frozen server (when using `SpawnProcess`) and to other compatibility issues when using `Thread` (`AttributeError: 'Thread' object has no attribute 'terminate'`), and we did not investigate further.

Summary
We found a way to use parsl/HTEX from within a uvicorn server, and for now we are using it in our parsl fork. If this issue is solved upstream (in uvicorn), we'll gladly switch back to parsl mainline. This issue is here just in case others run into the same problem.