Open trathi05 opened 3 years ago
I am getting similar behavior to @trathi05 with some code using `Distributed`. I call Julia with `julia -p 4 --project=. my_script.jl`, some functions run independently in parallel using `pmap` with 4 workers, and everything behaves as I expect (including the correct outputs). But when the script terminates I get the following warnings:
┌ Warning: Forcibly interrupting busy workers
│ exception = rmprocs: pids [3] not terminated after 5.0 seconds.
└ @ Distributed ~/julia/usr/share/julia/stdlib/v1.9/Distributed/src/cluster.jl:1253
┌ Warning: rmprocs: process 1 not removed
└ @ Distributed ~/julia/usr/share/julia/stdlib/v1.9/Distributed/src/cluster.jl:1049
One guess (without real evidence) is that once all of my tasks have been assigned, at least one of the workers that is no longer needed is still active for some reason while the other workers finish their tasks, and so it needs to be forcibly terminated because it is "hanging" (for lack of a more precise term/understanding of what might be happening).
Another guess is that `process 1` refers to the host process, and for some reason it is not shutting down properly. I'm not sure if that is even possible, since I assume the host process is the one sending the warnings. Potentially relevant here could be that I am calling `pmap` from inside a function that is defined in the script.
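For anyone trying to reproduce this, here is a minimal sketch of the script layout described above: `pmap` called from inside a function defined in the script. The names `slow_square` and `run_all` are hypothetical stand-ins, not the actual code. In `Distributed`, process id 1 is the master (host) process, so printing `myid()` and `workers()` is a quick way to check which pids the warnings refer to.

```julia
using Distributed

# Hypothetical stand-in for the real per-task work.
# @everywhere defines it on the master and on every worker.
@everywhere slow_square(x) = (sleep(0.1); x^2)

# pmap is called from inside a function defined in the script,
# mirroring the setup described in the comment above.
function run_all(inputs)
    return pmap(slow_square, inputs)
end

results = run_all(1:8)

# pid 1 is the master process; workers() lists the worker pids
# (it returns [1] when no extra workers were started).
println("this process: ", myid())
println("workers:      ", workers())
```

Run it the same way as the original script, for example `julia -p 4 --project=. my_script.jl`. It also runs without `-p`, in which case `pmap` simply executes on process 1.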
Since it doesn't seem to be causing a problem with my code execution or performance, it is not a big deal at all. However, it is a concerning-looking warning nonetheless, especially if others use my code down the line.
I am running Julia 1.9.0-DEV (2022-05-04, Commit 862018b20d) on macOS Monterey with Apple Silicon.
My code that adds processes using `addprocs` and subsequently parallelizes work using `pmap` sometimes terminates with the following warning. It doesn't affect the output of the code in any way, but the warning shows up at the end, especially with scripts that run for a significant amount of time (over an hour at least). I am not sure whether this is a machine-related issue or has something to do with the `Distributed` package.
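One thing that might be worth trying (a sketch under assumptions, not a confirmed fix) is to remove the workers explicitly before the script ends rather than leaving it to the automatic cleanup at exit, which according to the log above waits 5.0 seconds. `rmprocs` accepts a `waitfor` keyword giving the number of seconds to wait for workers to shut down. The worker count, the 60-second value, and the `work` function below are arbitrary, hypothetical choices.

```julia
using Distributed

# Add workers from inside the script (2 is an arbitrary illustrative count).
pids = addprocs(2)

# Hypothetical stand-in for the real work, defined on every process.
@everywhere work(x) = x + 1

results = pmap(work, 1:10)

# Explicitly remove the workers before the script exits, waiting up to
# 60 seconds (an arbitrary value) for them to terminate.
rmprocs(pids; waitfor=60)

println("remaining processes: ", nprocs())
```

Whether this avoids the warning in long-running scripts is untested here. It only makes the shutdown step explicit and gives the workers more time than the exit-time cleanup does.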