Parsl / parsl

Parsl - a Python parallel scripting library
http://parsl-project.org
Apache License 2.0

Automatically activate Python virtual environment on compute nodes #3541

Open giordano opened 3 months ago

giordano commented 3 months ago

Is your feature request related to a problem? Please describe.

When you submit a pipeline with Parsl from a login node, to be executed on a compute node, you likely need to use the worker_init attribute of the executor's provider to activate the Python virtual environment in which Parsl needs to run. But this is not very user friendly: the user just had to activate that same environment to launch the pipeline in the first place, so this asks them to repeat the same operation twice. I was told on Slack that there have been some thoughts about automating this in the past, but no real work.
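For concreteness, this is roughly what the status quo looks like (a minimal sketch; the SLURM provider and the ~/my-venv path are assumptions for illustration, and only the relevant fields are shown):

# Sketch of the current situation: the user must repeat in worker_init
# the activation they already performed on the login node.
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.providers import SlurmProvider

config = Config(
    executors=[
        HighThroughputExecutor(
            provider=SlurmProvider(
                # re-activate, on the compute node, the very environment
                # this script is already running in (hypothetical path)
                worker_init="source ~/my-venv/bin/activate",
            ),
        )
    ]
)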

Describe the solution you'd like

Ideally Parsl would automate the Python environment activation. The challenge is that answering the questions "what's the current environment, and how do I activate it?" is very complicated in the general case, because the answer also depends on the package manager used.
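As an illustration of why this is package-manager specific, here is a best-effort detection sketch (the guess_activation_command helper is hypothetical, and the heuristics are assumptions, not an exhaustive list):

import os
import sys

def guess_activation_command():
    # conda/mamba record the active environment in CONDA_PREFIX
    if os.environ.get("CONDA_PREFIX"):
        return f"conda activate {os.environ['CONDA_PREFIX']}"
    # a venv/virtualenv is active when prefix and base_prefix differ
    if sys.prefix != sys.base_prefix:
        return f"source {sys.prefix}/bin/activate"
    return None  # system Python, or a package manager not covered here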

Describe alternatives you've considered

The alternative is doing nothing and letting the user figure it out, but I find that frustrating: the reason I'd like to use Parsl is to simplify things, not to complicate them.

Additional context

N/A

giordano commented 3 months ago

I'm not sure this would work and I haven't played with it, but I was wondering if replacing

# hope the python environment was activated
CMD() {
    process_worker_pool.py ...
}

with something like

# no need to activate the environment
CMD() {
    $(python found with sys.executable) $(path to script found with shutil.which("process_worker_pool.py")) ...
}

would be a viable option

benclifford commented 3 months ago

@giordano I've wondered about that before. In some situations it will work, but not in all of them. I think it would be easy enough to start using it as a hack in your own setups, and see how that works.

Alternatively, process_worker_pool.py is executable and contains a hard-coded reference to the relevant Python, so you could remove the whole python prefix and run it as if it were just some other Unix executable...?
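A sketch of that alternative (assuming, as the comment above says, that the script's shebang line already points at the environment's Python; the trailing "..." stands in for the usual option template):

import shutil

# Resolve the worker pool script once, on the submit side, and launch it
# directly: its shebang already selects the right interpreter, so no
# python prefix is needed.
pool = shutil.which("process_worker_pool.py")
launch_cmd = f"{pool} {{debug}} {{max_workers_per_node}} ..."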

giordano commented 3 months ago

I tried

diff --git a/parsl/executors/high_throughput/executor.py b/parsl/executors/high_throughput/executor.py
index 6918336..48bff4b 100644
--- a/parsl/executors/high_throughput/executor.py
+++ b/parsl/executors/high_throughput/executor.py
@@ -1,7 +1,9 @@
 import logging
 import math
 import pickle
+import shutil
 import subprocess
+import sys
 import threading
 import typing
 import warnings
@@ -36,7 +38,8 @@ from parsl.utils import RepresentationMixin

 logger = logging.getLogger(__name__)

-DEFAULT_LAUNCH_CMD = ("process_worker_pool.py {debug} {max_workers_per_node} "
+DEFAULT_LAUNCH_CMD = (f"{sys.executable} "
+                      f"{shutil.which('process_worker_pool.py')} {{debug}} {{max_workers_per_node}} "
                       "-a {addresses} "
                       "-p {prefetch_capacity} "
                       "-c {cores_per_worker} "

as a quick and dirty hack (I didn't want to change too much code just to test this solution), and it seems to work for my use case: I can run the pipeline without activating the environment at all. Even if this doesn't solve all problems, could something like this be a decent option?
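For anyone wanting to experiment without patching Parsl's source, the same substitution can presumably be made per-config through the executor's launch_cmd parameter (a sketch; the "..." elides the remaining options of DEFAULT_LAUNCH_CMD, which would need to be filled in):

import shutil
import sys

from parsl.executors import HighThroughputExecutor

# Same idea as the patch, but applied per-config instead of in Parsl itself.
launch_cmd = (
    f"{sys.executable} "
    f"{shutil.which('process_worker_pool.py')} {{debug}} {{max_workers_per_node}} "
    "-a {addresses} ..."
)
executor = HighThroughputExecutor(launch_cmd=launch_cmd)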

benclifford commented 3 months ago

Tagging @rjmello because he's working on something related, in the context of Globus Compute

I think we need to be careful about the principles behind this rather than adding complexity without understanding why changes work.

The patch @giordano has above makes this change: it still uses $PATH to discover the process worker pool, only now resolved with shutil.which on the submit side rather than on the worker node.

What are the situations where this improves things?

Why would the process worker pool be running with the wrong Python command? Why would sys.executable not be the Python that is already hard-coded at the top of process_worker_pool.py by pip install? (For example, is it because the wrong process worker pool has been found on the path? In that case shutil.which('process_worker_pool.py') still runs the wrong process worker pool.)

I guess @giordano has an environment in which this improves things, so it would be interesting to understand these questions in the context of that environment.

giordano commented 1 month ago

> I guess @giordano has an environment in which this improves things, so it would be interesting to understand these questions in the context of that environment.

I simply created a local venv on the login node of a cluster and wanted to run the Parsl pipeline on the compute nodes (using a HighThroughputExecutor, see https://github.com/Parsl/parsl/issues/3542#issue-2429819496), but process_worker_pool.py won't be on PATH on the compute node unless I manually activate, via worker_init, the environment I ran the Parsl pipeline from. Having to activate the exact same environment twice, on the login node (to start the pipeline) and on the compute node (to actually run it), is error-prone and not very user friendly.