globus / globus-compute

Globus Compute: High Performance Function Serving for Science
https://www.globus.org/compute
Apache License 2.0
143 stars 47 forks source link

Shell Function: run_dir #1646

Open RINO-GAELICO opened 2 weeks ago

RINO-GAELICO commented 2 weeks ago

In relation to the new feature of Shell Function , it might be useful to add an additional argument for the user to specify the run_dir. Right now it uses a sandbox or the working task dir.

Describe the solution you'd like A possible solution could be to pass the run_dir as one the optional arguments in the Shell constructor. Then checking if the run_dir is not None and let the user choice take priority over sandboxing.

run_dir = self.run_dir

yadudoc commented 2 weeks ago

Hi @RINO-GAELICO, thanks for taking this time to write up this issue. This was one of the options in consideration while ShellFunctions were in development and was ultimately removed. Here is the rationale:

  1. Ideally we want one component in control of setting the task's run dir. Since the endpoint already manages this based on user-supplied configs this would be the ideal place.
  2. If the user wants to set a task dir on a per-task basis, it's still quite easy to do without support in the ShellFunction:

ShellFunction("mkdir -p {rundir}; cd {rundir}; ...", rundir=/path/to/run/dir)



Let me know what you think. If we get more folks asking for similar functionality, we are definitely open to supporting it.
RINO-GAELICO commented 2 weeks ago

Hi @yadudoc thanks for your reply.

Regarding your points:

  1. I understand the logic, but from another perspective, it would become extremely tedious to change the endpoint configuration every time you want to modify the run_dir. For example, consider a piece of code that needs to run multiple times (for example in a loop), with only the run_dir changing. In this particular case, it wouldn't be ideal to manually update the configuration each time.

  2. From my understanding, this approach wouldn't work without modifying the ShellFunction constructor to accept additional **kwargs and using cmd.format(**kwargs). Am I wrong?

yadudoc commented 2 weeks ago

Hi @RINO-GAELICO,

  1. If the user wants to set working_dir on a per-task basis, I completely agree that going via the configuration route is going to be tedious and very inefficient. We were working with the assumption that most users probably would want to set up the task sandbox directory, which would automatically isolate tasks. It sounds like you want more control than that.
  2. My example was buggy, here's a more complete example, that I've tested out. You do not pass the kwargs to ShellFunction at init but rather to executor.submit and the kwargs are used to format the cmd string :
>>> from globus_compute_sdk import Executor, ShellFunction
>>> ex = Executor(endpoint_id="a1e677f9-c08a-4b70-8b70-4a51b13128cd")  # This is my local EP, please update before testing
>>> future = ex.submit(ShellFunction("pwd; cd {run_dir}; pwd"), run_dir="/tmp")
>>> future.result().stdout
'/Users/yadu/.globus_compute/prod/tasks_working_dir\n/tmp\n'
>>> print(future.result().stdout)
/Users/yadu/.globus_compute/prod/tasks_working_dir
/tmp

Thanks again for your care in evaluating these options.

rjmello commented 2 weeks ago

Also, in a multi-user endpoint context, working_dir can be set as a user configuration template variable, which the user can modify on every submit call.