Open RINO-GAELICO opened 2 weeks ago
Hi @RINO-GAELICO, thanks for taking this time to write up this issue. This was one of the options in consideration while ShellFunctions
were in development and was ultimately removed. Here is the rationale:
ShellFunction
:
ShellFunction("mkdir -p {rundir}; cd {rundir}; ...", rundir=/path/to/run/dir)
Let me know what you think. If we get more folks asking for similar functionality, we are definitely open to supporting it.
Hi @yadudoc thanks for your reply.
Regarding your points:
I understand the logic, but from another perspective, it would become extremely tedious to change the endpoint configuration every time you want to modify the run_dir
. For example, consider a piece of code that needs to run multiple times (for example in a loop), with only the run_dir
changing. In this particular case, it wouldn't be ideal to manually update the configuration each time.
From my understanding, this approach wouldn't work without modifying the ShellFunction constructor to accept additional **kwargs
and using cmd.format(**kwargs)
. Am I wrong?
Hi @RINO-GAELICO,
working_dir
on a per-task basis, I completely agree that going via the configuration route is going to be tedious and very inefficient. We were working with the assumption that most users probably would want to set up the task sandbox directory, which would automatically isolate tasks. It sounds like you want more control than that.executor.submit
and the kwargs are used to format the cmd
string : >>> from globus_compute_sdk import Executor, ShellFunction
>>> ex = Executor(endpoint_id="a1e677f9-c08a-4b70-8b70-4a51b13128cd") # This is my local EP, please update before testing
>>> future = ex.submit(ShellFunction("pwd; cd {run_dir}; pwd"), run_dir="/tmp")
>>> future.result().stdout
'/Users/yadu/.globus_compute/prod/tasks_working_dir\n/tmp\n'
>>> print(future.result().stdout)
/Users/yadu/.globus_compute/prod/tasks_working_dir
/tmp
Thanks again for your care in evaluating these options.
Also, in a multi-user endpoint context, working_dir
can be set as a user configuration template variable, which the user can modify on every submit call.
In relation to the new feature of Shell Function , it might be useful to add an additional argument for the user to specify the run_dir. Right now it uses a sandbox or the working task dir.
Describe the solution you'd like A possible solution could be to pass the run_dir as one the optional arguments in the Shell constructor. Then checking if the run_dir is
not None
and let the user choice take priority over sandboxing.run_dir = self.run_dir