deepmodeling / dpdispatcher

generate HPC scheduler job input scripts, submit these scripts to HPC systems, and poll until they finish
https://docs.deepmodeling.com/projects/dpdispatcher/
GNU Lesser General Public License v3.0

fail to run task if remote_root path doesn't existed #436

Open link89 opened 7 months ago

link89 commented 7 months ago

When I run the following code with dflow:

from dflow.plugins.dispatcher import DispatcherExecutor

kwargs = {
    'host': 'xxx',
    'username': 'xxx',
    'port': 6666,
    'machine_dict': {
        'batch_type': 'Slurm',
        'context_type': 'SSHContext',
        'remote_profile': {'key_filename': '/home/xxx/.ssh/id_ed25519'},
    },
    'resources_dict': {'number_node': 1, 'cpu_per_node': 1},
    'queue_name': 'c52-small',
    'remote_root': '/data/home/xxx/tmp/dflow-galaxy/square-sum',
}
executor = DispatcherExecutor(**kwargs)

If the remote_root path doesn't exist, it raises an error instead of creating the directory automatically. I think this could be easily fixed by using mkdir -p or os.makedirs(path, exist_ok=True) somewhere in dpdispatcher.
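For illustration, the behavior being requested would look something like the sketch below. The function name and the example path are hypothetical; where such a call would belong inside dpdispatcher is not specified here.

```python
import os

def ensure_remote_root(path: str) -> None:
    # Create the remote root (and any missing parent directories) if absent.
    # With exist_ok=True this is a no-op when the directory already exists,
    # mirroring the semantics of `mkdir -p`.
    os.makedirs(path, exist_ok=True)

# Hypothetical example path, standing in for a per-project remote_root.
ensure_remote_root("/tmp/dflow-demo/square-sum")
```

Note that for an SSHContext the equivalent would have to run on the remote side (e.g. via the SSH/SFTP connection), not as a local os.makedirs call.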

njzjz commented 7 months ago

This seems to be unsafe behavior when one gives a wrong path. Considering that one only needs to create a directory once, but needs to ensure the path is correct in every submission, I prefer throwing an error when the directory does not exist.

link89 commented 7 months ago

This seems to be unsafe behavior when one gives a wrong path.

It's always possible for users to choose a wrong path whether you create it for them or not. Refusing to create it just makes things less convenient for users rather than improving security.

The reason I think the root path should be created automatically is that I choose different remote directories for different projects and different runs. I don't want to log in to a remote session just to create a path. Besides, for workflows that run across different clusters, it would be annoying to create the path manually on each of them.

njzjz commented 7 months ago

The reason I think the root path should be created automatically is that I choose different remote directories for different projects and different runs.

This doesn't make sense. The submission runs in a temporary subdirectory of the root path, not in the root path itself, and that subdirectory is entirely deleted after the submission is finished. The hash of the submission determines the subdirectory name, so the subdirectory will differ once any of the commands, the forward/backward file paths, or the local root differs.
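A toy sketch of the hash-based naming described above (this is an illustration of the idea, not dpdispatcher's actual hashing scheme; the function and parameter names are invented):

```python
import hashlib

def submission_subdir(commands, forward_files, backward_files, local_root):
    # Derive a deterministic subdirectory name from the submission's
    # contents: identical inputs map to the same directory, while changing
    # any command, file path, or the local root yields a new one.
    payload = repr((
        sorted(commands),
        sorted(forward_files),
        sorted(backward_files),
        local_root,
    )).encode()
    return hashlib.sha1(payload).hexdigest()

# Same inputs except local_root -> different subdirectory names.
a = submission_subdir(["python run.py"], ["in.dat"], ["out.dat"], "/work/run1")
b = submission_subdir(["python run.py"], ["in.dat"], ["out.dat"], "/work/run2")
```

This is why, under njzjz's argument, reusing one fixed remote_root across projects already keeps runs isolated.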

Besides, for workflows that run across different clusters, it would be annoying to create the path manually on each of them.

This is more dangerous, considering different machines may have different directory structures. If you don't log in to the cluster to check whether the directory exists, you may create directories in the wrong path.

link89 commented 7 months ago

The submission runs in a temporary subdirectory of the root path instead of the root path itself, which will be entirely deleted after the submission is finished.

I am aware of that. Then how about providing an option to enable this behavior?
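One way such an opt-in could look, as a hypothetical sketch (create_if_missing is an invented parameter name, not an existing dpdispatcher option; the default preserves the current fail-fast behavior):

```python
import os

def prepare_remote_root(path: str, create_if_missing: bool = False) -> None:
    # Default: fail fast on a missing remote_root, as dpdispatcher does today.
    # Opt-in: create the directory tree automatically when requested.
    if os.path.isdir(path):
        return
    if create_if_missing:
        os.makedirs(path, exist_ok=True)
    else:
        raise FileNotFoundError(f"remote_root does not exist: {path}")
```

This keeps the safety argument intact for users who don't set the flag, while letting multi-cluster workflows opt in to automatic creation.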