DeepDriveMD / DeepDriveMD-pipeline

DeepDriveMD: Deep-Learning Driven Adaptive Molecular Simulations
MIT License

FileNotFoundError for on-the-fly yaml files (when EnTK access schema is not 'local' e.g., PSC Bridges) #32

Open lee212 opened 3 years ago

lee212 commented 3 years ago

In general, EnTK launches jobs remotely (over ssh), so intermediate data must be stored on the target resource, e.g., Summit or Bridges. Input and output data are transferred by the data staging feature (over sftp), and the YAML files that DDMD generates on the fly, e.g., stage0000_task0000.yaml, must also exist on the remote side, not only on the client. However, a recent test on Bridges2 with the current version of DDMD fails with an error like:

FileNotFoundError: [Errno 2] No such file or directory: '/home/nsanjrani/test_sim_13/molecular_dynamics_runs/stage0000/task0000/stage0000_task0000.yaml'

We might want to consider this client-remote use case and discuss a possible way to separate the two sides. Note that the content of each YAML file also needs to be updated, because it contains local paths (which work fine only when the client and the remote resource see the same filesystem location, as on Summit). FYI, the content looks like this:

(client machine)$ more /home/nsanjrani/test_sim_13/molecular_dynamics_runs/stage0000/task0000/stage0000_task0000.yaml
experiment_directory: /home/nsanjrani/test_sim_13
stage_idx: 0
task_idx: 0
output_path: /home/nsanjrani/test_sim_13/molecular_dynamics_runs/stage0000/task0000
node_local_path: null
pdb_file: /home/nsanjrani/.../experiment/bba/system/1FME-unfolded.pdb
initial_pdb_dir: /home/nsanjrani/.../experiment/bba
reference_pdb_file: /home/nsanjrani/.../experiment/bba/system/1FME-unfolded.pdb
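
As a rough illustration of the path update mentioned above, here is a minimal sketch that rewrites a leading local root to a remote root in an already-loaded YAML mapping. The helper name `remap_paths` and the remote root `/remote/scratch/test_sim_13` are hypothetical, not part of DDMD or EnTK; the keys are taken from the file shown above.

```python
from pathlib import PurePosixPath
from typing import Any


def remap_paths(value: Any, local_root: str, remote_root: str) -> Any:
    """Recursively replace a leading local_root with remote_root in string values.

    Hypothetical helper: strings that are not located under local_root,
    and non-string values, are returned unchanged.
    """
    if isinstance(value, dict):
        return {k: remap_paths(v, local_root, remote_root) for k, v in value.items()}
    if isinstance(value, list):
        return [remap_paths(v, local_root, remote_root) for v in value]
    if isinstance(value, str):
        try:
            # relative_to() compares whole path components, so
            # "/home/user2" is not treated as being under "/home/user".
            rel = PurePosixPath(value).relative_to(local_root)
        except ValueError:
            return value
        return str(PurePosixPath(remote_root) / rel)
    return value


# A subset of the keys from the YAML file above, as a loaded mapping.
local_cfg = {
    "experiment_directory": "/home/nsanjrani/test_sim_13",
    "stage_idx": 0,
    "task_idx": 0,
    "output_path": "/home/nsanjrani/test_sim_13/molecular_dynamics_runs/stage0000/task0000",
    "node_local_path": None,
}

# The remote root is an assumed example location, not a real Bridges2 path.
remote_cfg = remap_paths(
    local_cfg, "/home/nsanjrani/test_sim_13", "/remote/scratch/test_sim_13"
)
```

Entries such as pdb_file live outside experiment_directory, so they would need their own local-to-remote mapping (one more call with a different pair of roots) and the referenced files would have to be staged as well.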

I will add some findings here and describe an idea.

braceal commented 3 years ago

Thanks for posting this issue, Hyungro. I think it would require changes to the EnTK entry point script. Perhaps a remote path option could be added which copies the input YAML to a remote experiment_directory and also sets up the remote directories using ssh.

For another project we have used Python's fabric module for this type of functionality.

Here is an example:

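from shlex import quote
from fabric import Connection

# `config` is assumed to provide `hostname` and `experiment_directory`.
conn = Connection(config.hostname)
# -p creates missing parent directories and does not fail if the directory already exists.
conn.run(f"mkdir -p {quote(config.experiment_directory)}")
# Other conn.run() calls ...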

The intermediate YAML files might also have to be created on the machine running the entry point, so they would have to be copied to the remote as well. This solution might take a bit of work to implement, but in theory it is one approach.
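
A minimal sketch of that copy step, under some assumptions: the function `stage_yaml_files` is hypothetical, and `conn` is any object that offers `run(command)` and `put(local, remote)`, which fabric's `Connection` does. The function mirrors each file's location under the local experiment directory on the remote side and creates the parent directory before uploading.

```python
import posixpath
import shlex
from typing import Iterable, List


def stage_yaml_files(
    conn, local_root: str, remote_root: str, yaml_paths: Iterable[str]
) -> List[str]:
    """Upload YAML files to the remote, preserving their layout under local_root.

    Hypothetical helper. `conn` only needs `run(command)` and
    `put(local, remote)`, e.g. a fabric Connection.
    Returns the list of remote paths that were written.
    """
    staged = []
    for local_path in yaml_paths:
        rel = posixpath.relpath(local_path, local_root)
        if rel == ".." or rel.startswith("../"):
            raise ValueError(f"{local_path} is not under {local_root}")
        remote_path = posixpath.join(remote_root, rel)
        # Create the remote task directory before uploading into it.
        remote_dir = posixpath.dirname(remote_path)
        conn.run(f"mkdir -p {shlex.quote(remote_dir)}")
        conn.put(local_path, remote_path)
        staged.append(remote_path)
    return staged
```

With fabric this might be called as stage_yaml_files(Connection(config.hostname), local_dir, remote_dir, paths), where all four arguments are placeholders. The uploaded files would still contain client-side paths, so their contents would need to be rewritten for the remote before or after the copy.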