Open lee212 opened 3 years ago
Thanks for posting this issue, Hyungro. I think it would require changes to the EnTK entry point script. Perhaps a remote path option could be added which copies the input YAML to a remote experiment_directory and also sets up the remote directories using ssh.
For another project we have used Python's fabric module for this type of functionality.
Here is an example:
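from fabric import Connection

# `config` holds the experiment settings (hostname, experiment_directory)
conn = Connection(config.hostname)
# -p creates missing parent directories and does not fail if the directory exists
conn.run(f"mkdir -p {config.experiment_directory}")
# Other conn.run() calls ...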
The intermediate YAML files might also be created on the machine running the entry point, in which case they would have to be copied to the remote as well. It might take a bit of work to implement, but in theory it is one approach.
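To make the idea above a bit more concrete, here is a minimal sketch of the remote setup step. The function name `stage_experiment` and its arguments are hypothetical and not part of DDMD or EnTK. `conn` is assumed to be a fabric `Connection`, and only its `run` and `put` methods are used.

```python
import posixpath
import shlex
from pathlib import Path


def stage_experiment(conn, local_yaml, remote_dir):
    """Create remote_dir on the host and copy local_yaml into it.

    `conn` is expected to behave like fabric.Connection (run/put).
    Returns the remote path of the copied file.
    """
    # Quote the path so spaces or shell metacharacters are handled safely.
    conn.run(f"mkdir -p {shlex.quote(remote_dir)}")
    # Remote targets such as Summit or Bridges use POSIX-style paths.
    remote_path = posixpath.join(remote_dir, Path(local_yaml).name)
    conn.put(str(local_yaml), remote=remote_path)
    return remote_path
```

With fabric installed, this could be called as `stage_experiment(Connection(config.hostname), "input.yaml", config.experiment_directory)`, where `input.yaml` is a placeholder file name.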
In general, EnTK launches a job remotely (using ssh), and intermediate data need to be stored on the target resource, e.g., Summit or Bridges. Input/output data are transferred by the data staging feature (with sftp), and the DDMD on-the-fly YAML files, i.e.,
stage0000_task0000.yaml
have to be on the remote as well, not on the client side. However, a recent test on Bridges2 with the current version of DDMD throws an error message like:
We might want to consider this client-remote use case and discuss a possible solution to separate the two. Note that the content of a YAML file also needs to be updated because it contains a local path (which works fine if local and remote are in the same location, as on Summit). FYI, an example of the content looks like this:
I will start to add some findings and describe an idea.
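As one possible starting point for the path issue noted above, the sketch below rewrites local path prefixes in an already loaded config before it is written out for the remote side. It assumes the YAML has been parsed into nested dicts and lists (for example with PyYAML's `yaml.safe_load`). The function name `rewrite_paths` and the example roots are hypothetical and not taken from DDMD.

```python
def rewrite_paths(node, local_root, remote_root):
    """Return a copy of node with local_root path prefixes replaced by remote_root."""
    local_root = local_root.rstrip("/")
    remote_root = remote_root.rstrip("/")
    if isinstance(node, dict):
        return {k: rewrite_paths(v, local_root, remote_root) for k, v in node.items()}
    if isinstance(node, list):
        return [rewrite_paths(v, local_root, remote_root) for v in node]
    if isinstance(node, str):
        # Match only whole path components, so /a/exp does not match /a/expX.
        if node == local_root or node.startswith(local_root + "/"):
            return remote_root + node[len(local_root):]
    # Non-path strings, numbers, booleans, and None pass through unchanged.
    return node
```

For example, `rewrite_paths(cfg, "/home/user/exp", "/scratch/exp")` would map a value like `/home/user/exp/a.h5` to `/scratch/exp/a.h5` and leave unrelated values untouched.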