Matgenix / jobflow-remote

jobflow-remote is a Python package to run jobflow workflows on remote resources.
https://matgenix.github.io/jobflow-remote/

Optionally delay download after job end #210

Open gpetretto opened 4 hours ago

gpetretto commented 4 hours ago

From a discussion in qtoolkit (https://github.com/Matgenix/qtoolkit/pull/43) it emerged that it may be convenient to delay the download step after job completion when NFS on the worker is slow. This should be possible by adding a delay_download option for the worker and, when that option is set, setting the retry_time_limit when a job enters the TERMINATED state.
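
A rough sketch of the idea, not tied to the actual jobflow-remote code: the `Worker` and `JobDoc` classes and both helper functions below are hypothetical stand-ins, and treating `delay_download` as a number of seconds is an assumption. Only the names `delay_download`, `retry_time_limit` and `TERMINATED` come from the proposal above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Worker:
    """Hypothetical worker settings; only delay_download matters here."""
    name: str
    delay_download: Optional[int] = None  # assumed unit: seconds


@dataclass
class JobDoc:
    """Hypothetical, heavily simplified job record."""
    state: str = "RUNNING"
    retry_time_limit: Optional[datetime] = None


def mark_terminated(job: JobDoc, worker: Worker, now: Optional[datetime] = None) -> JobDoc:
    """Move the job to TERMINATED and, if requested, postpone the download."""
    now = now or datetime.now(timezone.utc)
    job.state = "TERMINATED"
    if worker.delay_download:
        # The download step should not be attempted before this time.
        job.retry_time_limit = now + timedelta(seconds=worker.delay_download)
    return job


def ready_for_download(job: JobDoc, now: Optional[datetime] = None) -> bool:
    """True once the job is TERMINATED and any configured delay has elapsed."""
    now = now or datetime.now(timezone.utc)
    if job.state != "TERMINATED":
        return False
    return job.retry_time_limit is None or now >= job.retry_time_limit
```

With this shape, a worker without `delay_download` behaves as before, since `retry_time_limit` stays unset and the download can start immediately.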

ml-evs commented 3 hours ago

If I can't see anything obvious causing the issues at #160, I'll try to hack this feature in as a first draft -- though I'd be surprised if that is the cause of my current woes.

In general this feature would be useful given the peculiarities of parallel file systems. It's probably better to err on the side of waiting a very long time to download files that we expect to appear even if the job errored, i.e., the jfremote outputs.
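
As an illustration of "waiting for a long time for files we expect to appear", here is a minimal polling sketch. `wait_for_files` is a hypothetical helper, not part of jobflow-remote, and it assumes the files can be checked through a local path; for a remote worker the existence check would have to run on the worker side instead.

```python
import time
from pathlib import Path
from typing import Iterable, List


def wait_for_files(
    paths: Iterable[Path], timeout: float, poll_interval: float = 1.0
) -> List[Path]:
    """Poll until every path exists or the timeout expires.

    Returns the paths that are still missing (empty list on success).
    """
    deadline = time.monotonic() + timeout
    missing = [Path(p) for p in paths]
    while True:
        # Re-check only the files that have not shown up yet.
        missing = [p for p in missing if not p.exists()]
        if not missing or time.monotonic() >= deadline:
            return missing
        time.sleep(poll_interval)
```

The caller decides what to do with the returned list, for example marking the job as failed only if the expected outputs are still missing after a generous timeout.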