esm-tools / esm_runscripts

GNU General Public License v2.0

Release 5: Simplify file and directory management #32

Open seb-wahl opened 4 years ago

seb-wahl commented 4 years ago

I'd like to put the following up for discussion (it has bugged me since I got to know the tools):

The file and directory management should be simplified. Tons of files are copied back and forth, which makes it very tricky to track down errors for those who didn't code the core parts (compute.py, jobclass.py, ...). In addition, copying large (restart, forcing) files several times may significantly slow down job throughput. Having worked with the MPI-ESM runtime manager mkexp (Python with Jinja2-style .config files), I find its file and directory management simpler and more efficient (while other things are horrible in mkexp), so here comes my suggestion:

  1. Upon start, esm_runscripts creates the directory structure expid/restart/, expid/outdata/, expid/forcing/, ... as it does at the moment.
  2. Copy/link the forcing files required for the current run into expid/forcing/. On cold start, optionally create a copy of esm_tools there as well.
  3. Create a work folder expid/work/run_XXXX-YYYY/.
  4. Copy/link all files (forcing, restart, namelists) required for the current run into expid/work/run_XXXX-YYYY/.
  5. cd expid/work/run_XXXX-YYYY/, sbatch .....
  6. Once done, copy only the restart files into expid/restart/.
  7. Trigger a subjob (like the post jobs at the moment) that does the cleanup of expid/work/run_XXXX-YYYY/ (i.e. copying outdata, logs etc. into place), following the bullet-proof method used in mkexp (details later).
  8. Increment the date, go to step 2, and continue until the run is done.
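
To make the proposal more concrete, here is a minimal sketch of steps 1 to 4 and 6 in plain Python. It is not how esm_runscripts or mkexp implement this; all function names (`create_experiment_tree`, `stage`, `prepare_run`, `save_restarts`), the `restart*` file pattern and the `log` folder are assumptions made up for illustration. Linking is the default so that large forcing and restart files are not copied repeatedly.

```python
import shutil
from pathlib import Path


def create_experiment_tree(expid_dir: Path) -> None:
    """Step 1: create the top-level experiment directories."""
    # "log" is an assumed name for the single log location proposed below.
    for name in ("restart", "outdata", "forcing", "work", "log"):
        (expid_dir / name).mkdir(parents=True, exist_ok=True)


def stage(src: Path, dest_dir: Path, link: bool = True) -> Path:
    """Steps 2 and 4: link (default) or copy one file into dest_dir."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / src.name
    # Replace a stale link or file left over from an earlier attempt.
    if target.is_symlink() or target.exists():
        target.unlink()
    if link:
        # Point at the fully resolved source to avoid chains of links.
        target.symlink_to(src.resolve())
    else:
        shutil.copy2(src, target)
    return target


def prepare_run(expid_dir: Path, run_label: str, inputs, link: bool = True) -> Path:
    """Steps 3 and 4: create work/run_<label>/ and stage all inputs there."""
    work_dir = expid_dir / "work" / f"run_{run_label}"
    work_dir.mkdir(parents=True, exist_ok=True)
    for src in inputs:
        stage(Path(src), work_dir, link=link)
    return work_dir


def save_restarts(work_dir: Path, expid_dir: Path, pattern: str = "restart*"):
    """Step 6: copy only newly written restart files into expid/restart/."""
    saved = []
    for f in sorted(work_dir.glob(pattern)):
        # Skip symlinks: those are inputs that were linked in, not new output.
        if f.is_symlink() or not f.is_file():
            continue
        target = expid_dir / "restart" / f.name
        shutil.copy2(f, target)
        saved.append(target)
    return saved
```

Job submission (step 5) and the cleanup subjob (step 7) are left out on purpose, since they depend on the batch system and on the mkexp method mentioned above.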

And last: Have all logs (model logs, esm_runscript logs, filelist, *finished.yaml, ...) in one place.
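
As a rough illustration of the "one place" idea, the following sketch moves log-like files from a finished work folder into a single log directory. The function name `collect_logs`, the glob patterns and the run-label prefix are assumptions for this example only, not existing esm_runscripts behaviour.

```python
import shutil
from pathlib import Path

# Assumed patterns; the real file names depend on the model setup.
LOG_PATTERNS = ("*.log", "*finished.yaml", "filelist*")


def collect_logs(work_dir: Path, log_dir: Path, patterns=LOG_PATTERNS):
    """Move files matching the patterns from work_dir into log_dir."""
    log_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for pattern in patterns:
        # sorted() materialises the matches before any file is moved.
        for f in sorted(work_dir.glob(pattern)):
            if not f.is_file():
                continue
            # Prefix with the run folder name so runs do not overwrite each other.
            target = log_dir / f"{work_dir.name}_{f.name}"
            shutil.move(str(f), str(target))
            moved.append(target)
    return moved
```

A file that matches more than one pattern is only moved once, because it is already gone from the work folder when the later pattern is evaluated.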

I know this is against the current philosophy that everything related to the current run shall be in expid/run_XXXX-YYYY/, but it would certainly simplify the complete config dict and hence make error tracking easier.

dbarbi commented 3 years ago

I think with the changes we made we are sufficiently close to what you wanted, right?

pgierz commented 3 years ago

Going through some old issues to start cleaning up before the next release: is this still relevant? If not, @seb-wahl, please close or alternatively please respecify the problem that is happening so we can make a plan.