NOAA-GFDL / fre-cli

Python-based command line interface for FRE (FMS Runtime Environment) to compile and run FMS-based models and post-process their output.
GNU Lesser General Public License v3.0

Wrapper logic: fre pp status #176

Open: cwhitlock-NOAA opened this issue 1 week ago

cwhitlock-NOAA commented 1 week ago

Preamble: The thing-currently-named-wrapper (wrapper.py) works for testing fre-cli when all history files are already available at run time. In production, however, the models will send over bundles of history files and post-process them in parallel. This breaks some of the logic currently in the wrapper flow, in particular two assumptions: that the same user has no pre-existing experiment to which the current set of history files is being added, and that no experiment with that name is already running.

The logic we need is captured in the flowchart at the end of this issue; this issue breaks that logic down by tool.

The tool: fre pp status needs some of the logic we would normally apply to slurm jobs: what's your status? Are you done yet? The logic shouldn't differ much from anything else that checks on a running job, but writing tests for artificially stalled experiments will take some effort. It may be possible to test this section with experiments kicked off by fre pp checkout or fre pp validate, though that makes debugging harder.

fre pp status:

[ ] Has the job completed?
[ ] Is the job running or stalled?
[ ] If stalled: exit with error (we MIGHT be able to correct stalled jobs in the future)
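A minimal sketch of the three checks in this checklist, written in Python since fre-cli is Python-based. Everything in it is hypothetical: JobStatus, Snapshot, classify, exit_code, and the one-hour stall_after threshold are illustrative names and values, not part of fre-cli. The sketch also does not query slurm or any workflow engine; it assumes some upstream step has already reduced the experiment to a count of finished tasks and the time of its last progress.

```python
"""Hypothetical sketch of the completed / running / stalled checks for fre pp status."""
from dataclasses import dataclass
from enum import Enum


class JobStatus(Enum):
    COMPLETED = "completed"
    RUNNING = "running"
    STALLED = "stalled"


@dataclass
class Snapshot:
    # Hypothetical summary of an experiment's progress. Where these numbers
    # come from (slurm, a workflow engine, log files) is left open here.
    total_tasks: int
    succeeded_tasks: int
    last_progress: float  # epoch seconds of the most recent task state change


def classify(snap: Snapshot, now: float, stall_after: float = 3600.0) -> JobStatus:
    """Answer the checklist: has the job completed? Is it running or stalled?"""
    if snap.succeeded_tasks >= snap.total_tasks:
        return JobStatus.COMPLETED
    # Assumption: "stalled" means no progress for longer than stall_after seconds.
    if now - snap.last_progress > stall_after:
        return JobStatus.STALLED
    return JobStatus.RUNNING


def exit_code(status: JobStatus) -> int:
    # Stalled jobs exit with an error; correcting them is left for the future.
    return 1 if status is JobStatus.STALLED else 0


if __name__ == "__main__":
    # Artificially stalled example: 4 of 10 tasks done, last progress 2 hours ago.
    now = 1_000_000.0
    snap = Snapshot(total_tasks=10, succeeded_tasks=4, last_progress=now - 7200)
    status = classify(snap, now=now)
    print(status.value, exit_code(status))  # prints "stalled 1"
```

Passing now explicitly, instead of reading the clock inside classify, is one way to ease the testing concern raised above: a test can fabricate a stalled experiment by backdating last_progress rather than waiting for a real job to hang.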

[Image: flowchart of the wrapper logic]