esm-tools / esm_tools

Simple Infrastructure for Earth System Simulations
https://esm-tools.github.io/
GNU General Public License v2.0
25 stars 12 forks source link

esm_tools output in std out. #1220

Open JanStreffing opened 1 month ago

JanStreffing commented 1 month ago

For some time now I find, that esm_tools output is written into stdout at model startup. To me this makes debugging model crashes harder, as unrelated monitoring info is interspersed. I am under the impression, that this output used to go somewhere else.

Here is an example:

  0:       file   : namelist.dyn open ok
  0:       file   : namelist.tra open ok
  0:  number of tracers will be changed from          100  to            2 !
  0:  total number of tracers is:            2
  0:  ____________________________________________________________
  0:   --> initialise ALE arrays + sparse SSH stiff matrix
  0:
  0:  ____________________________________________________________
  0:   --> initialise ssh operator using unperturbed ocean depth
  0:       > in stiff_mat_ale, reading /work/ab0246/a270092/input/fesom2//core2/dist_128/rpart.out
Started from: /work/ab0246/a270092/runtime/awicm3-v3.3/no_crash_16/scripts/
starting (jobtype): observe_compute
{'check': False, 'profile': None, 'update': False, 'update_filetypes': [], 'expid': 'no_crash_16', 'launcher_pid': '931519', 'current_date': '20000101', 'run_number': '1', 'jobtype': 'observe_compute', 'last_jobtype': 'prepcompute', 'verbose': True, 'inspect': None, 'use_venv': False, 'no_motd': False, 'ignore_config_warnings': False, 'original_command': 'awicm3-v3.3-levante-TCO95L91-CORE2.yaml -e no_crash_16 -t observe_compute -p 931519 -s 20000101 -r 1 -v --last-jobtype prepcompute --open-run', 'started_from': '/work/ab0246/a270092/runtime/awicm3-v3.3/no_crash_16/scripts/', 'scriptname': 'awicm3-v3.3-levante-TCO95L91-CORE2.yaml', 'runscript_abspath': '/work/ab0246/a270092/runtime/awicm3-v3.3/no_crash_16/scripts/awicm3-v3.3-levante-TCO95L91-CORE2.yaml'}
  0:       took:    6.2958537000000092E-002  sec
  0:
  0:  Arrays are set
  0:  FCT is initialized
  0:            2  tracers will be used in FESOM
  0:  tracer IDs are:            1           2
  0:  read Temperature climatology from:phc3.0_winter.nc
  0:  read Salinity    climatology from:phc3.0_winter.nc
mandresm commented 1 month ago

It is indeed weird, some time ago I coded a stream for observe that will get the output from observe compute, and only write it when the simulation is finished. Somehow, that's broken now?

Thanks for reporting.

JanStreffing commented 1 month ago

This has been like this for quite a while for me. Maybe half a year or more. Does that fit timeline wise?

mandresm commented 1 month ago

I don't know, other than the workflow manager, I don't know of another thing it could have changed that. Do you see that for all versions of FESOM right?

JanStreffing commented 1 month ago

Not just FESOM, here is an example from a standlone oifs 48r1 simulation:

scale_max5_oifs_compute_20010101-20010131_12752740.log

pgierz commented 1 month ago

I can confirm this, and I think I also know where it is happening. We migrated logging to the loguru library a while ago, if I am not mistaken, and all logs are now dumped into sys.stdout:

https://github.com/esm-tools/esm_tools/blob/16d133daaeefee5151ae02df5b490ce3471d7cbd/src/esm_runscripts/cli.py#L283C10-L283C65

We need to reconsider this, in particular I would in general like to overhaul the logging to make it cleaner between all the various tools. I am not sure when @mandresm or I will have time for that, though.

mandresm commented 1 month ago

Good point @pgierz! The good news is that it's probably really easy to solve :)

Before using loguru we were using print commands and the standard logger, and we had a problem: the observe step runs at the same time the models are running, and printing/logging at the same time of the models. This was a mess. I solved the problem a while ago by creating https://github.com/esm-tools/esm_tools/blob/release/src/esm_runscripts/logfiles.py

That took care of writing all the stdout/err of observe into: 1) a stream 2) on the fly into the observe log, and 3) dumping the stream into the stdout of the script when the simulation was finished.

My feeling is that we can remove https://github.com/esm-tools/esm_tools/blob/release/src/esm_runscripts/logfiles.py, but we will need to configure loguru so that for observe (https://github.com/esm-tools/esm_tools/blob/release/src/esm_runscripts/observe.py) it can write both to the observe log and into a stream that is then put into the model stdout.

@pgierz, can you take this issue under your wing? If that's the case please have a look at https://github.com/esm-tools/esm_tools/blob/release/src/esm_runscripts/logfiles.py, maybe you get some ideas from there into how to implement a similar behavior with loguru. Then we can get rid of the logfiles.py.

mandresm commented 1 month ago

the entry point for all of this is in the SimulationSetup.__call__:

https://github.com/esm-tools/esm_tools/blob/release/src/esm_runscripts/sim_objects.py#L124

There one calls initialize_logfiles function which uses the RuntimeLogger, the one that takes care of handling the stream:

https://github.com/esm-tools/esm_tools/blob/16d133daaeefee5151ae02df5b490ce3471d7cbd/src/esm_runscripts/logfiles.py#L34C19-L34C32

The RuntimeLogger object is used again to dump it's stream into the general log file:

https://github.com/esm-tools/esm_tools/blob/release/src/esm_runscripts/sim_objects.py#L156

pgierz commented 1 month ago

I'll look into this...