GEOS-ESM / MAPL

MAPL is a foundation layer of the GEOS architecture, whose original purpose is to supplement the Earth System Modeling Framework (ESMF)
https://geos-esm.github.io/MAPL/
Apache License 2.0
26 stars 17 forks source link

Feature request: always report model throughput after finalize #1930

Closed lizziel closed 1 year ago

lizziel commented 1 year ago

Reporting the model throughput to log at the end of the run is currently tied to pFlogger and requires MAPL logger level INFO (see here).

It would be useful if it was always output regardless of logging settings. Currently both the throughput estimation per timestep and the model timing info are always printed and are not tied to pFlogger. End-of-run model throughput reporting would fall into the same category of information as these, and always doing it would not impact performance or clutter the log.

The code has a comment that it is going to be removed soon. We use this in GCHP as a handy performance metric so request that it stay. It also seems to be much more accurate than the throughput reported per timestep. For example, my 6-hr run has the following stats. If you do the math with the total run time the correct estimation is 48 days/day.

2019/07/01  Time: 06:00:00 Throughput(days/day)[Avg Tot Run]:         63.3         73.4        219.6
   TOTAL                      449.3732       0.0000     449.3732       0
           MAPL: INFO: Model Throughput:       48.008 days per day
mathomp4 commented 1 year ago

@lizziel Huh. You are right. There is a comment from William that it will be removed. I'm not sure why. I depend on it quite a bit (of course, I wrote it so, well... 😄).

Obviously as you say, the reason it is more accurate overall is that the during-run timers never see the Finalize actions (and perhaps any History writing still going on). When I wrote that, I used the brain dead method of just wall time divided by model time. Nothing fancy.

Frankly, I think that comment probably should go away or maybe change. Perhaps @tclune had a thought to revamp how we calculate or report that a few years ago?

I guess the other question for @tclune is is there a "logger" that always exists? That is, even if a user doesn't provide a logging.yaml, is there still a way to use pFlogger to write it "always"?

tclune commented 1 year ago

Even if we revamp, it is clear the comment is wrong. People now rely on it, so it cannot be deleted.

The whole point of pflogger is to be able to control whether some outputs happen and also to control where they go. I want the timing profile to go through pflogger as in theory we really should want that in a separate file rather than hidden at the bottom of the console log. (Can be both, of course.)

Isn't INFO the default, which means this in "always on" unless you don't want it to be? Could be I've not thought it through.

mathomp4 commented 1 year ago

I just checked and, yes, if you don't pass in a logging.yaml file, then you still get the print.

But maybe what @lizziel is asking is what if you build MAPL without pFlogger support? Then I suppose the print is never done since it is going through logger?

Maybe there is there a way to "if pflogger". So we could do:

if (pflogger) then
   lgr => logging%get_logger('MAPL')
   call lgr%info("Model Throughput: %f12.3 days per day", model_days_per_day)
else
   write (*,*) "Model Throughput: %f12.3 days per day", model_days_per_day
end if

(with the right format statement of course). I suppose we could do it with #ifdef BUILD_WITH_PFLOGGER...

tclune commented 1 year ago

I'm pretty sure GCHP uses pFlogger - we've added several diagnostics in ExtData using that layer.

And having a switch on top of pFlogger is the way of insanity.

mathomp4 commented 1 year ago

And having a switch on top of pFlogger is the way of insanity.

Indeed. Since it looks like the print happens no matter what, I'll make a PR to at least remove the comment as we will want this.

lizziel commented 1 year ago

Yes, we do use pFlogger but recently changed the default setting for MAPL to be WARNINGS rather than INFO. However, we still want model throughput printed. I'll revisit the reasoning behind changing INFO to WARNINGS. I think we were trying to avoid certain prints that were showing up in the log.

tclune commented 1 year ago

You can leave the default settings for MAPL to be INFO, and we can spin off a sublogger for profiling. Then you can set the reporting to be INFO for that but WARNING for everything else.

I'll have to look at the code, but I thought I was already using a sublogger for profiling. But maybe not.

tclune commented 1 year ago

For the routine small throughput bursts, this line should change:

https://github.com/GEOS-ESM/MAPL/blob/5d0c6ec1cc044cdec5f588ae120c0d43ebab29c6/gridcomps/Cap/MAPL_Cap.F90#L339

Should instead be

lgr => logging%get_logger('MAPL.profile') 

And then the level for logging the profile can be set independently.

Things are a bit more complicated for the profiles of individual components. I see now that I never finished the other request to push those writes to pFlogger:

https://github.com/GEOS-ESM/MAPL/blob/5d0c6ec1cc044cdec5f588ae120c0d43ebab29c6/generic/MAPL_Generic.F90#L2327-L2334

This could at least check the appropriate logger to see if its level is above some threshold. Baby step. The problem is should it check MAPL.profile or should it check <comp-name>.profile. I see merits in both. The former is easier to be global, but the latter is really the "right way". (TM).

tclune commented 1 year ago

And the same issue for the global timers:
https://github.com/GEOS-ESM/MAPL/blob/5d0c6ec1cc044cdec5f588ae120c0d43ebab29c6/base/ApplicationSupport.F90#L187-L193

But at least there it is clearer that it should be MAPL.profile

lizziel commented 1 year ago

A profiling sublogger would be great. This reminds me, if there are changes to the loggers what is the best way for me to get that update? We have a logging.yml file that sits in the GEOS-Chem repository rather than be extracted from MAPL. Are the MAPL release notes the best place to get notified that GCHP's logging.yml should be updated for new features?

tclune commented 1 year ago

@lizziel Well pFlogger itself is very stable. (Or at least mostly neglected at this time.) So the only reason to update the YAML file would be to change your defaults or add new sections because someone has instrumented a new GEOS component. AFAIK, there is no current activity on that front either. But if there were, you could monitor our main YAML file: https://github.com/GEOS-ESM/GEOSgcm_App/blob/develop/logging.yaml (Has not changed since March 2022)

With MAPL3 there will probably be a bit more thought given to how to arrange MAPL specific loggers.

In the mean time, hopefully we can quickly provide the changes you have requested in this ticket.

lizziel commented 1 year ago

Thank you!

mathomp4 commented 1 year ago

I'm taking a look at this. First, I'm just working on the "Model Throughput" timer and if I take the suggestion of @tclune for a MAPL.profiler it does work. If we use our (GEOS') current logging.yaml which has:

   MAPL:
       handlers: [mpi_shared]
       level: WARNING
       root_level: INFO

   MAPL.PROF:
       handlers: [console_plain]
       propagate: FALSE
       level: WARNING
       root_level: INFO

we get out:

       profiler: INFO: Model Throughput:     1030.440 days per day

because it's just inheriting the MAPL logger setup.

but if we change to:

   MAPL.profiler:
       handlers: [console_plain]
       propagate: FALSE
       level: WARNING
       root_level: INFO

where we edit to use the old MAPL.PROF setup, we get the console_plain output:

Model Throughput:      968.064 days per day

which I kind of like. I'll try and figure out the timers now...

mathomp4 commented 1 year ago

@lizziel Can you look at https://github.com/GEOS-ESM/MAPL/pull/1938

I think if you update your logging.yml to have a new handler:

   console_plain:
      class: streamhandler
      formatter: plain
      unit: OUTPUT_UNIT
      level: INFO

and a new logger:

   MAPL.profiler:
       handlers: [console_plain]
       propagate: FALSE
       level: WARNING
       root_level: INFO

you should be able to do WARNING for all of MAPL, but set this at info for the profiler output

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. If there are no updates within 7 days, it will be closed. You can add the "long term" tag to prevent the Stale bot from closing this issue.

lizziel commented 1 year ago

Thanks @mathomp4!