geoschem / GCHP

The "superproject" wrapper repository for GCHP, the high-performance instance of the GEOS-Chem chemical-transport model.
https://gchp.readthedocs.io

PET.ESMF_LogFiles with errors produced at end of successful run #304

Closed Twize closed 4 months ago

Twize commented 1 year ago

Name and Institution (Required)

Name: Tyler Wizenberg Institution: University of Toronto

Confirm you have reviewed the following documentation

Description of your issue or question

Hi, I just completed a 1-year GCHP v14.1.1 simulation (split into two 6-month segments). At the end of each segment, the model produced many PETXXX.ESMF_LogFile files in the run directory (where XXX is the physical core number, so 600 files in total). Each file contains the following errors:

20230317 203158.968 ERROR PET000 ESMF_Clock.F90:887 ESMF_ClockGetAlarm() Failure - Internal subroutine call returned Error
20230318 070014.992 INFO PET000 Finalizing ESMF
20230318 070142.750 ERROR PET000 ESMF_Clock.F90:887 ESMF_ClockGetAlarm() Failure - Internal subroutine call returned Error
20230318 174408.815 INFO PET000 Finalizing ESMF

The timestamps on the errors appear to correspond to the start and end of each run. However, no errors showed up in the actual run logs (the Slurm output or gchp.log), nor were there any errors in allPEs.log. The output files and restart file were all produced successfully, and cap_restart was updated with the new start time. On the surface, the run does not appear to have been negatively affected.
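To check whether all of the PET log files contain only the same error (rather than something different on a few cores), a small script can collapse identical messages across files. This is a minimal sketch, not part of GCHP, MAPL, or ESMF; it assumes the files are named PET*.ESMF_LogFile in the run directory and that each line follows the date, time, level, PET, message layout of the log excerpt in this issue. The function name summarize_pet_logs is hypothetical.

```python
import glob
import os
import re
from collections import Counter

# Assumed layout (from the excerpt in this issue), e.g.
# 20230317 203158.968 ERROR PET000 ESMF_Clock.F90:887 ESMF_ClockGetAlarm() Failure ...
LOG_LINE = re.compile(
    r"^(?P<date>\d{8})\s+(?P<time>\d{6}\.\d{3})\s+(?P<level>\w+)\s+"
    r"(?P<pet>PET\d+)\s+(?P<msg>.*)$"
)


def summarize_pet_logs(run_dir):
    """Count distinct ERROR messages across all PET*.ESMF_LogFile files.

    Timestamps and PET numbers are dropped, so the same error reported
    by every PET collapses into a single entry with a count.
    Returns (number_of_files, Counter of message -> occurrences).
    """
    counts = Counter()
    files = sorted(glob.glob(os.path.join(run_dir, "PET*.ESMF_LogFile")))
    for path in files:
        with open(path, errors="replace") as fh:
            for line in fh:
                m = LOG_LINE.match(line.strip())
                if m and m.group("level") == "ERROR":
                    counts[m.group("msg")] += 1
    return len(files), counts


if __name__ == "__main__":
    import sys

    # Read-only: prints a summary for the given run directory (default: cwd).
    n_files, errors = summarize_pet_logs(sys.argv[1] if len(sys.argv) > 1 else ".")
    print(f"{n_files} PET log files")
    for msg, count in errors.most_common():
        print(f"{count:6d}  {msg}")
```

If the summary shows a single ESMF_ClockGetAlarm() message whose count equals the number of files times the number of run segments, every core logged the same error and nothing else.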

Should I be concerned or is this relatively benign?

Relevant files

Example PET.ESMF_LogFile: PET000.ESMF_LogFile.zip
gchp.log: gchp.20020701_0000z.log
Slurm output: slurm_output_9070488.txt
allPEs.log: allPEs.log

Software versions

GCHP: v14.1.1
Compiler: GCC 10.3.0
OpenMPI: 4.1.1
Netcdf-Fortran: 4.6.0
ESMF: 8.3.1

lizziel commented 1 year ago

Hi @Twize, this is nothing to be concerned about. It seems to be related to the monthly diagnostic setting, and it is on my to-do list to investigate further. If the files are bothersome, you can disable them by commenting out this line in src/GCHPctm.F90 and rebuilding:

cap_options%esmf_logging_mode = ESMF_LOGKIND_MULTI_ON_ERROR

lizziel commented 1 year ago

I will leave this issue open as a reminder to investigate the problem further.

Twize commented 1 year ago

Hi @lizziel, thanks for the quick reply, and I am glad it's nothing to be concerned about. The files aren't too bothersome; I can just delete them after the run completes, so it's not a big issue!
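Deleting the files after each run can be scripted so that only the PET log files are touched. This is a minimal sketch, not a GCHP utility; it assumes the logs match the pattern PET*.ESMF_LogFile in the run directory, and the function name remove_pet_logs and its dry_run parameter are hypothetical.

```python
import glob
import os


def remove_pet_logs(run_dir, dry_run=False):
    """Delete PET*.ESMF_LogFile files in run_dir and return the matched paths.

    With dry_run=True nothing is deleted, so the returned list can be
    inspected first. Other files (gchp.log, allPEs.log, restarts) are
    never matched by the pattern.
    """
    paths = sorted(glob.glob(os.path.join(run_dir, "PET*.ESMF_LogFile")))
    if not dry_run:
        for path in paths:
            os.remove(path)
    return paths
```

Calling remove_pet_logs(".", dry_run=True) from the run directory lists what would be removed; calling it again without dry_run deletes those files.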

lizziel commented 1 year ago

I have not been able to readily reproduce this issue, so I am postponing a fix until 14.3 or a Z-version (patch) release of 14.2.

Twize commented 1 year ago

Hi @lizziel, I can provide my GCHP input files for the run which is giving these errors if you think it would help in reproducing the issue.

lizziel commented 1 year ago

Sure, thanks!

Twize commented 1 year ago

Hi @lizziel, sorry for the delay. Attached (as a .zip) are the main input files used for my run which produces these errors. Please let me know if any important ones are missing!

GCHP_input_files_for_Lizzie.zip

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. If there are no updates within 7 days it will be closed. You can add the "never stale" tag to prevent the Stale bot from closing this issue.

lizziel commented 12 months ago

@atrayano is looking into whether this is also an issue in GEOS.

lizziel commented 4 months ago

Starting in version 14.2.1, these files are suppressed by default (see https://github.com/geoschem/GCHP/pull/330). I am closing this issue because (1) it has been passed to the MAPL developers as a MAPL History bug, and (2) it does not affect GCHP runs beyond creating many log files, and only if the run directory configuration file ESMF.rc is changed to output ESMF error logs.