ACCESS-NRI / access-esm1.5-configs

Standard ACCESS-ESM1.5 configurations released and supported by ACCESS-NRI
Creative Commons Attribution 4.0 International

Large CICE log files being archived #32

Open MartinDix opened 1 month ago

MartinDix commented 1 month ago

From a historical run, one year of archived CICE output has 480 MB of monthly data, 900 MB of daily data, and 900 MB of pretty useless log files:

gadi-cpu-bdw-0004:/scratch/p...8.save/output000/ice% du -h .
1.4G    ./HISTORY
2.3G    .
gadi-cpu-bdw-0004:/scratch/p...8.save/output000/ice% ls -l
total 886424
-rwxrwxrwx 1 mrd599 p66      4741 Jul  4 14:25 cice_in.nml
-rw-r----- 1 mrd599 p66  12098797 Jul  4 15:42 debug.root.03
drwxr-s--- 2 mrd599 p66      4096 Jul  4 15:42 HISTORY
-rw-r----- 1 mrd599 p66  22025936 Jul  4 15:42 ice_diag.d
-rw-r----- 1 mrd599 p66 202781860 Jul  4 15:42 ice_diag_out
-rw-r----- 1 mrd599 p66  57990544 Jul  4 15:42 iceout085
-rw-r----- 1 mrd599 p66  55700942 Jul  4 15:42 iceout086
-rw-r----- 1 mrd599 p66  55700942 Jul  4 15:42 iceout087
-rw-r----- 1 mrd599 p66  55700942 Jul  4 15:42 iceout088
-rw-r----- 1 mrd599 p66  55700942 Jul  4 15:42 iceout089
-rw-r----- 1 mrd599 p66  55700942 Jul  4 15:42 iceout090
-rw-r----- 1 mrd599 p66  55700942 Jul  4 15:42 iceout091
-rw-r----- 1 mrd599 p66  55700942 Jul  4 15:42 iceout092
-rw-r----- 1 mrd599 p66  55700942 Jul  4 15:42 iceout093
-rw-r----- 1 mrd599 p66  55700942 Jul  4 15:42 iceout094
-rw-r----- 1 mrd599 p66  55700942 Jul  4 15:42 iceout095
-rw-r----- 1 mrd599 p66  55700942 Jul  4 15:42 iceout096
-rw-r----- 1 mrd599 p66       734 Jul  4 14:25 input_ice.nml

The iceout files contain only messages about coupling fields and aren't worth saving. The ice_diag files are much larger than in CM2 runs. Is the CICE log level set too high?

MartinDix commented 1 month ago

CICE is hardcoded to write lots of coupling related messages to the log files, e.g.

https://github.com/ACCESS-NRI/cice4/blob/e7549ebd2044690a432cc67c1317c81cb194b750/drivers/access-cm/cpl_interface.F90#L965

Same in CICE5 as used by CM2.

MartinDix commented 1 month ago

ice_diag_out has lots of warning messages like

           3 :          28         230 ITD: hicen(n) > hbnew(n)
 cat            1
           3 :          28         230 hicen(n) =   346.351803569291
           3 :          28         230 hbnew(n) =   346.272013025422

from https://github.com/ACCESS-NRI/cice4/blob/e7549ebd2044690a432cc67c1317c81cb194b750/source/ice_therm_itd.F90#L324-L334

Do these show a real problem? They occur throughout the run, not just at the start.

aidanheerdegen commented 1 month ago

> CICE is hardcoded to write lots of coupling related messages to the log files

These look like debugging messages that shouldn't be in production code. Can we fence them behind some logic with a debugging variable?

It looks like this is done on an ad hoc basis in some of the source files:

https://github.com/ACCESS-NRI/cice4/blob/e7549ebd2044690a432cc67c1317c81cb194b750/source/ice_distribution.F90#L1137

https://github.com/ACCESS-NRI/cice4/blob/e7549ebd2044690a432cc67c1317c81cb194b750/source/ice_spacecurve.F90#L110

This isn't a great way to do it, as it requires a code change to turn the messages on and off. It does have the advantage that a compiler will generally remove if statements like this entirely when optimisation is turned on.

A better approach would probably be preprocessor directives, which still require recompilation, but the source code itself doesn't change.

Even better would be making this a user-settable variable in a namelist, but that requires a lot more work.
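For concreteness, a minimal sketch of the two compile-time approaches as a toy program (the flag name debug_output, the DEBUG macro and the message text are illustrative, not taken from the CICE source):

program debug_fence_demo
  implicit none
  ! Approach 1: run-time test on a compile-time constant. With
  ! optimisation on, the compiler removes the dead branch entirely,
  ! but enabling the messages means editing this line and rebuilding.
  logical, parameter :: debug_output = .false.
  integer :: istep

  istep = 1
  if (debug_output) then
     write(*,*) 'coupling field received at step ', istep
  end if

  ! Approach 2: preprocessor fence. Enabled by building with -DDEBUG
  ! (and preprocessing on, e.g. gfortran -cpp or ifort -fpp); the
  ! source stays identical between debug and production builds.
#ifdef DEBUG
  write(*,*) 'coupling field received at step ', istep
#endif
end program debug_fence_demo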

@anton-seaice is this done better or at all in the later CICE versions?

anton-seaice commented 1 month ago

There are namelist options debug_model and debug_forcing in CICE6. It looks like there are #ifdef DEBUG statements scattered through the COSIMA CICE5 as well, but that is probably different to the CM2 build.

It's roughly a week's work to make these namelist-configurable by the time it gets tested etc. A bit less to do it as an #ifdef / preprocessor flag.
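As a rough sketch of what the namelist-configurable variant involves (the debug_nml group name and reading it from a file called ice_in are assumptions for illustration, not the actual CICE6 arrangement):

program namelist_debug_demo
  implicit none
  logical :: debug_model = .false.   ! default stays off if no namelist is found
  integer :: ios, nu

  namelist /debug_nml/ debug_model   ! hypothetical group name

  ! Read the flag from the run's input file, keeping the default on any error.
  open(newunit=nu, file='ice_in', status='old', iostat=ios)
  if (ios == 0) then
     read(nu, nml=debug_nml, iostat=ios)
     close(nu)
  end if

  if (debug_model) write(*,*) 'debug output enabled via namelist'
end program namelist_debug_demo

The attraction is that users can then flip the flag in the run configuration without recompiling, which is also why it is the most work to retrofit.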

blimlim commented 1 month ago

A (hopefully) quick change we could make would be to remove all but the first of the iceoutXYZ logs in the payu archive step, similar to what payu already does with the UM logs: https://github.com/payu-org/payu/blob/89d70cf74655b6870505340b51124e7836284f9b/payu/models/um.py#L62-L67

This would save some space, though it doesn't address all of the above issues.

anton-seaice commented 1 month ago

Thanks to Siobhan for pointing out that these are a real warning.

The CICE warning is

230 ITD: hicen(n) > hbnew(n)

This is saying that the calculated ice thickness is above the calculated upper bound for its thickness category. In this case CICE will not 'remap' ice from one thickness category to the next, and will instead 'rebin'.

See this note:

  !-----------------------------------------------------------------
  ! Initialize remapping flag.
  ! Remapping is done wherever remap_flag = .true.
  ! In rare cases the category boundaries may shift too far for the
  !  remapping algorithm to work, and remap_flag is set to .false.
  ! In these cases the simpler 'rebin' subroutine will shift ice
  !  between categories if needed.
  !-----------------------------------------------------------------

Note, however, that the normal diagnostics for thickness appear OK:

e.g.

istep1:       312    idate:   1010114    sec:         0
                                             Arctic                 Antarctic
...
max ice volume     (m) =        5.54726629281101502      13.58701983998852114
...

All the warnings I looked at are accompanied by a related warning, either

0 : 25 301 ITD hbnew(n) > hin_max(n+1)

or

0 : 13 51 ITD: hbnew(n) < hin_max(n-1)

hicen is just vicen/aicen (volume/concentration) and should be in metres, so values in the hundreds of metres like these are crazy, implying a very large vicen (volume) or a very small aicen (concentration) in these thickness categories in these cells (or vice versa, depending on the error).

There seem to be a few points that repeatedly hit this error, although only within one thickness category per cell!? It doesn't turn into obvious errors in the monthly averages. To investigate further we would probably have to run with output saved every timestep.

(As an aside: we are using kcatbound=0 to set the boundaries between the thickness categories; the default is now kcatbound=1, which gives rounder numbers.)
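(For reference, a sketch of the relevant namelist entry; the thermo_nml group name here is from later CICE versions and may not match the ESM1.5 cice_in.nml:)

&thermo_nml
  kcatbound = 0    ! current setting: original category boundaries
! kcatbound = 1    ! later CICE default: rounder boundary values
/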

There are several points which show excessive growth of congelation ice even in the monthly averages:

[Screenshot: map of points with excessive monthly congelation ice growth]

The points all appear to be at the ice edge, so maybe it's some indicator of ice appearing and disappearing? Or of ice growing at low concentrations? It's hard to confirm whether the warnings are at the same locations, because that needs a mapping from the per-PE x/y indices to the global x/y.

ofa001 commented 1 month ago

Hi @anton-seaice, thanks for the congelation ice plot. I looked into this late last week and am getting back to it this afternoon. It is thick ice at very low concentrations; the cases I found were along the East Antarctic coastline, but I can believe the ones at the ice edge as well. Those are the ones that upset the congelation statistics in ACCESS1-0/3 and ESM1-5, where we have no 'puny' threshold on what data is included in the history files. We also had a flag in ACCESS-CM2 that looked at aice at the previous timestep, but CICE6 didn't use that in the end. It's clear the high ice growth comes when growth is switching on and off every timestep at some out-of-the-way points, creating spiky data in the eventual history files. I'm not sure how we trap these points early in the run; they may be round-off from the coupler across different grid boxes, or genuine.

Back to the coastal points that are throwing warnings: I have yet to look at the timestep data, but even in the monthly data they show up as odd. The UM atmosphere data has the SST in the grid box at freezing for the month, whilst the CICE and MOM data have the SST well above freezing, with large bottom fluxes; hence the ice switches between melting and freezing conditions. This seems to last from January to March; by April, when those points are slightly cooler, the ice seems to melt out correctly :(. The 1.5 m air temperature and the surface ice/snow temperature are around -1 to -4 C at most along that coastline.

ofa001 commented 1 month ago

Hi @anton-seaice @MartinDix, I think I have solved it with some thinking (and walking) time. It's coming from the coupler: at the resolution we use, the OASIS SCRIP routine introduces round-off error in the ice concentrations, which is why we are seeing it at the coast and the ice edge. The Met Office used a higher-order interpolation than we did and did not see these spiky points at the ice edge, and I don't recall them reporting any at the coast, though they may not have been running much with zero-layer ice after the 2011 SW coastal fixes in the atmospheric code mentioned in my email. There was a switch to using multilayer ice, which we adopted in ACCESS-CM2.

However, we did pick up the ice_history.F90 fixes from the Met Office (originating at NCAR) to smooth the data: they smoothed off grid points that appeared and disappeared over a timestep, so NCAR and the Met Office must still have seen this somewhere. Interestingly, both dropped these fixes from their final CMIP6 processing, but we still had them.

I guess we can test it by doing a run with higher-order interpolation in the OASIS setup and working out the additional cost. I know Dave Bi didn't want to use the higher-order scheme back in the ACCESS1-0/3 days, as he didn't think it was necessary, but it may have been a cost saving that's come back to bite us. We are using higher-order interpolation in CM3 with NUOPC.

aidanheerdegen commented 3 weeks ago

In the interests of backwards compatibility/bit-reproducibility I don't think the underlying model should be fixed for this release, but this is definitely something we should have on the list of fixes for a subsequent release. @anton-seaice could you have a stab at creating an issue summarising this so we can fix it at a later date? I'm guessing that, as it's mostly a coupler problem, the issue should be in this repo.

Given the above, what to do about the log files?

They compress well. I made a directory (icelog) containing all the log files from a single year and ran tar -zcf on it. The result is more than an order of magnitude reduction in space:

$ du -shc /tmp/icelog*
896M    /tmp/icelog
46M     /tmp/icelog.tar.gz
941M    total

I think this is a reasonable first step. We should add debugging code fences as a permanent fix, but that can wait for a later release.

ofa001 commented 3 weeks ago

Hi @aidanheerdegen @anton-seaice, yes, it sounds reasonable to fix it in a later release and in the future ESM1.6. If you can reduce the excessive output files in size at the end of each run, that's the best option for now.

access-hive-bot commented 1 week ago

This issue has been mentioned on ACCESS Hive Community Forum. There might be relevant details there:

https://forum.access-hive.org.au/t/access-esm1-5-release-information/2352/1

access-hive-bot commented 1 week ago

This issue has been mentioned on ACCESS Hive Community Forum. There might be relevant details there:

https://forum.access-hive.org.au/t/access-esm1-5-release-information/2352/4