CICE-Consortium / CICE

Development repository for the CICE sea-ice model
Other
55 stars 130 forks source link

Small bug in hourly history output #289

Closed phil-blain closed 5 years ago

phil-blain commented 5 years ago

I noticed that when using the namelist parameters

   histfreq       = 'h','x','x','x','x'
   histfreq_n     =  1 , 1 , 1 , 1 , 1
   hist_avg       = .false.
   f_uvel         = 'h' 
   f_vvel         = 'h' 

with all other namelist flags to their default values, the output of the first hour (should be iceh_inst.1997-01-01-03600.nc ) is not written. This also happens with hist_avg = .true. . When using outputs at each timestep (histfreq = '1') it works...

dabail10 commented 5 years ago

What is your timestep in the model? I wonder if there is a weird interaction here. This could certainly still be a bug.

phil-blain commented 5 years ago

I'm just testing CICE standalone for now; I use the default timestep (3600 seconds).

dabail10 commented 5 years ago

I believe the problem is how we define "new_hour" in ice_calendar.F90 and the interaction with the timestep. The alarm is triggered when we go to the next hour, so hour 2. So, the timestamp for the first hour is actually 7200 I believe. I think this is just part of a larger issue about the calendar, #162. For now, the workaround is to write every step for instantaneous output, or if you want hourly averages, then change the timestep to 1800s.

phil-blain commented 5 years ago

Ok, I will do that, thanks for taking a look!

phil-blain commented 5 years ago

Just to document, I just noticed that the same bug happens for restarts (bug comes from the same place in ice_calendar.F90)

apcraig commented 5 years ago

I am looking at this now. The problem actually is that all the logic to turn on the history and restart flag is within an if test,

if (istep > 1) then

at about line 280 of ice_calendar.F90. If I change that to (istep > 0), it writes the first history file fine. Now, I am a little reluctant to fix this without some discussion. This is the kind of simple fix that might work perfectly fine in the stand-alone CICE model, but that might break coupled models. Does anyone know why that check is there in the first place?

eclare108213 commented 5 years ago

For amusement, I went looking for the origin of the line

if (istep > 1) then

using the 'blame' function in the repository, and found that it predates the svn repo. Fortunately I recently rescued the very early versions of CICE from a disappearing LANL archive. The line first shows up in CICE version 3.0, and look at the file header:

c $Id: ice_calendar.F,v 1.2 2001/02/23 21:06:52 schramm Exp $ c====================================================================== !---! calendar routines for managing time !---! !---! authors Elizabeth C. Hunke, LANL !---! Tony Craig, NCAR c======================================================================

:) We've been doing this a while!

I don't know about you Tony, but I don't remember the reason for everything that happened between 1999 (the previous release, v2.0.2) and 2001... It's possible that I added that conditional because I didn't want history output etc on the very first timestep, but that doesn't really make sense, so my guess is that CSM (PCM?) wanted it.

Regardless, we need to have some way of making changes like this and testing them in the coupled model configurations. That's supposed to be the domain of the modeling centers - the question is one of process, in my opinion. We could do a quick survey: maybe those with access to a coupled model could comment out that conditional in whatever version they are running and do a quick smoke test to see if it breaks. If not, we could check it into master with a note flagging it as a potential problem, and then over time see if it causes problems as more modeling centers try it. If it does, we can change it back. Sound reasonable?

eclare108213 commented 5 years ago

Actually, I might remember why it's there... I think that the behavior was inconsistent, depending on what frequency the output was being written, or possibly depending on whether the run was an initial run or being restarted from a previous run. It's possible that that conditional was added to make the sets of output files consistent across various output configurations. Maybe. It could also have been added to prevent an output file from a previous run from being overwritten by the new run, although that shouldn't happen if the timesteps are set up correctly.

apcraig commented 5 years ago

@eclare108213 thanks, that's very helpful. I think there are a couple other things to think about. First, we now have the ability to write history every timestep which is new. It seems to me this is sort of a workaround for those that really need the first timestep output. Second, given this has been in the code for 20 years, it's not clear there is much demand for a fix. Third, if we really do want to think about a calendar manager upgrade as noted above by @dabail10 and in #162, I think we could address this issue then. Finally, one other option might be to keep this line within an "ifdef CESMCOUPLED" but remove it otherwise. That might limit damage and provide a place holder for other coupled systems to work around a problem if it exists.

I can try to run RASM with the fix to see what happens. But a small problem is that I may not actually cover the condition where there is a problem as I'm not sure any of us knows what that is. It's also possible that changes in the calendaring over the last 2 decades has fixed the original problem or changed the behavior.

I think if we decide to move forward on some efforts on #162, then that effort should result in the ability to clean up this piece of code. And at that time, we could better understand the limits of the calendar and test various conditions, maybe even building a calendar unit tester. Calendars is the one area that I believe a unit tester is actually very beneficial and worth the effort, even if it's just a separate test driver that we can build. It's worth doing a comprehensive test of the calendar given the different possible calendars needed (leap, noleap, etc), how the calendar interacts with coupling and history, the complexity of the calendar, and the realistic chance that over long runs, calendars can introduce roundoff errors (or integer overflow) if not implemented carefully. Also, there is no practical way to fully test the calendar with a full model as that would be unnecessarily expensive and time consuming.

I think @eclare108213's proposal to test coupled and make the change but note it as a potential problem is reasonable. I think if we decide to continue to defer #162, then that approach especially makes sense. But I guess what I'm proposing is that we think about raising the priority of #162, maybe setting a goal for updating the calendar in the next 12 months, and that we leave #289 as it is for now (or add an ifdef as another low risk option). I would say #162 is a medium size job in the sense that I think it'll be important to have a good process. I think we'll want to collect requirements/features/use cases to understand what stand-alone and coupled model need/want, we'll probably want to have a design document, and we'll want to build a unit tester for it. We'll then need to test it in a few systems and make sure we understand the impact on coupled models for rolling the new calendar manager into their models (there is usually a slightly tricky interaction between the coupled system calendar and the model internal calendar that is handled in the coupling layer). I only point this out here because I think it's important to think about the priority, cost, and risks in updating the CICE calendar ala #162 and because I think how we feel about #162 may feed into how we proceed here.

apcraig commented 5 years ago

I ran a quick exact restart test in RASM with cice6.0.0 with the mod and it passed and was bit-for-bit with the baseline. Now, in that case we're now writing history or restarts at step=1 anyway. Just a datapoint, even if maybe not particularly useful.

eclare108213 commented 5 years ago

My recommendation is to defer this issue and address it with #162. I like @apcraig's suggestion to create comprehensive unit testing for the calendar functionality, and to do this within the next year or so. In the meantime, we can note this timestep-1 behavior somewhere in the docs ("known issues").

proteanplanet commented 5 years ago

During the next year, I will be bringing the satellite emulator into Icepack, and calendar functionality is a critical part of this, down to the millisecond. I suggest that Tony and I combine our efforts into that work.

apcraig commented 5 years ago

The documentation was updated with PR #336 to note a "known bug". I am going to close this, but also add a note to #162 to revisit this issue there.