CICE-Consortium / CICE

Development repository for the CICE sea-ice model

First round of housekeeping on ice_grid #921

Closed TillRasmussen closed 6 months ago

TillRasmussen commented 7 months ago

For detailed information about submitting Pull Requests (PRs) to the CICE-Consortium, please refer to: https://github.com/CICE-Consortium/About-Us/wiki/Resource-Index#information-for-developers

PR checklist

TillRasmussen commented 7 months ago

I will check up on why it fails

anton-seaice commented 7 months ago

I will check up on why it fails

I noticed these failures last week with GHActions - it may not be related to your change, but something in the dependencies/environment failing.

eclare108213 commented 7 months ago

Looks like the runs with debugging flags are failing. Might still be an env problem...

apcraig commented 7 months ago

I'll have a look at the code mods and debug github actions in the next day or two.

TillRasmussen commented 7 months ago

I have run the following test suites before and after the implementation: base_suite, decomp_suite, and gridsys_suite, with and without debug and with both intel and gnu. All runs without debug pass. Most of the runs with debug pass, except for 24 tests with gbox and boxchan1*. These fail both in the current upstream and in the pull request, so I don't think this is an issue with this pull request. It seems to stop when it tries to write the min/max of lon/lat on line 2579 of ice_grid.F90 in the version in the pull request.

apcraig commented 7 months ago

@TillRasmussen, can you provide the list of the 24 tests that are failing when you force debug on?

The GitHub Actions failures are unrelated. GitHub Actions is throwing a floating point exception, with no traceback or other information, for a couple of the debug tests, and I have not been able to duplicate it elsewhere yet. It looks like our GitHub Actions testing recently switched automatically from macOS 12.6.9 to 12.7.2: Clang changed from 15.0.7 to 16.0.6, while gfortran is still 12.3.0. I'm sure this is related. Frustrating that we constantly have to battle this stuff. I'll see if I can revert back to the prior version.

TillRasmussen commented 7 months ago

errorlog.log

Error message in freya_intel_smoke_gbox80_8x1_boxchan1n_gridc_debug.grid_testfull_debug:

    forrtl: error (72): floating overflow
    Image              PC                Routine            Line        Source
    cice               00000000023171EE  Unknown               Unknown  Unknown
    cice               0000000001CBF2A0  Unknown               Unknown  Unknown
    cice               0000000000ED8606  ice_grid_mp_nelat        2579  ice_grid.F90
    cice               0000000000E706C2  ice_grid_mpinit           668  ice_grid.F90
    cice               000000000040A993  cice_initmod_mp_c         130  CICE_InitMod.F90
    cice               000000000040A70B  cice_initmod_mp_c          54  CICE_InitMod.F90
    cice               000000000040A391  MAIN__                     43  CICE.F90
    cice               000000000040A31E  Unknown               Unknown  Unknown
    cice               00000000023E48F9  Unknown               Unknown  Unknown
    cice               000000000040A209  Unknown               Unknown  Unknown
    _pmiu_daemon(SIGCHLD): [NID 00134] [c0-0c2s1n2] [Wed Dec 20 13:11:46 2023] PE RANK 0 exit signal Aborted
    [NID 00134] 2023-12-20 13:11:46 Apid 84897883: initiated application termination
    Application 84897883 exit codes: 134
    Application 84897883 resources: utime ~0s, stime ~0s, Rss ~14372, inblocks ~0, outblocks ~0

apcraig commented 7 months ago

@TillRasmussen, not the error message, just the list of the 24 tests that failed, like freya_intel_smoke_gbox80_8x1_boxchan1n_gridc_debug.

I'd like to add some tests to the test suite and fix the error.

TillRasmussen commented 7 months ago

See the errorlog.log link at the top of the same comment.

apcraig commented 7 months ago

@TillRasmussen, perfect, thanks.

apcraig commented 7 months ago

@TillRasmussen, you should be able to pull the latest from the current trunk to fix github actions if you like (not necessarily required as long as we tested the latest code elsewhere). I also fixed the problem you found with the single channel debug run and updated the test suite to include that test.

Separately, I ran a full test suite of these modifications on Derecho with the latest trunk code and it all looks fine, https://github.com/CICE-Consortium/Test-Results/wiki/cice_by_hash_forks#28cb2baff01e4338e73be5c07721d2bc99f682e2.

I still want to review the code a bit more. But first, there are several places where code is commented out, should that be cleaned up?

apcraig commented 7 months ago

Just realized my testing did NOT include the newgrid mods. Am retesting now.

TillRasmussen commented 6 months ago

Everything is bit for bit in my tests after Tony fixed the single channel bug. I would like to remove the code that has been commented out, but I am not sure whether it should be kept for legacy reasons.

TillRasmussen commented 6 months ago

* line 203 in ice_dyn_shared.  Why is this there?  If it's a future placeholder for the 1d implementation, let's remove this commented-out line and add the 1d stuff at a later time cleanly.  And fix the indentation of lines 204-210.

You are right, it should not be there; it was a leftover from a test with the 1d implementation, which I decided to add later. The indentation will be fixed.

* in ice_dyn_shared, should we move the dxhy, dyhx, cyp, cxp, cym, cxm initialization to the big "block loop" above with the halo update outside.  And do we want/need to initialize those arrays to c0 first (the halo values are uninitialized)?  If not, lines 367-406 need to be indented properly.

We can add the initialization, but these arrays were not initialized in the original version either. As a general rule I don't think we should initialize to c0 from 1 to nx_block/ny_block; I would rather do it from ilo-nghost to ihi+nghost. For dxhy and dyhx a halo update is applied, so the halo zone does end up with a value. I think the right thing to do is to run the loop from ilo to ihi+nghost and jlo to jhi+nghost and skip the halo update afterwards. This assumes that HTE and HTN exist on the halo, and it may cause the results to not be bit for bit. The four other parameters are calculated where they are used, on the T grid.
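
A minimal sketch of the loop-bound idea, assuming HTE and HTN are valid in the halo. The block/index names (ilo, ihi, jlo, jhi, nghost, get_block) follow the usual CICE conventions, and the right-hand sides are placeholders rather than the actual dxhy/dyhx formulas:

      ! Hedged sketch only: extend the loop into the halo so that the
      ! subsequent halo update could be skipped.  Assumes HTE/HTN are valid
      ! in the halo; the assignments are placeholders, not the real formulas.
      do iblk = 1, nblocks
         this_block = get_block(blocks_ice(iblk), iblk)
         ilo = this_block%ilo
         ihi = this_block%ihi
         jlo = this_block%jlo
         jhi = this_block%jhi
         do j = jlo, jhi+nghost        ! instead of jlo, jhi plus a halo update
         do i = ilo, ihi+nghost        ! instead of ilo, ihi plus a halo update
            dxhy(i,j,iblk) = HTE(i,j,iblk)   ! placeholder expression
            dyhx(i,j,iblk) = HTN(i,j,iblk)   ! placeholder expression
         enddo
         enddo
      enddo
      ! no ice_HaloUpdate call needed afterwards if the halo is filled above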

* adjust indentation on the allocate at lines 215-220 in ice_dyn_evp to be consistent with allocate statements above.  Also, the ratiod* variables are not initialized on the halo, is that OK?  Again, should those arrays be set globally to c0 before being filled with valid values in ilo:ihi, jlo:jhi? 

The indentation is fixed in the next push. There was no initialization of these in the original code, but I can add it.

* The new "use" statements in ice_dyn_vp at lines 2746 and 3148 are not needed, remove them.

They are needed because the variables have been removed from the module declaration. If they remain there, then I think we should remove them from the subroutine arguments (e.g. matvec).

* In ice_grid in subroutine latlongrid, the initialization of dxhy, dyhx, cxp, cyp, cxm, cym is removed.  These values will be set in ice_dyn_shared now and end up with different values.  I assume this is OK since I assume they are not important (latlongrid is NOT used with dynamics), but it's an option that is NOT tested in the standalone version of the code, a risk I was concerned about in making these changes.

They are set to a large number in latlongrid, which is only called when the cpp flag CESMCOUPLED is true. That is why I marked them with "Till", and I would like @dabail10 to look at it, as I am not sure if or when this is used. It should not be used within the standalone version, and I assume it is not used when it is set to a value like this.

* An important question is whether xav, yav, xxav, and yyav should be scalars or arrays. There are several points. First, the current code suggests there may be cases where these variables vary in space (even though they don't currently). Is that likely to be implemented? Second, making them scalars requires that "limited_gradient" be overloaded with a completely redundant subroutine, just changing two arguments from arrays to scalars. This seems like NOT a great outcome. Third, somewhat associated with the first point, are the extra arrays xyav, xxxav, etc. (which are now set to c0 and largely commented out) ever going to be used/needed, and should we keep that code around? I favor the following, if it makes sense:

Whether or not all of these variables will be used is something that others have to judge; that is why I did not remove all of the code that has been commented out. xxav and yyav have no limitations when they are scalars. xav and yav are arguments to the limited_gradient subroutine, but this routine is also called with other arguments that do vary in space, so I made a choice with an interface (the tradeoff is sketched after these points). I am happy to change it back and make xav and yav 2d arrays. One thing I would like to do is to remove the calculations where xav and yav are used in all places other than the limited_gradient routine, as they add terms that are multiplied by c0. This also replies to the comment below.

  * keep xav, yav, xxav, yyav arrays and have just a single version of limited_gradient.  I understand this is a little inefficient for memory, but the cost is small versus the tradeoff of having two identical subroutines to maintain.  Alternatively, xav, yav, xxav, yyav could be scalars, but copied into temporary arrays before calling limited_gradient.  Memory use would be similar (although potentially dynamic), but I like that approach less than just making them arrays from the start.
  * remove xyav, xxxav, xxyav, xyyav, yyyav completely unless we anticipate them being useful in the next 5 years.

I agree that the above variables should be removed. But I think that others should make the call. This also relates to your comment below.

* Finally, I agree that commented out code should be removed.
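
For reference, a hedged sketch of the overload being weighed here; the declarations are hypothetical and simplified, not the actual limited_gradient argument list in ice_transport_remap:

      ! Hedged sketch only: a generic interface with scalar xav/yav forces a
      ! second, nearly identical module procedure.
      interface limited_gradient
         module procedure limited_gradient_array    ! xav, yav passed as 2d arrays
         module procedure limited_gradient_scalar   ! xav, yav passed as scalars
      end interface limited_gradient

      ! The two procedures would differ only in these dummy declarations:
      !   real (kind=dbl_kind), dimension(nx_block,ny_block), intent(in) :: xav, yav   ! array version
      !   real (kind=dbl_kind), intent(in) :: xav, yav                                 ! scalar version
      ! i.e. a duplicated subroutine to maintain, weighed against a small
      ! amount of extra memory if they stay as arrays.
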
apcraig commented 6 months ago

Thanks @TillRasmussen. Just a quick followup.

You are right that we need the use statements in ice_dyn_vp; my earlier check must have been done improperly.

Traditionally, we initialize from 1:nx_block, 1:ny_block. By definition, this is identical to ilo-nghost:ihi+nghost, and I see no reason to change our current approach or make it more complicated. If an array has a halo update after ilo:ihi, jlo:jhi is initialized, then I think it's OK not to initialize the halo before the halo update. But there are open questions about whether we should initialize arrays when we allocate them or whether every memory location should be initialized in the code even if it's not used. I don't think we have a formal standard at this time.
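
A compact sketch of that convention, using a hypothetical array work2d; the halo-update call is shown schematically with the usual CICE arguments:

      ! Hedged sketch only: work2d is a hypothetical array.  The range
      ! 1:nx_block, 1:ny_block spans the full block including the halo,
      ! i.e. the same cells as ilo-nghost:ihi+nghost.
      do iblk = 1, nblocks
         do j = 1, ny_block
         do i = 1, nx_block
            work2d(i,j,iblk) = c0      ! optional: initialize everything, halo included
         enddo
         enddo
         do j = jlo, jhi               ! block limits obtained as usual from get_block
         do i = ilo, ihi
            work2d(i,j,iblk) = c1      ! compute physical cells only (placeholder value)
         enddo
         enddo
      enddo
      call ice_HaloUpdate(work2d, halo_info, &
                          field_loc_center, field_type_scalar)   ! fills the halo afterwards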

The latlongrid should be OK as you have it implemented, just pointing out that it is changing.

I would like to hear others' thoughts about the xxav arrays. Could they be spatially varying at some point? Should we get rid of the arrays we're not using right now? @eclare108213? I still strongly favor having a single implementation of limited_gradient regardless.

TillRasmussen commented 6 months ago

New version of ice transport remap. Still bit for bit. I removed the limited_gradient interface and removed the multiplications with xav and yav (commented out, since they are equal to zero). They are now present as 2d arrays only in the call to limited_gradient and are local variables in the construct_fields subroutine.

apcraig commented 6 months ago

We need to clarify a plan for xav, yav, xxav, yyav, etc. I'm not sure that making xxav and yyav parameters and making xav and yav local arrays in construct_fields is the way to go. Let's formulate the long term plan, then decide where the fields should live and how they should be initialized. @eclare108213? @whlipscomb?

To be more specific, this block in subroutine init_remap defines (and comments out) a bunch of arrays that could more easily be treated as scalars and/or removed from the code (including in ice_grid, where they are defined but also commented out). There are some comments about the possibility that the arrays would be spatially varying and about the need for additional terms for non-rectangular grids, but that has never been implemented. The question is: should we leave all this in place like it is now, or can we completely remove the arrays that are not used (like xxxav, xxyav, etc) and make xav, yav, xxav, yyav scalar parameters if that makes the code cleaner? (A hedged sketch of that alternative follows the code block below.) I'm not advocating one way or another myself, but this PR basically does that and I just want to make sure it's OK to do so.

      !$OMP PARALLEL DO PRIVATE(iblk,i,j) SCHEDULE(runtime)
      do iblk = 1, nblocks
         do j = 1, ny_block
         do i = 1, nx_block
            xav(i,j,iblk) = c0
            yav(i,j,iblk) = c0
!!!            These formulas would be used on a rectangular grid
!!!            with dimensions (dxT, dyT):
!!!            xxav(i,j,iblk) = dxT(i,j,iblk)**2 / c12
!!!            yyav(i,j,iblk) = dyT(i,j,iblk)**2 / c12
            xxav(i,j,iblk) = c1/c12
            yyav(i,j,iblk) = c1/c12
!            xyav(i,j,iblk) = c0
!            xxxav(i,j,iblk) = c0
!            xxyav(i,j,iblk) = c0
!            xyyav(i,j,iblk) = c0
!            yyyav(i,j,iblk) = c0
         enddo
         enddo
      enddo
      !$OMP END PARALLEL DO
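
For illustration only, a hedged sketch of the scalar-parameter alternative being discussed (this is not the code in the PR):

      ! Hedged sketch only: scalar parameters replacing the xav/yav/xxav/yyav
      ! arrays, assuming they remain spatially constant.
      real (kind=dbl_kind), parameter :: &
         xav  = c0,     &   ! mean of x over a T cell
         yav  = c0,     &   ! mean of y over a T cell
         xxav = c1/c12, &   ! mean of x**2 over a T cell
         yyav = c1/c12      ! mean of y**2 over a T cell
      ! xyav, xxxav, xxyav, xyyav, yyyav removed entirely; the terms they
      ! multiply in construct_fields would be dropped since they are c0.

The tradeoff noted above still applies: limited_gradient would then need either an overload or temporary arrays for xav and yav.
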
eclare108213 commented 6 months ago

Those arrays (and comments) have been in the code, commented out, for a LONG time. I think it's fine to clean it all up and get rid of them completely. @whlipscomb is there any conceivable reason why we might need them anytime soon?

apcraig commented 6 months ago

This PR now has all the "extra" grid metrics removed from the dynamics computations. @TillRasmussen, have you checked whether the compilers produce bit-for-bit results? There are enough code changes that it may not. I will fire off a test suite today as well and will post the results. Otherwise, I think this could be ready for merging. I encourage others to review the latest updates, maybe we can merge this week?

TillRasmussen commented 6 months ago

It is bit for bit. I have run the following suites with gnu and intel, with and without debug: base_suite, decomp_suite, gridsys_suite.

apcraig commented 6 months ago

Independent test results look good, https://github.com/CICE-Consortium/Test-Results/wiki/cice_by_hash_forks#f848367767b492ec41ba176e091818aef83ce6ea