CICE-Consortium / CICE

Development repository for the CICE sea-ice model
Other
57 stars 131 forks source link

Namelist conflict aborts #130

Closed apcraig closed 5 years ago

apcraig commented 6 years ago

It seems there are cases where namelists can be setup inconsistently and CICE takes some initiative to reset the values to something more appropriate. I would recommend we move away from this approach. If namelists are in conflict, an error should be printed and the model should abort. Users should have to setup the namelist correctly themselves. I believe letting the model make the changes introduces too much risk that users don't understand what they are running.

eclare108213 commented 6 years ago

Yes, CICE 'fixes' inconsistencies. The idea was to make the code as 'friendly' as possible. It's super annoying to a user when the code crashes, especially when it takes down a whole coupled system that might have been waiting in the queue for days. There are other ways to provide useful information. Of course we can change the code behavior so that instead of changing parameter/flag values, it only recommends the change and then aborts. But we could also add the equivalent information into the documentation, somewhere near the namelist table, so that people setting up their configuration for the first time might see it.

apcraig commented 6 years ago

I understand that logic. I know it's annoying when a user manually changes namelist and then the model fails. But at least they are learning and it's on them to setup the namelist. And a user can count on whatever they set to actually be used, rather than silently fixing itself. Most users will not pick up on the WARNING messages in the log file until later, if at all and I personally think that's dicey. Even for our test suites, it's hard to know whether the namelist we're setting are actually being tested without going thru each log file everytime we setup a new configuration. And if new features are added that later change how namelist combinations interact, that's even more dangerous.

I created this issues to start a discussion. I understand the history of the implementation, but am more comfortable with the model aborting than overwriting user specified namelist settings. Ideally, we could introduce a "preprocessor" that could find namelist conflicts, but I'm not suggesting we actually do that because that would be redundant with the run time checks and mean we'd have to keep them in sync. Although it would be pretty cool to have some code that could be used in preprocessor mode (like running a script) to check for namelist conflicts and then for the model itself to use the same code at runtime to check for namelist conflicts. But again, that's probably too low priority and high effort to end up at the top or a priority list anytime soon.

eclare108213 commented 6 years ago

I am amenable to these ideas, though I agree that having a preprocessor to check for conflicts is low priority. One problem is that defining whether there is a “conflict” is not clear-cut. Some parameterizations are based on specific assumptions or dependencies that are straight-forward; these generate conflict warnings/resets in the current code. Other cases might be okay from an “it runs” standpoint but not such a good idea scientifically — do we warn/abort those too? We have so many options, I’m in complete agreement that the onus is on the user to check his/her configuration to make sure it’s doing what they want it to do, which in my mind includes examining/setting the namelist AND looking at the output.

A second consideration here is how the individual -s options are used. I don’t really see them as production tools, at least not on the command line. My experience with the scripts so far — and this might be partly because of the way I’ve always worked with the code a la comp_ice — is that I set up a run with maybe a few options that I know I’ll want, run, look at what came out of it, play around with the settings either in the case directory or in the run directory, and then finally at the end create a -s option of my own for the run I want to do, and start the whole process again from scratch. I consider the collections of options to be just for testing purposes. Is that too narrow a view, and we need to consider other user needs or workflows? I expect users will create their own file(s) with the namelist/env combination that they want to use, and then call that with -s. A few existing -s options (like gx1prod) make suggestions for physics choices, but most are just for code coverage. From this point of view, collecting diverse namelist settings into a single set_nml file, explicitly avoiding conflicts, makes sense. If a user wants to try other combinations, great, the documentation is available for guidance on that. We might not use every individual -s option in our tests, but they might be handy to have around. Or they might be overwhelming for the user (and us), if they aren’t used in practice.

My approach here would be toward the middle ground: 1) abort combinations that are definitely bad. We might still want to warn for some others, but not change values at run time — we can think about those case-by-case (the code does some of that now, too). 2) Maintain a ‘default’ configuration using values that we recommend for each grid, gx3, gx1, box. 3) Define our tests using combinations that do not create aborts or warnings, maximizing code coverage and using sets of namelist values that ought to be defined together. Throw all the leftover stuff into ’scraps’ tests and 4) make it clear in the documentation what the status of these various combinations is — recommended, alternatives with required combination settings, and scraps. In all cases: use at your own risk, pay attention to the diagnostic output, etc.

Is that reasonable? It feels like the namelist option infrastructure might be getting a little too complex. Is there a simpler way to do it, in which we maximize code coverage and provide recommended configurations, without needing to provide full flexibility for the user?

apcraig commented 6 years ago

@eclare108213, we are on the same page. I agree with the strategy you outline. I would even go a little farther and say users should not be encouraged to use the -s option, just create a case, and edit the namelist from there. That's why everything is resolved after cice.setup is run. I agree the main purpose of the -s options is for automated testing. Users might or might not leverage them, but I think often they will not or should not. They should start with the standard, out-of-the-box, default, know-it-works namelist and then manually change them from there. I would be surprised if many users would generate an "option set" for them, although that's a good idea from a reproducibility perspective.

Now might be a good time to think about whether the options are doing what we want. Currently, they are setup to change one-ish namelist setting and that allows us to test namelist settings fairly ad-hoc, which is nice in some ways. But I think we could go different directions. We could create sets of options, like alt01, alt02, alt03 that consisted of a full set of specific namelist settings and get rid of all the "individual options". We could even consider doing something like making the alt01, alt02, etc options be full namelists, more or less, either overriding the default file or doing large scale substitution. In that sense, the alts don't depend on the default namelist values. They always set all values or even consist of a full namelist file that is just copied in. I think there are pluses and minuses to the various approaches. The current system provides quite a bit of flexibility, but some complexity and assumes certain defaults. The approach where alt files are fully specified requires some maintenance as namelists evolve, is a little cleaner in some ways, but also provides less flexibility in other ways.

Separately, maybe we should be thinking about whether we should "advertise" the -s flag to users with --case, --test, or --suite. Maybe it should be a hidden option with --case and an explicit option with --test? I think we really do need to consider first, how we want the community to use the scripts, and second, how the scripts can help us automate testing, and then work implementation and documentation around that. I think we can support two slightly different modes of operating just by crafting the documentation correctly even though the scripts might not fundamentally differ in implementation for those modes.

eclare108213 commented 6 years ago

I'd like some other members of the testing team to weigh in on the implementation. @rallard77 @mattdturner @dabail10 ... We need to balance simplicity & ease of use against complexity & flexibility, for two sets of scripts users (Consortium testing/developers and coupled modeling centers), for two main purposes (testing and configuration guidance). I expect that our users will be willing to dig into the code and the scripts, and not just use it all as a black box, because they will need to implement the code in another, even more complex modeling system. My opinions: Simplicity and clarity is better for them and for us. I like that alt01 etc show the changes from the default namelists (not the whole namelist file). I'd rather not have reams of individual options unless they serve a useful purpose for the Consortium's testing needs. I like the idea of crafting the documentation to emphasize the recommended workflow, while providing a hint to the motivated user that the scripts can do more. Other opinions?

apcraig commented 6 years ago

Just another quick comment. For coupled models, the cice scripts will probably not be used at all. In fact, there has been no requirement like that built into the current scripts. I assume that other groups that bring in CICE to their coupled system will have their own build and namelist generation. We could try to have the cice scripts work in coupled systems, but we'd need a bunch of input and requirements. With some work, we could probably create a system that would create a "lib" version of CICE without the driver and maybe for that to work with a separate Macros file provided by the coupled system. But it's not clear it's worth it for us to do that. In most coupled systems, the coupled system provides a build system, a Makefile, a Macros file, and a way to set the input files. And they just point to the CICE source code.

rallard77 commented 6 years ago

I agree in keeping the options simple, clear, but robust. I like Tony's suggestion to not advertise the "-s" option. I agree that bad namelist combinations should abort, and the burden would fall on the user to read the logfile and see the recommendations for what could/should be changed. I feel we are in a good position to automate the test scripts for both Icepack and CICE. I recommend we automate the testing process with an agreed upon set of compile and test options (Tony has made numerous good suggestions) with the intent of covering as many aspects of the code as possible. Let's get this running routinely and plan for future enhancements/improvements.

eclare108213 commented 6 years ago

Suggestions for dealing with the code's consistency checks. I'm generally not in favor of aborting unless it's really necessary. There are cases where I think we can state the assumptions in the documentation and maybe warn at runtime, e.g. the first line below.

The cases below are only the warnings and resets, not the cases that already abort.

general/ice_init.F90

  if (trim(runtype) == 'continue') restart = .true.

This one seems obvious to me. The code doesn't even warn for this case. The same is done for other restart flags in the namelist, including tracers. The following code block deals with the same issue, again assuming runtype takes precedence and thus setting restart = .false.

  if (trim(runtype) /= 'continue' .and. (restart)) then
     if (ice_ic == 'none' .or. ice_ic == 'default') then
        if (my_task == master_task) then
        write(nu_diag,*) &
        'WARNING: runtype, restart, ice_ic are inconsistent:'
        write(nu_diag,*) trim(runtype), restart, trim(ice_ic)
        write(nu_diag,*) &
        'WARNING: Need ice_ic = <filename>.'
        write(nu_diag,*) &
        'WARNING: Initializing using ice_ic conditions'
        endif
        restart = .false.
     endif
  endif

The next one is used when ice_ic is set to the restart filename. There are two choices it could be, either to start with no ice or with a default, hard-wired distribution, and the code choose no ice. We could abort with the two choices, or with a pointer to the relevant documentation for all of the choices (there are quite a few permutations).

  if (trim(runtype) == 'initial' .and. .not.(restart)) then
     if (ice_ic /= 'none' .and. ice_ic /= 'default') then
        if (my_task == master_task) then
        write(nu_diag,*) &
        'WARNING: runtype, restart, ice_ic are inconsistent:'
        write(nu_diag,*) trim(runtype), restart, trim(ice_ic)
        write(nu_diag,*) &
        'WARNING: Initializing with NO ICE: ' 
        write(nu_diag,*) ' '
        endif
        ice_ic = 'none'
     endif
  endif

The next one checks for an inconsistency between a cpp def and the namelist values, assumes the cpp rules, and does not warn. This could abort with a warning to change one or the other.

  #ifndef ncdf
  ! netcdf is unavailable
  grid_format     = 'bin'
  atm_data_format = 'bin'
  ocn_data_format = 'bin' 
  #endif

We only have two advection choices now -- we used to also have mpdata, and this made sure that wasn't input as an option. This could be removed completely, but let's make sure that the code actually fails if we aren't using one of these two. OR add a third option, 'none' which effectively turns off the advection. We need that for the box tests!

  chartmp = advection(1:6)
  if (chartmp /= 'upwind' .and. chartmp /= 'remap ') advection = 'remap'

Let's abort and make the users look up the options for kitd and kcatbound:

  if (ncat /= 1 .and. kcatbound == -1) then
     if (my_task == master_task) then
        write (nu_diag,*) &
           'WARNING: ITD required for ncat > 1'
        write (nu_diag,*) &
           'WARNING: Setting kitd and kcatbound to default values'
     endif
     kitd = 1
     kcatbound = 0
  endif

Abort with warning (though it's painful):

  if (kdyn == 2 .and. revised_evp) then
     if (my_task == master_task) then
        write (nu_diag,*) &
           'WARNING: revised_evp = T with EAP dynamics'
        write (nu_diag,*) &
           'WARNING: Setting revised_evp = F'
     endif
     revised_evp = .false.
  endif

This is a basic assumption for tr_pond_lvl -- it requires the level-ice tracer be on. I think it's okay to reset this one (but we can discuss).

  if (tr_pond_lvl .and. .not. tr_lvl) then
     if (my_task == master_task) then
        write (nu_diag,*) 'WARNING: tr_pond_lvl=T but tr_lvl=F'
        write (nu_diag,*) 'WARNING: Setting tr_lvl=T'
     endif
     tr_lvl = .true.
  endif

I don't think that not resetting hs0 here would cause a problem, but it would cause the code to do some extra calculations, which are confusing if you're stepping through the code with a debugger (or print statements). Two options: try not setting it and see whether the answers change, or write a new block of code that takes the namelist value and assigns it (or not) to a new variable (recoding required) depending on which pond scheme is used.

  if (tr_pond_lvl .and. abs(hs0) > puny) then
     if (my_task == master_task) then
        write (nu_diag,*) 'WARNING: tr_pond_lvl=T and hs0/=0'
        write (nu_diag,*) 'WARNING: Setting hs0=0'
     endif
     hs0 = c0
  endif

The next one is harmless. We could take this out completely.

  if (tr_pond_cesm .and. trim(frzpnd) /= 'cesm') then
     if (my_task == master_task) then
        write (nu_diag,*) 'WARNING: tr_pond_cesm=T'
        write (nu_diag,*) 'WARNING: frzpnd, dpscale not used'
     endif
     frzpnd = 'cesm'
  endif

Abort with warning:

  if (trim(shortwave) /= 'dEdd' .and. tr_pond .and. calc_tsfc) then
     if (my_task == master_task) then
        write (nu_diag,*) 'WARNING: Must use dEdd shortwave'
        write (nu_diag,*) 'WARNING: with tr_pond and calc_tsfc=T.'
        write (nu_diag,*) 'WARNING: Setting shortwave = dEdd'
     endif
     shortwave = 'dEdd'
  endif

  if (tr_aero .and. trim(shortwave) /= 'dEdd') then
     if (my_task == master_task) then
        write (nu_diag,*) 'WARNING: aerosols activated but dEdd'
        write (nu_diag,*) 'WARNING: shortwave is not.'
        write (nu_diag,*) 'WARNING: Setting shortwave = dEdd'
     endif
     shortwave = 'dEdd'
  endif

This one needs an abort AND a warning!

  if (trim(atm_data_type) == 'monthly' .and. calc_strair) &
     calc_strair = .false.

Abort with warning:

  if (ktherm == 2 .and. .not. calc_Tsfc) then
     if (my_task == master_task) then
        write (nu_diag,*) 'WARNING: ktherm = 2 and calc_Tsfc = F'
        write (nu_diag,*) 'WARNING: Setting calc_Tsfc = T'
     endif
     calc_Tsfc = .true.
  endif

As long as the code runs, I think these warnings are fine. They do not reset anything, just warn that the physical choices are inconsistent.

  if (ktherm == 1 .and. trim(tfrz_option) /= 'linear_salt') then
     if (my_task == master_task) then
     write (nu_diag,*) &
     'WARNING: ktherm = 1 and tfrz_option = ',trim(tfrz_option)
     write (nu_diag,*) &
     'WARNING: For consistency, set tfrz_option = linear_salt'
     endif
  endif
  if (ktherm == 2 .and. trim(tfrz_option) /= 'mushy') then
     if (my_task == master_task) then
     write (nu_diag,*) &
     'WARNING: ktherm = 2 and tfrz_option = ',trim(tfrz_option)
     write (nu_diag,*) &
     'WARNING: For consistency, set tfrz_option = mushy'
     endif
  endif

I'd put this warning in the forcing module, and abort instead of changing the units.

  if (trim(atm_data_type) == 'hadgem' .and. & 
         trim(precip_units) /= 'mks') then
     if (my_task == master_task) &
     write (nu_diag,*) &
     'WARNING: HadGEM atmospheric data chosen with wrong precip_units'
     write (nu_diag,*) &
     'WARNING: Changing precip_units to mks (i.e. kg/m2 s).'
     precip_units='mks'
  endif

formdrag depends on a number of particular settings in other parts of the code. Let's document these and abort with a warning message that includes all of them together.

  if (formdrag) then
  if (trim(atmbndy) == 'constant') then
     if (my_task == master_task) then
        write (nu_diag,*) 'WARNING: atmbndy = constant not allowed with formdrag'
        write (nu_diag,*) 'WARNING: Setting atmbndy = default'
     endif
     atmbndy = 'default'
  endif

  if (.not. calc_strair) then
     if (my_task == master_task) then
        write (nu_diag,*) 'WARNING: formdrag=T but calc_strair=F'
        write (nu_diag,*) 'WARNING: Setting calc_strair=T'
     endif
     calc_strair = .true.
  endif

  if (.not. tr_lvl) then
     if (my_task == master_task) then
        write (nu_diag,*) 'WARNING: formdrag=T but tr_lvl=F'
        write (nu_diag,*) 'WARNING: Setting tr_lvl=T'
     endif
     tr_lvl = .true.
  endif
  endif

Abort with warning:

  if (trim(fbot_xfer_type) == 'Cdn_ocn' .and. .not. formdrag)  then
     if (my_task == master_task) then
        write (nu_diag,*) 'WARNING: formdrag=F but fbot_xfer_type=Cdn_ocn'
        write (nu_diag,*) 'WARNING: Setting fbot_xfer_type = constant'
     endif
     fbot_xfer_type = 'constant'
  endif

We need to put this in the documentation somewhere, if it's not already there. I'd leave this in the code.

  #ifdef coupled
     if( oceanmixed_ice ) then
        write (nu_diag,*) 'WARNING WARNING WARNING WARNING '
        write (nu_diag,*) '*Coupled and oceanmixed flags are  *'
        write (nu_diag,*) '*BOTH ON.  Ocean data received from*'
        write (nu_diag,*) '*coupler will be altered by mixed  *'
        write (nu_diag,*) '*layer routine!                    *'
        write (nu_diag,*) ' '
     endif
  #endif

Remove:

        write (nu_diag,*) 'WARNING - Zero-layer thermodynamics'

general/ice_forcing.F90

We need to put this in the documentation somewhere, if it's not already there. I'd leave this in the code.

     write (nu_diag,*) 'WARNING: evp_prep calculates surface tilt'
     write (nu_diag,*) 'WARNING: stress from geostrophic currents,'
     write (nu_diag,*) 'WARNING: not data from ocean forcing file.'
     write (nu_diag,*) 'WARNING: Alter ice_dyn_evp.F if desired.'

infrastructure/ice_restoring.F90

This setting is necessary for open boundary conditions. Reset (as is) or abort.

 if (ew_boundary_type == 'open' .and. &
   ns_boundary_type == 'open' .and. .not.(restart_ext)) then
  if (my_task == master_task) write (nu_diag,*) &
        'WARNING: Setting restart_ext = T for open boundaries'
  restart_ext = .true.
 endif

infrastructure/ice_domain.F90

This warning seems fine. The code runs, but uses more memory than necessary.

 else if (nblocks_max < max_blocks) then
 write(outstring,*) &
     'ice: no. blocks too large: decrease max to', nblocks_max
 if (my_task == master_task) then
    write(nu_diag,*) ' ********WARNING***********'
    write(nu_diag,*) trim(outstring)
    write(nu_diag,*) ' **************************'
    write(nu_diag,*) ' '
 endif
 endif

shared/ice_init_column.F90

There are a bunch more for bgc, which we can come back to once we decide how to handle the others.

  !-----------------------------------------------------------------
  ! zsalinity and brine
  !-----------------------------------------------------------------
  if (solve_zsal .and. TRZS == 0) then
     write(nu_diag,*) 'WARNING: solve_zsal=T but 0 zsalinity tracers'
     write(nu_diag,*) 'WARNING: setting solve_zsal = F'
     solve_zsal = .false.      
  elseif (solve_zsal .and. nblyr < 1)  then
     write(nu_diag,*) 'WARNING: solve_zsal=T but 0 zsalinity tracers'
     write(nu_diag,*) 'WARNING: setting solve_zsal = F'
     solve_zsal = .false.     
  endif 

  if (solve_zsal .and. ((.not. tr_brine) .or. (ktherm /= 1))) then
     write(nu_diag,*) 'WARNING: solve_zsal needs tr_brine=T and ktherm=1'
     write(nu_diag,*) 'WARNING: setting tr_brine=T and ktherm=1'
     tr_brine = .true.
     ktherm = 1
  endif

  if (tr_brine .and. TRBRI == 0 ) then
     write(nu_diag,*) &
        'WARNING: tr_brine=T but no brine height compiled'
     write(nu_diag,*) &
        'WARNING: setting solve_zsal and tr_brine = F'
     solve_zsal = .false.
     tr_brine  = .false.
  elseif (tr_brine .and. nblyr < 1 ) then
     write(nu_diag,*) &
        'WARNING: tr_brine=T but no biology layers compiled'
     write(nu_diag,*) &
        'WARNING: setting solve_zsal and tr_brine = F'
     solve_zsal = .false.
     tr_brine  = .false.
  endif 

  !-----------------------------------------------------------------
  ! biogeochemistry
  !-----------------------------------------------------------------

  if (.not. tr_brine) then
     if (solve_zbgc) then
        write(nu_diag,*) 'WARNING: tr_brine = F and solve_zbgc = T'
        write(nu_diag,*) 'WARNING: setting solve_zbgc = F'
        solve_zbgc = .false.
     endif
     if (tr_zaero) then
        write(nu_diag,*) 'WARNING: tr_brine = F and tr_zaero = T'
        write(nu_diag,*) 'WARNING: setting tr_zaero = F'
        tr_zaero = .false.
     endif
  endif

  if (skl_bgc .AND. tr_zaero) then
     write(nu_diag,*) 'WARNING: skl bgc does not use vertical tracers'
     write(nu_diag,*) 'WARNING: setting tr_zaero = F'
     tr_zaero = .false.
  endif

  if (dEdd_algae .AND. trim(shortwave) /= 'dEdd') then 
     write(nu_diag,*) 'WARNING: dEdd_algae = T but shortwave /= dEdd'
     write(nu_diag,*) 'WARNING: setting dEdd_algae = F'
     dEdd_algae = .false.
  endif

  if (dEdd_algae .AND. (.NOT. tr_bgc_N) .AND. (.NOT. tr_zaero)) then 
     write(nu_diag,*) 'WARNING: need tr_bgc_N or tr_zaero for dEdd_algae'
     write(nu_diag,*) 'WARNING: setting dEdd_algae = F'
     dEdd_algae = .false.
  endif

  if (modal_aero .AND. (.NOT. tr_zaero) .AND. (.NOT. tr_aero)) then
     modal_aero = .false.
  endif

  if (modal_aero .AND. trim(shortwave) /= 'dEdd') then 
     write(nu_diag,*) 'WARNING: modal_aero = T but shortwave /= dEdd'
     write(nu_diag,*) 'WARNING: setting modal_aero = F'
     modal_aero = .false.
  endif

  if ((TRBGCS == 0 .and. skl_bgc) .or. (TRALG == 0 .and. skl_bgc)) then
     write(nu_diag,*) &
        'WARNING: skl_bgc=T but 0 bgc or algal tracers compiled'
     write(nu_diag,*) &
        'WARNING: setting skl_bgc = F'
     skl_bgc = .false.
  endif

  if ((TRBGCZ == 0 .and. solve_zbgc) .or. (TRALG == 0 .and. solve_zbgc)) then
     write(nu_diag,*) &
        'WARNING: solve_zbgc=T but 0 zbgc or algal tracers compiled'
     write(nu_diag,*) &
        'WARNING: setting solve_zbgc = F'
     solve_zbgc = .false.
  endif

  if (solve_zbgc .and. .not. z_tracers) z_tracers = .true.
  if (skl_bgc .or. solve_zbgc) then
     tr_bgc_N         = .true.   ! minimum NP biogeochemistry
     tr_bgc_Nit       = .true.
  else
     tr_bgc_N         = .false.
     tr_bgc_C         = .false.
     tr_bgc_chl       = .false.
     tr_bgc_Nit       = .false.
     tr_bgc_Am        = .false.
     tr_bgc_Sil       = .false.
     tr_bgc_hum       = .false.
     tr_bgc_DMS       = .false.
     tr_bgc_PON       = .false.
     tr_bgc_DON       = .false.
     tr_bgc_Fe        = .false.
  endif

  !-----------------------------------------------------------------
  ! z layer aerosols
  !-----------------------------------------------------------------
  if (tr_zaero .and. .not. z_tracers) z_tracers = .true.

Can I go home now?

apcraig commented 6 years ago

I am going to work on this as part of #129. I will try to update the "warning" implementation as outlined above, update the options as needed, and update the test cases.

apcraig commented 6 years ago

This has largely been addressed in #129 but still needs to be completed for the bgc section of code.

apcraig commented 5 years ago

The bgc part is being addressed in #235