CH-Earth / summa

Structure for Unifying Multiple Modeling Alternatives:
http://www.ral.ucar.edu/projects/summa
GNU General Public License v3.0

Breaking SUMMA with too large routingGammaScale during calibration #457

Open lou-a opened 3 years ago

lou-a commented 3 years ago

Attachment: [SUMMA_break_test_case.zip](https://github.com/NCAR/summa/files/6467549/SUMMA_break_test_case.zip)

To calibrate SUMMA, the Ostrich software toolkit running on Cheyenne estimates new sets of parameters using multipliers. It calculates the multiplier ranges (param_min to param_max) using the following formulas:

```python
param_min.append(float(var_min) / float(var_ini_min))
param_max.append(float(var_max) / float(var_ini_max))
```

Here, var_min and var_max are the parameter minimum and maximum values found in the look-up tables localParamInfo.txt or basinParamInfo.txt, and var_ini_min and var_ini_max are the minimum and maximum values found in a netCDF of initial parameter values (often called trialParams.nc). In our case, the trial parameters netCDF is filled, for all HRUs/GRUs, with the default parameter values from the look-up tables.
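As a concrete illustration, here is a minimal, self-contained sketch of that calculation for routingGammaScale, using the limits and default value from the look-up table shown further below (variable names follow the formulas above; the numbers are only those quoted in this issue):

```python
# Sketch of the multiplier-range calculation for routingGammaScale.
var_min, var_max = 1.0, 5000000.0     # limits from localParamInfo.txt
var_ini_min = var_ini_max = 20000.0   # default value copied into trialParams.nc

param_min, param_max = [], []
param_min.append(float(var_min) / float(var_ini_min))   # 5e-05
param_max.append(float(var_max) / float(var_ini_max))   # 250.0
```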

Here are the parameters we calibrate using Ostrich:

  • k_macropore
  • k_soil
  • theta_sat
  • aquiferBaseflowExp
  • aquiferBaseflowRate
  • qSurfScale
  • summerLAI
  • frozenPrecipMultip
  • heightCanopyBottom
  • thickness (the distance between heightCanopyBottom and heightCanopyTop)
  • routingGammaScale
  • routingGammaShape

routingGammaScale is the one we focus on here, as it is the parameter that caused SUMMA to break in our test case. The multipliers for this parameter were calculated to range from 5e-05 to 250.0, since the look-up table values for this parameter are:

| parameter name | default parameter value | lower parameter limit | upper parameter limit |
| --- | --- | --- | --- |
| routingGammaScale | 20000.0000 | 1.0000 | 5000000.0000 |

However, when Ostrich sets routingGammaScale=2740192.0, this breaks SUMMA with the following error message:

```
fraction of basin runoff histogram being accounted for by time delay vector is 0.97787860214872724
this is less than allowed by tolerFrac = 1.0000000000000000E-002

FATAL ERROR: summa_paramSetup/fracFuture/not enough bins for the time delay histogram -- fix hard-coded parameter in globalData.f90
```

Ostrich successfully tested values up to routingGammaScale=1483098.0, which did not break SUMMA.

Why the issue occurs in SUMMA

This issue can be traced to the variable nTimeDelay in globalData.f90, which caps the time delay histogram for within-basin routing at 2000 hours (time steps?). This is not long enough to capture a sufficiently large fraction of the total runoff for very slow within-basin routing parameters.

Possible solution

Compute nTimeDelay (used as nTDH in the code) from the chosen routingGammaShape and routingGammaScale values to ensure that the histogram is long enough to capture 99% of the runoff (the current tolerance in var_derive.f90).
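A minimal sketch of that calculation (in Python with SciPy, not the Fortran code) is below. It assumes an hourly data step and that the delay is gamma-distributed with shape routingGammaShape and mean routingGammaScale seconds, which is consistent with the 0.9779 fraction reported in the error above; the shape value of 2.5 is just the default and is only illustrative.

```python
# Sketch (not SUMMA code): how many time steps the delay histogram would need
# to capture 99% of runoff, assuming the delay is gamma-distributed with shape
# routingGammaShape and mean routingGammaScale seconds, on an hourly data step.
import math
from scipy.stats import gamma

def required_bins(shape, scale_s, dt_s=3600.0, frac=0.99):
    """Number of bins needed for the gamma CDF to reach `frac`."""
    t_frac = gamma.ppf(frac, a=shape, scale=scale_s / shape)  # seconds
    return math.ceil(t_frac / dt_s)

print(required_bins(2.5, 2740192.0))  # ~2300 bins -> exceeds nTimeDelay = 2000
print(required_bins(2.5, 1483098.0))  # ~1250 bins -> fits within 2000
```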

Reproducibility

All files necessary to reproduce this error in SUMMA are attached. trialParams_break.nc contains the default parameter values from the look-up tables, except that routingGammaScale is set to 2740192.0, which breaks SUMMA. trialParams_nobreak.nc contains the default parameter values, except that routingGammaScale is set to 1483098.0, the highest value tested during calibration that we know does not break SUMMA.

andywood commented 3 years ago

This can happen, definitely. There are a few solutions.

One is to set smaller localParam limits. Based on some calculations of what would be reasonable outcomes for most rivers in the US, I've been using:

```
! ****
routingGammaShape         |       2.5000 |       1.0000 |       5.0000
routingGammaScale         |      18000.0 |          360 |      72000.0
! ****
```

I think this gives you up to about 3 months in the vector for runoff to drain from the watershed for most parameter combinations (i.e., for an hourly simulation), which is a pretty relaxed limit. But I forget the details; it was a while ago. You can play around with gamma functions in R or Python to gauge where you want your limits to be. Having tighter limits also improves the identifiability of the parameters in the optimization context. The defaults you're using struck me as being too high for the scale of our applications.
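For example, one quick way to sanity-check candidate limits against the fixed 2000-step vector (same gamma assumptions as the sketch in the issue above; shape 2.5 and an hourly step are illustrative, not prescribed):

```python
# Sketch: fraction of runoff captured by a 2000-step (hourly) delay vector for
# a candidate (shape, scale) pair, assuming the delay is gamma-distributed with
# mean routingGammaScale seconds. Illustrative only.
from scipy.stats import gamma

def captured_fraction(shape, scale_s, n_bins=2000, dt_s=3600.0):
    return gamma.cdf(n_bins * dt_s, a=shape, scale=scale_s / shape)

print(captured_fraction(2.5, 72000.0))    # suggested upper limit -> essentially 1.0
print(captured_fraction(2.5, 2740192.0))  # the breaking value -> ~0.978
```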

Another consideration is that nTDH is now a fixed dimension in warm-state files (I added the routing time delay vector so that we could get exact restarts for basin runoff). There could be tricky code or usage implications to letting it vary dynamically.

Another solution is to set the truncation of the vector at higher tolerances, assuming you're happy with the vector limit (e.g., 2000).

Another solution is not to calibrate the gammas -- I just added them to the param calibration list because I'm not calibrating a time-delay in mizuroute, but you could set them using a GIUH type approach.

Cheers, -Andy


arbennett commented 3 years ago

Is there any reason not to just expand nTimeDelay to something like 5000 or 10000? We could even do this dynamically if it causes slowdowns or other undesirable side effects.

andywood commented 3 years ago

If needed, that would make sense -- it could be controlled by reading the dimension from the state file, though I think some data structure allocations happen before that. It's been a while since I looked at it.

A downside is that it makes the state files bigger, possibly for no real benefit. I suggest playing around with it. I think nTDH sets the number of time steps of delayed runoff, and 2000 time steps, say hours, is about 83 days ... it's hard to imagine needing hillslope runoff that delayed. If you run at a 15-minute timestep you might want to bump it up, if you had evidence of some super-attenuated surface runoff response.
