Open billsacks opened 6 years ago
@barlage do you want to add your recent findings to this issue?
@swensosc changed the number of canopy iterations from 40 to 2. This gave close to a factor of 2 difference in the timing of canflux / can_iter. Note that most of the time spent there is in photosynthesis.
We felt it could be worth looking at maps of how many iterations it takes in various places, at various times of day.
@dlawrenncar points out that the fundamental problem may be that the answer is bouncing around rather than actually converging. The first step here is to better characterize what's actually happening.
Notes from my Feb 2019 visit to Boulder:
Long-term solution: In Gordon's view, this will require a redesign of this part of the code and could be a major undertaking.
Short-term solution: Gordon and Mike B believe that a shorter CLM timestep may reduce the iterations required. Mike suggested investigating the timestep-to-iteration relationship.
Bill will add an iterations setting to the namelist, so wait for that.
Evaluate how much to reduce the max number of iterations. Consider averaging when the solution bounces around?
@billsacks , at our meeting in Boulder you suggested that I wait for you to add an iterations setting to the namelist before I started working on this issue. Is that still the plan or should I go ahead and get started?
@billsacks , alternative to my previous question: Would you like me to add the iterations setting to the namelist if you haven't already?
@slevisconsulting I hope to finally start the work related to this next week, though with other obligations and some vacation time coming up, it might still be a couple of weeks before this is on master. So you shouldn't wait for me, but if we both add this namelist item, it will take some reconciliation. I forget: do you need this namelist item in order to start your work on this? (I forget the details of our earlier discussion.)
I've seen a couple of suggestions for what to look for during this investigation:
So next task I think is to add the number of iterations to history.
Added ITLEF to history, ran this test:
ERP_D_Ld5.f10_f10_musgs.I2000Clm50Sp.cheyenne_intel.clm-decStart
which uses
finidat = 'clmi.I2000Clm50BgcCrop.2011-01-01.1.9x2.5_gx1v7_gl4_simyr2000_c180715.nc'
and I got this history output:
ITLEF colors range from 2 to 8.5 iterations.
My first reaction: it's winter... Greater values appear in the tropics and Southern Hemisphere; however, some high values appear in Antarctica, where ELAI, ESAI = 0, and TLAI, TSAI are negligible! Does that seem reasonable?
TV colors range from 229 to 309 K.
Next I will save history at every time step.
Looking by time step:
- .clm2.h0.2002-01-01-00000.nc has max iterations 32 in a certain time step
- .clm2.h0.2002-01-02-00000.nc has max iterations 32 in a certain time step
- .clm2.h0.2002-01-03-00000.nc has max iterations 16 in a certain time step
I don't think I see a preference for more iterations at nighttime or daytime or at low sun-angles.
May help to see if certain pfts are more likely to end up with more iterations... Any similarity between the Antarctic grid cells and the tropical grid cells?
On the Antarctic grid cells topic, I may have discovered a problem. As far as I can tell:
1) ITLEF > 0, e.g. ITLEF = 9 in grid cell (16,2) for PFT 12 (arctic grass), when ELAI+ESAI = 0
2) ITLEF > 0, e.g. ITLEF = 3 in grid cell (21,3) for PFT 14 (C4 grass), when ELAI+ESAI = 0
Questions:
The following also seems potentially interesting:
I see a clear convergence toward TV centered at about 300K for ITLEF > 15 or so.
Thanks Sam, interesting. Might be worth soon running this by Gordon and Danica to see if they have any insight.
Note about the last two and the next figures: I generated them by running the test ERP_D_Ld5.f10_f10_musgs.I2000Clm50Sp.cheyenne_intel.clm-decStart with output at every timestep. I plotted timestep data from the last three days of the simulation (3 x 48 = 144 data points).
So next I plot the relationship between TV and TSA to show that they are strongly related (not a surprise). I want to make sure that my earlier statement about TV converging near 300K does not get misinterpreted as suggesting that high ITLEF leads to this convergence. The other way around is more likely and, still, there are many more occurrences of TV near 300K when ITLEF is small.
A few other interesting relationships...
Now plotting the effect on TV of reducing itmax_canopy_fluxes from the default 40. Skipping itmax_canopy_fluxes = 20 because very little change.
- y axis: itmax_canopy_fluxes = 10; x axis: itmax_canopy_fluxes = 40
- y axis: itmax_canopy_fluxes = 7; x axis: itmax_canopy_fluxes = 40
- y axis: itmax_canopy_fluxes = 4; x axis: itmax_canopy_fluxes = 40
- y axis: itmax_canopy_fluxes = 2; x axis: itmax_canopy_fluxes = 40
A few more plots... I will not attempt to piece together a story-line, yet.
Looking at the effect of model timestep on TV and ITLEF...
dtime = 30 min
dtime = 15 min
dtime = 5 min
I was expecting a shift toward smaller ITLEF with a smaller model time step. We do not seem to get such a shift.
These are all very interesting plots. Question: have you assessed the performance (speed) of the model if the maximum ITLEF is reduced, irrespective of the quality of the solution? I think we have examined this, but it would be good to consider what the potential impact would be in terms of model speed if ITLEF could be reduced while maintaining a good solution. I'm just asking because if the impact is small, then we should be careful to think about how much we pursue this question. That said, the results you are getting are pretty interesting in their own right, so this might be worth pursuing under any set of circumstances.
@dlawrenncar , here's what I found for canopy fluxes in the cesm_timing_stats files...
| itmax | on | processes | threads | count | walltotal | wallmax (proc thrd) | wallmin (proc thrd) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 40 | - | 72 | 72 | 1.7352e+04 | 9.445482e+01 | 2.032 (21 0) | 0.841 (50 0) |
| 10 | - | 72 | 72 | 1.7352e+04 | 9.240528e+01 | 2.005 (21 0) | 0.829 (50 0) |
| 7 | - | 72 | 72 | 1.7352e+04 | 9.027068e+01 | 1.918 (21 0) | 0.815 (50 0) |
| 4 | - | 72 | 72 | 1.7352e+04 | 7.997628e+01 | 1.690 (27 0) | 0.729 (50 0) |
| 2 | - | 72 | 72 | 1.7352e+04 | 6.174658e+01 | 1.263 (14 0) | 0.579 (56 0) |
So the walltotal gain in canopy fluxes from itmax = 40 to...
- 10 is 2%
- 7 is 4%
- 4 is 15%
- 2 is 35%
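These percentages follow directly from the walltotal column; for example, for itmax = 2: (9.445482e+01 - 6.174658e+01) / 9.445482e+01 ≈ 0.35, i.e. about a 35% gain.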
@slevisconsulting A little surprised at your time step tests, at least from visual inspection. In your ITLEF vs. TV plots, is it possible to sum the number of elements in each of the ITLEF bins? e.g. how many had 2 iterations? how many had 3 iterations? is there a shift in the distribution for the time step options? if not, then maybe this is a dead end.
where n is the number of finite samples, calculated as the product of these numbers (total samples), minus the number of NaN samples:
- days = 3
- tsteps = 48 or 96 or 288
- latitudes = 19
- longitudes = 24
- pfts = 16
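For the 30-minute case (48 tsteps/day), that product is 3 x 48 x 19 x 24 x 16 = 1,050,624 total samples (and two and six times that for the 15-minute and 5-minute cases, respectively), before subtracting the NaN samples.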
Sam, you might need to normalize that last plot so you can tell the difference in the relative frequency between the different time-steps. Visually, it looks like there isn't any difference -- which is a bit surprising. I'd expect a shift to lower iterations for the faster time-step. You'd think there'd be a point where, if the time-step is fast enough, you would only need one iteration.
The time-steps are for running the entire model at those time-steps, right?
@barlage, thinking about it this way makes me less surprised: the frequency of the atm forcing data probably ends up being the limiting factor to improvement when I shorten the CTSM time step. In every time step, the initial conditions used for starting the canopy fluxes iteration are:
Ts = (Tg + Theta_atm) / 2
qs = (qg + q_atm) / 2
where Ts and qs are the canopy air space temperature and specific humidity, Tg and qg are the ground temperature and specific humidity, and Theta_atm and q_atm are the potential temperature and specific humidity at the reference height.
@ekluzek , I normalized the curves according to each one's sample size to get the frequency per sample. Let me know if that's not what you're suggesting.
That's exactly what I was suggesting, and it makes it more clear. The faster time-steps barely show a lower frequency of larger ITLEF at all -- really makes little difference.
The same figure, but zoomed in at higher ITLEF, where the rel. frequencies are tiny. Shows that the rel. frequency of ITLEF = 40 hasn't changed:
Here's a thought for everyone to comment on:
As I mentioned earlier, every time step as far as I can tell initializes the iterative solver with
Ts = (Tg + Theta_atm) / 2
qs = (qg + q_atm) / 2
Partway through the solver, Ts is updated to be a function of Tg, Theta_atm, and Tv. Similarly, qs is updated to also be a function of Tv.
Might we reduce the number of solver iterations if we stop re-initializing Ts and qs at every time step? Has this been tried?
A caveat about this change: in transient simulations where PFTs may appear/disappear/reappear, the initial conditions upon reappearing could end up being out of date. (Unless I'm misremembering how transient mode works and calculations continue to be performed even for absent PFTs.)
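To make this concrete, here is a minimal sketch of the idea, assuming the initialization can be guarded by a per-patch flag and that the canopy air space state persists between time steps. All names here (the subroutine name, first_call, taf, qaf, etc.) are illustrative rather than the actual CTSM names:

```fortran
! Sketch only -- not the actual CTSM code.
subroutine init_canopy_air_state(first_call, t_grnd, thm, qg, forc_q, taf, qaf)
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)
  logical,  intent(inout) :: first_call   ! .true. until this patch has been initialized once
  real(r8), intent(in)    :: t_grnd, thm  ! ground temperature, atm potential temperature
  real(r8), intent(in)    :: qg, forc_q   ! ground and atm specific humidity
  real(r8), intent(inout) :: taf, qaf     ! canopy air space temperature and specific humidity

  if (first_call) then
     taf = (t_grnd + thm) / 2._r8         ! Ts = (Tg + Theta_atm) / 2
     qaf = (qg + forc_q) / 2._r8          ! qs = (qg + q_atm) / 2
     first_call = .false.
  end if
  ! Otherwise taf and qaf keep their values from the end of the previous
  ! time step, so the iteration starts closer to the converged solution.
end subroutine init_canopy_air_state
```

The transient caveat above would then amount to resetting first_call whenever a patch reappears.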
I tried it. It appears to help at the low end of ITLEF...
Not much difference at the high end...
New TV (y axis) relative to the default (x axis)
Here's something else that seems to help at the low end and not at the high end of ITLEF:
One of the criteria for convergence is ΔTv < 0.01 K, where

ΔTv = max( |Tv(n+1) - Tv(n)| , |Tv(n) - Tv(n-1)| )

In this experiment I have removed the second term in the max statement (variable del2 in the code), with the idea that we may be encouraging unstable behavior in Tv by allowing timestep (n-1) to influence the outcome at (n+1)... So ΔTv now is:

ΔTv = |Tv(n+1) - Tv(n)|
Correction: (n+1), n, (n-1) are iteration numbers, not time steps.
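For clarity, a minimal sketch of this change, assuming the metric is built from two stored differences as described above (del, del2, and dtmin are my reading of the variable names and may not match the code exactly):

```fortran
! Sketch only: Tv convergence metric with and without the del2 term.
logical function tv_converged(del, del2, use_del2)
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)
  real(r8), intent(in) :: del       ! |Tv(n+1) - Tv(n)|
  real(r8), intent(in) :: del2      ! |Tv(n) - Tv(n-1)|
  logical,  intent(in) :: use_del2  ! .false. reproduces this experiment
  real(r8), parameter  :: dtmin = 0.01_r8   ! convergence threshold (K)
  real(r8) :: det

  if (use_del2) then
     det = max(del, del2)   ! default: two consecutive changes must both be small
  else
     det = del              ! experiment: ignore the (n-1) -> (n) change
  end if
  tv_converged = (det < dtmin)
end function tv_converged
```

The latent heat flux criterion is unchanged in this experiment.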
In case it got missed earlier, I repeat that all the results that I have shown are from 5-day simulations.
Now something that helped more at the high end than the low end of ITLEF, so I don't even show the low end plot:
The algorithm constrains Tv to change by no more than 1 K per iteration. I modified the constraint to 10 K and got (look all the way right at ITLEF = 40)...
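In code form, the change amounts to something like the following sketch (delmax is my name for the limit; the actual clamping in CanopyFluxesMod may be written differently):

```fortran
! Sketch only: limit the magnitude of the per-iteration Tv update.
function clamp_dtveg(dt_veg_raw, delmax) result(dt_veg)
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)
  real(r8), intent(in) :: dt_veg_raw  ! raw Newton update for Tv this iteration (K)
  real(r8), intent(in) :: delmax      ! 1 K by default; 10 K in this experiment
  real(r8)             :: dt_veg
  ! Keep the sign of the update but cap its magnitude at delmax
  dt_veg = sign(min(abs(dt_veg_raw), delmax), dt_veg_raw)
end function clamp_dtveg
```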
Timing of the last three tests added here to the bottom of the table that I showed before:

| itmax | walltotal | wallmax (proc thrd) | wallmin (proc thrd) | %WALLTOTAL GAIN | cases |
| --- | --- | --- | --- | --- | --- |
| 40 | 9.445482e+01 | 2.032 (21 0) | 0.841 (50 0) | 0% | DEFAULT |
| 10 | 9.240528e+01 | 2.005 (21 0) | 0.829 (50 0) | 2% gain | itmax=10 |
| 7 | 9.027068e+01 | 1.918 (21 0) | 0.815 (50 0) | 4% gain | itmax=7 |
| 4 | 7.997628e+01 | 1.690 (27 0) | 0.729 (50 0) | 15% gain | itmax=4 |
| 2 | 6.174658e+01 | 1.263 (14 0) | 0.579 (56 0) | 35% gain | itmax=2 |
| 40 | 9.015737e+01 | 1.977 (21 0) | 0.802 (63 0) | 5% gain | no reinitializing Ts qs |
| 40 | 8.677573e+01 | 1.821 (21 0) | 0.781 (63 0) | 8% gain | ...and no del2 |
| 40 | 8.726323e+01 | 1.747 (21 0) | 0.818 (63 0) | 8% gain | ...and 10 x delmax |
I thought the last test would contribute a bigger walltotal gain because it affected high ITLEFs; however, high ITLEFs appear with low relative frequency and low ITLEFs appear with high relative frequency, so affecting low ITLEFs results in greater walltotal gain. That seems reasonable to me now.
> The same issue could be causing BareGroundFluxes to take a long time.

In BareGroundFluxes I don't think that this is the case, because I see:

integer, parameter :: niters = 3 ! maximum number of iterations for surface temperature
My presentation at today's CLM meeting of the work this far: https://docs.google.com/presentation/d/1dN8u3NRdv9rOfYvsOp2ukygj8-C-5f-xEEfxaRTVWwU/edit?usp=sharing
Suggestions for follow-up work:
zetamaxstable = 2 and 100 made things worse compared to the default (= 0.5). In these figures, "just" means "this is the only change from the default case" and "same" means that changes were applied cumulatively.
That's what I would have expected. We changed the value of zetamax to keep the model from becoming too stable. It would be interesting to see the difference between day/night for the z=100 case.
Switching to single-point simulation 1x1pt_NR1 (Niwot Ridge) using input data provided by @swensosc. Ran two years and plotted every timestep from year 2. There are 16 cases of ITLEF = 40:
Plotting ITLEF and FSH together and zooming in.
i) 7 of the 16 cases of ITLEF = 40 occur before or as FSH begins to rise for the day, on days 149, 189, 191, 196, 203, 234, 327 -- so spanning May to November, though 5 of the 7 occur in summer.
ii) 5 of the 16 cases of ITLEF = 40 occur as FSH stops dropping at the end of the day, on days 31, 208, 215, 215, 248 -- so 4 of the 5 in summer.
iii) 3 of the 16 cases of ITLEF = 40 occur as FSH bottoms out during wild oscillations, on days 175, 214, 242 -- all in summer.
Plots:
As far as I can tell, the common element from these plots (summed up in the next one) is that FSH is near zero (0 W/m2) and not that FSH is oscillating wildly.
But something else might be going on at the same time, because not all cases of FSH near zero result in ITLEF = 40.
@billsacks @barlage In the context of the NWP configuration, where ITLEFmax = 3, I believe that the following change (quoting from an earlier post) would be beneficial:
[...] every time step [...] initializes the iterative solver with
Ts = (Tg + Theta_atm) / 2
qs = (qg + q_atm) / 2
[...] reduce the number of solver iterations if we stop re-initializing Ts and qs at every time step
In addition to the suggestions mentioned in the comment above, one more suggestion would be to focus on only a single timestep for which ITLEF is reaching ITLEFmax. Then, try to understand why the solution isn't converging fast enough. Is the solution oscillating between iterations, or just converging too slowly? If possible, it would be nice to plot equation (2.5.128) as a function of Tv and check whether that equation is a smooth function of Tv.
Next, plots from single-point simulation no_reinit_NR1: same as 1x1_NR1 but without reinitializing Ts and qs at every timestep.
I restarted the simulation in the middle of year 2 to write out the progression of variables taking part in the iterative solution of Tv. For example...
This shows the values of dele and det over the course of the 40 iterations in a time step. Convergence would have occurred only if dele < 0.1 W/m2 and det < 0.01 K. "Day 9" in the x-label refers to the 9th day since the restart on July 1st.
The graph is characteristic of most cases that never meet both criteria in this second half of yr 2:
In a few cases...
Same simulation as previous post but plotting Tv and Eq. (2.5.128) as suggested by @bishtgautam
The figures show that Eq. (2.5.128) oscillates around 0 when Tv doesn't converge and Eq. (2.5.128) gets close to zero when Tv converges:
For the timestep that corresponds to "part of Day 9 Year 2" mentioned in this comment, I would guess either eq. (2.5.128) has a discontinuity (in value or in slope) or the approximation of the derivative of eq. (2.5.128) is not accurate enough (the text below eq. (2.5.132) describes certain partial derivative terms that are neglected).
You may consider hacking the iteration loop at CanopyFluxesMod.F90#L804 by ignoring the `t_veg` iterate values provided by the Newton-Raphson method and letting `t_veg` go from 284.23 K to 284.18 K with a very fine `dt_veg`. Then, write out the values of eq. (2.5.128) and the denominator of eq. (2.5.129). Finally, make plots of the two output values against `t_veg`.
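For concreteness, inside the iteration loop the hack could look roughly like this, using the code forms of eq. (2.5.128) and the eq. (2.5.129) denominator that @slevisconsulting quotes just below. This is a sketch, not a drop-in patch; it assumes itlef runs from 1 to itmax = 40 and that the surrounding loop still recomputes efsh, efe, dc1, dc2, wtga, wtgaq, and qsatldT for the prescribed t_veg:

```fortran
! Sketch only: override the Newton-Raphson iterate with a prescribed fine sweep
! of t_veg and record the two quantities at each iteration.
t_veg(p) = 284.23_r8 + (284.18_r8 - 284.23_r8) * real(itlef - 1, r8) / 39._r8

eq128       = sabv(p) + air(p) + bir(p)*t_veg(p)**4 + cir(p)*lw_grnd - efsh - efe(p)
eq129_denom = -4._r8*bir(p)*t_veg(p)**3 + dc1*wtga + dc2*wtgaq*qsatldT(p)

write(iulog,*) itlef, t_veg(p), eq128, eq129_denom
```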
Thank you @bishtgautam. Let me make sure I didn't misunderstand your suggestion. I wrote out the denominator of Eq. 2.5.129 as

(- 4._r8*bir(p)*t_veg(p)**3 +dc1*wtga +dc2*wtgaq*qsatldT(p))

and I wrote out Eq. 2.5.128 (this time and previously) as

(sabv(p) + air(p) + bir(p)*t_veg(p)**4 + cir(p)*lw_grnd - efsh - efe(p))
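If I'm reading CanopyFluxesMod correctly, these two expressions are just the numerator and denominator of the Newton-Raphson update for Tv (Eq. 2.5.129), i.e. roughly the following (approximate, from my reading of the code, not verbatim):

```fortran
! Newton-Raphson update for Tv, as I understand it (approximate):
dt_veg(p) = (sabv(p) + air(p) + bir(p)*t_veg(p)**4 + cir(p)*lw_grnd - efsh - efe(p)) &
            / (-4._r8*bir(p)*t_veg(p)**3 + dc1*wtga + dc2*wtgaq*qsatldT(p))
t_veg(p)  = t_veg(p) + dt_veg(p)
```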
@slevisconsulting, Thanks for pursuing my suggestions and the plots. Here are a few answers/comments:
I ran the model up to the time step before the time step of interest to generate a restart file
That is exactly the way to go about it.
dt_veg = -0.011 where I set this greater than the 0.01 limit to avoid exiting the iterative solver if we happened to meet the other convergence criterion; were you suggesting a finer dt_veg?
The "part of Day 9 Year 2" plots mentioned in this comment showed eq. (2.5.128) is between 4 and -1 [Wm^{-2}] for Tv between 284.23 and 284.18 K. So, my suggestion was to pick a dt_veg such that, over the iteration loop, you would traverse the Tv range that allows eq. (2.5.128) to go from positive to negative values (i.e. dt_veg = (284.18-283.23)/40 ~ 0.02).
Do the plots in the above comment exactly correspond to the "part of Day 9 Year 2" plots mentioned in this comment? I ask because the values of eq. (2.5.128) are significantly different in the two comments.
I had to turn off error checking in BalanceCheckMod.F90 to prevent the model from stopping from an energy balance error
You can leave the BalanceCheck on and just let the model crash because you are analyzing a single timestep.
As an aside, plots of Tv vs eq. (2.5.128) and Tv vs the eq. (2.5.129) denominator would be more intuitive for figuring out any discontinuity.
First things first: I'm replacing the first of the two figures in the last post due to an error.
I also noticed that the values of eq. (2.5.128) and of Tv were much different, so I did two things:
1) I restarted without the code changes and confirmed bit-for-bit same answers as from the continuous run without the restart. This confirmed that I ran the test for the correct time step that reached ITLEF = 40.
2) I restarted the same time step once more, and this time left taf with its restart value instead of initializing it to 284.23. Also, I set dt_veg = -0.00075, similar to your suggestion but using instead (284.19-284.22)/40. The result is similar in that eq. (2.5.128) is about 60 W/m2 and Tv is about 282 K. This case crashes at ITLEF = 2 due to a balance error, so I did need to remove the call to endrun.
...
Here are the Tv versus Eq. 128 and Eq. 129_denom plots for the same case as the two previous plots:
If Gordon's statement about oscillating FSH & FLH from time step to time step really meant from iteration to iteration leading to ITLEF = 40, then I did find such behavior. Looking for the source of the oscillations that prevent det and dele from converging has led me to the circular dependencies between temperature/humidity and latent/sensible heat fluxes. Investigating with some of the above suggestions may eventually uncover something helpful, but I'm not sure how deep to dig into these circular dependencies, given their complexity.
The single-point simulations at Niwot Ridge (NR1) have shown that the changes I suggested in earlier posts offer no improvement in the number of ITLEF = 40 occurrences, so they may not be worth pursuing. ITLEF = 40 appeared in about 1 of every 1000 time steps in all my tests.
The NR1 simulations with dtime = 5 minutes and 15 minutes also showed no improvement in the number of ITLEF = 40 occurrences (again about 1 in 1000 time steps). This time I confirmed my earlier hypothesis that the atmospheric driver updates every 30 minutes, so the additional model time steps are constrained by repeating atm conditions. I think this evaluates the substepping suggestion (correct me if I'm wrong).
In most ITLEF = 40 cases, the last 30 iterations see little if any change in det and/or dele toward convergence. This leads me to suggest the following "fix":
Currently in ITLEF = 40 cases, the solver oscillates about the solution all the way to ITLEF = 40, at which point it leaves us with the most recent Tv and qv. Instead of this I suggest that at ITLEF > 15 (?) we force det and dele to change just enough to pass the convergence test and adjust all affected terms to maintain energy balance. I will post an update in the next day or two.
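As a rough sketch of what I mean (itlef_force, tv_prev_iter, dtmin, and dlemin are placeholders for however this would actually be expressed; the real change would also have to push the adjusted Tv back through the flux terms so energy balance is maintained):

```fortran
! Sketch only: once the solver has clearly stalled, force the Tv change to just
! pass the convergence test instead of letting it oscillate to ITLEF = 40.
if (itlef > itlef_force .and. (det(p) >= dtmin .or. dele(p) >= dlemin)) then
   ! Step Tv by just under the convergence threshold, toward the value it has
   ! been oscillating around, then recompute efsh, efe, etc. from this Tv.
   dt_veg(p) = sign(0.99_r8*dtmin, tv_prev_iter(p) - t_veg(p))
   t_veg(p)  = t_veg(p) + dt_veg(p)
end if
```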
I ran a new case with code modifications that enforce convergence. The code mods apply if convergence criteria are not met for ITLEF > 15. Convergence occurs by ITLEF = 17 due to ΔTv = max( |Tv(n+1) - Tv(n)| , |Tv(n) - Tv(n-1)| ).
The plot shows the new Tv relative to the default Tv solutions.
This plot shows the same but setting ITLEFmax = 17 in the namelist instead of enforcing convergence.
Not very different. No gains in mean(ITLEF) or number of cases of ITLEF > 15.
The convergence criteria focus on the change in leaf temperature and leaf latent heat flux, so this "fix" moves these two oscillating variables closer to the center without helping Eq. 128 get closer to zero:
Now back to the run without the fix... Same "Day-9" example as before. Here are the variables that make up Eq. 128 and Eq. 129 to show how they oscillate:
These terms are constant: sabv ~= 126.6, lw_grnd ~= 6.4e9, air ~= 285.3, bir ~= 0, cir ~= 0.
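Since Tv only moves by a few hundredths of a K between iterations in this example, the bir*Tv^4 and cir*lw_grnd terms are effectively constant as well, so from iteration to iteration the change in Eq. 128 reduces to approximately

Δ(Eq. 128) ≈ -Δefsh - Δefe

i.e. the oscillation in Eq. 128 has to be coming from the sensible and latent heat flux terms.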
@swensosc and I have worked on this issue for a little while:
@swensosc shared with me code that he had used in the past to loop t_veg values over the CanopyFluxes equations instead of iteratively solving for t_veg. I believe the concept is the same as what @bishtgautam recommended a few months back.
Unfortunately, we didn't learn anything definitive about the lack of convergence in the iterative solution:
a) @swensosc suspects the missing terms mentioned in the Technote Section 5.3.2: "The partial derivatives ∂r_ah/∂T_v and ∂r_aw/∂T_v, which cannot be determined analytically, are ignored for ∂H_v/∂T_v and ∂λE_v/∂T_v." Sean does not think it's a good use of my time to pursue a solution to these because they are messy.
b) @swensosc also pointed me to the immediately following line in the Technote: "...if ζ changes sign more than four [sic] times during the temperature iteration, ζ = -0.01. This helps prevent 'flip-flopping'..." I used [sic] to indicate that I think "four" should say "three" because the code says:

if (nmozsgn(p) >= 4) obu(p) = zldis(p)/(-0.01_r8)

So @swensosc suggested that we try exiting the solver instead of setting ζ = -0.01. I coded this as:

if (nmozsgn(p) >= 4) exit ITERATION

Unfortunately, I found no benefit, because there was little correlation between ITLEF == 40 (36 instances in 2 yrs at Niwot Ridge) and ζ changing sign more than three times (only 1 instance corresponded to an ITLEF == 40).
@swensosc points out that we often go through all 40 possible iterations in canopy fluxes. We should assess whether this can be reduced in order to improve performance.
The same issue could be causing BareGroundFluxes to take a long time.