Open billsacks opened 6 years ago
@barlage do you want to add your recent findings to this issue?
@swensosc changed the number of canopy iterations from 40 to 2. This gave close to a factor of 2 difference in the timing of canflux / can_iter. Note that most of the time spent there is in photosynthesis.
We felt it could be worth looking at maps of how many iterations it takes in various places, at various times of day.
@dlawrenncar points out that the fundamental problem may be that the answer is bouncing around rather than actually converging. The first step here is to better characterize what's actually happening.
Notes from my Feb 2019 visit to Boulder:
Long-term solution: In Gordon's view, this will require a redesign of this part of the code and could be a major undertaking.
Short-term solution: Gordon and Mike B believe that a shorter CLM timestep may reduce the iterations required. Mike suggested investigating the timestep-to-iteration relationship.
Bill will add an iterations setting to the namelist, so wait for that.
Evaluate how much to reduce the max number of iterations. Consider averaging when the solution bounces around?
@billsacks , at our meeting in Boulder you suggested that I wait for you to add an iterations setting to the namelist before I started working on this issue. Is that still the plan or should I go ahead and get started?
@billsacks , alternative to my previous question: Would you like me to add the iterations setting to the namelist if you haven't already?
@slevisconsulting I hope to finally start the work related to this next week, though with other obligations and some vacation time coming up, it might still be a couple of weeks before this is on master. So you shouldn't wait for me, but if we both add this namelist item, it will take some reconciliation. I forget: do you need this namelist item in order to start your work on this? (I forget the details of our earlier discussion.)
I've seen a couple of suggestions for what to look for during this investigation:
So next task I think is to add the number of iterations to history.
Added ITLEF to history, ran this test:
ERP_D_Ld5.f10_f10_musgs.I2000Clm50Sp.cheyenne_intel.clm-decStart
which uses
finidat = 'clmi.I2000Clm50BgcCrop.2011-01-01.1.9x2.5_gx1v7_gl4_simyr2000_c180715.nc'
and I got this history output:
ITLEF colors range from 2 to 8.5 iterations.
My first reaction: it's winter... Greater values appear in the tropics and Southern Hemisphere; however, some high values appear in Antarctica, where ELAI, ESAI = 0, and TLAI, TSAI are negligible! Does that seem reasonable?
TV colors range from 229 to 309 K.
Next I will save history at every time step.
Looking by time step:
- .clm2.h0.2002-01-01-00000.nc has max iterations 32 in a certain time step
- .clm2.h0.2002-01-02-00000.nc has max iterations 32 in a certain time step
- .clm2.h0.2002-01-03-00000.nc has max iterations 16 in a certain time step
I don't think I see a preference for more iterations at nighttime or daytime or at low sun-angles.
May help to see if certain pfts are more likely to end up with more iterations... Any similarity between the Antarctic grid cells and the tropical grid cells?
On the Antarctic grid cells topic, I may have discovered a problem. As far as I can tell:
1) ITLEF > 0, e.g. ITLEF = 9 in grid cell (16,2) for PFT 12 (arctic grass), when ELAI+ESAI = 0
2) ITLEF > 0, e.g. ITLEF = 3 in grid cell (21,3) for PFT 14 (C4 grass), when ELAI+ESAI = 0
Questions:
The following also seems potentially interesting:
I see a clear convergence toward TV centered at about 300K for ITLEF > 15 or so.
Thanks Sam, interesting. Might be worth soon running this by Gordon and Danica to see if they have any insight.
Note about the last two and the next figures: I generated them by running the test ERP_D_Ld5.f10_f10_musgs.I2000Clm50Sp.cheyenne_intel.clm-decStart with output at every timestep. I plotted timestep data from the last three days of the simulation (3 x 48 = 144 data points).
So next I plot the relationship between TV and TSA to show that they are strongly related (not a surprise). I want to make sure that my earlier statement about TV converging near 300K does not get misinterpreted as suggesting that high ITLEF leads to this convergence. The other way around is more likely and, still, there are many more occurrences of TV near 300K when ITLEF is small.
A few other interesting relationships...
Now plotting the effect on TV of reducing itmax_canopy_fluxes from the default 40. Skipping itmax_canopy_fluxes = 20 because very little change.
- y axis: itmax_canopy_fluxes = 10; x axis: itmax_canopy_fluxes = 40
- y axis: itmax_canopy_fluxes = 7; x axis: itmax_canopy_fluxes = 40
- y axis: itmax_canopy_fluxes = 4; x axis: itmax_canopy_fluxes = 40
- y axis: itmax_canopy_fluxes = 2; x axis: itmax_canopy_fluxes = 40
A few more plots... I will not attempt to piece together a story-line, yet.
Looking at the effect of model timestep on TV and ITLEF...
dtime = 30 min
dtime = 15 min
dtime = 5 min
I was expecting a shift toward smaller ITLEF with a smaller model time step. We do not seem to get such a shift.
These are all very interesting plots. Question: have you assessed the performance (speed) of the model if the maximum ITLEF is reduced, irrespective of the quality of the solution? I think we have examined this, but it would be good to consider what the potential impact would be in terms of model speed if ITLEF could be reduced while maintaining a good solution. I'm just asking because if the impact is small, then we should be careful to think about how much we pursue this question. That said, the results you are getting are pretty interesting in their own right, so this might be worth pursuing under any set of circumstances.
@dlawrenncar , here's what I found for canopy fluxes in the cesm_timing_stats files...
| itmax | on | processes | threads | count | walltotal | wallmax (proc thrd) | wallmin (proc thrd) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 40 | - | 72 | 72 | 1.7352e+04 | 9.445482e+01 | 2.032 (21 0) | 0.841 (50 0) |
| 10 | - | 72 | 72 | 1.7352e+04 | 9.240528e+01 | 2.005 (21 0) | 0.829 (50 0) |
| 7 | - | 72 | 72 | 1.7352e+04 | 9.027068e+01 | 1.918 (21 0) | 0.815 (50 0) |
| 4 | - | 72 | 72 | 1.7352e+04 | 7.997628e+01 | 1.690 (27 0) | 0.729 (50 0) |
| 2 | - | 72 | 72 | 1.7352e+04 | 6.174658e+01 | 1.263 (14 0) | 0.579 (56 0) |
So the walltotal gain in canopy fluxes from itmax = 40 to...
- 10 is 2%
- 7 is 4%
- 4 is 15%
- 2 is 35%
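These percentages follow directly from the walltotal column; for example, for itmax = 2: (9.445482e+01 - 6.174658e+01) / 9.445482e+01 ≈ 0.35, i.e. about a 35% gain.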
@slevisconsulting A little surprised at your time step tests, at least from visual inspection. In your ITLEF vs. TV plots, is it possible to sum the number of elements in each of the ITLEF bins? e.g. how many had 2 iterations? how many had 3 iterations? is there a shift in the distribution for the time step options? if not, then maybe this is a dead end.
where n is the number of finite samples, calculated as the product of these numbers (total samples), minus the number of NaN samples:
- days = 3
- tsteps = 48 or 96 or 288
- latitudes = 19
- longitudes = 24
- pfts = 16
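For the 30-minute case (48 tsteps/day), that product is 3 x 48 x 19 x 24 x 16 = 1,050,624 total samples (and two and six times that for the 15-minute and 5-minute cases, respectively), before subtracting the NaN samples.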
Sam, you might need to normalize that last plot so you can tell the difference in the relative frequency between the different time-steps. Visually, it looks like there isn't any difference -- which is a bit surprising. I'd expect a shift to lower iterations for the faster time-step. You'd think there'd be a point where, if the time-step is fast enough, you would only need one iteration.
The time-steps are for running the entire model at those time-steps, right?
@barlage, thinking about it this way makes me less surprised: the frequency of the atm forcing data probably ends up being the limiting factor to improvement when I shorten the CTSM time step. In every time step, the initial conditions used for starting the canopy fluxes iteration are:
Ts = (Tg + Theta_atm) / 2
qs = (qg + q_atm) / 2
where Ts and qs are the canopy air space temperature and specific humidity, Tg and qg are the ground temperature and specific humidity, and Theta_atm and q_atm are the potential temperature and specific humidity at the reference height.
@ekluzek , I normalized the curves according to each one's sample size to get the frequency per sample. Let me know if that's not what you're suggesting.
That's exactly what I was suggesting, and it makes it more clear. The faster time-steps barely show a lower frequency of larger ITLEF at all -- really makes little difference.
The same figure, but zoomed in at higher ITLEF, where the rel. frequencies are tiny. Shows that the rel. frequency of ITLEF = 40 hasn't changed:
Here's a thought for everyone to comment on:
As I mentioned earlier, every time step as far as I can tell initializes the iterative solver with
Ts = (Tg + Theta_atm) / 2
qs = (qg + q_atm) / 2
Partway through the solver, Ts is updated to be a function of Tg, Theta_atm, and Tv. Similarly, qs is updated to also be a function of Tv.
Might we reduce the number of solver iterations if we stop re-initializing Ts and qs at every time step? Has this been tried?
A caveat about this change: in transient simulations where PFTs may appear/disappear/reappear, the initial conditions upon reappearing could end up being out of date. (Unless I'm misremembering how transient mode works and calculations continue to be performed even for absent PFTs.)
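To make this concrete, here is a minimal sketch of the idea, assuming the initialization can be guarded by a per-patch flag and that the canopy air space state persists between time steps. All names here (the subroutine name, first_call, taf, qaf, etc.) are illustrative rather than the actual CTSM names:

```fortran
! Sketch only -- not the actual CTSM code.
subroutine init_canopy_air_state(first_call, t_grnd, thm, qg, forc_q, taf, qaf)
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)
  logical,  intent(inout) :: first_call   ! .true. until this patch has been initialized once
  real(r8), intent(in)    :: t_grnd, thm  ! ground temperature, atm potential temperature
  real(r8), intent(in)    :: qg, forc_q   ! ground and atm specific humidity
  real(r8), intent(inout) :: taf, qaf     ! canopy air space temperature and specific humidity

  if (first_call) then
     taf = (t_grnd + thm) / 2._r8         ! Ts = (Tg + Theta_atm) / 2
     qaf = (qg + forc_q) / 2._r8          ! qs = (qg + q_atm) / 2
     first_call = .false.
  end if
  ! Otherwise taf and qaf keep their values from the end of the previous
  ! time step, so the iteration starts closer to the converged solution.
end subroutine init_canopy_air_state
```

The transient caveat above would then amount to resetting first_call whenever a patch reappears.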
I tried it. It appears to help at the low end of ITLEF...
Not much difference at the high end...
New TV (y axis) relative to the default (x axis)
Here's something else that seems to help at the low end and not at the high end of ITLEF:
One of the criteria for convergence is ΔTv < 0.01 K, where

ΔTv = max( |Tv(n+1) - Tv(n)| , |Tv(n) - Tv(n-1)| )

In this experiment I have removed the second term in the max statement (variable del2 in the code), with the idea that we may be encouraging unstable behavior in Tv by allowing timestep (n-1) to influence the outcome at (n+1)... So ΔTv now is:

ΔTv = |Tv(n+1) - Tv(n)|
Correction: (n+1), n, (n-1) are iteration numbers, not time steps.
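For clarity, a minimal sketch of this change, assuming the metric is built from two stored differences as described above (del, del2, and dtmin are my reading of the variable names and may not match the code exactly):

```fortran
! Sketch only: Tv convergence metric with and without the del2 term.
logical function tv_converged(del, del2, use_del2)
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)
  real(r8), intent(in) :: del       ! |Tv(n+1) - Tv(n)|
  real(r8), intent(in) :: del2      ! |Tv(n) - Tv(n-1)|
  logical,  intent(in) :: use_del2  ! .false. reproduces this experiment
  real(r8), parameter  :: dtmin = 0.01_r8   ! convergence threshold (K)
  real(r8) :: det

  if (use_del2) then
     det = max(del, del2)   ! default: two consecutive changes must both be small
  else
     det = del              ! experiment: ignore the (n-1) -> (n) change
  end if
  tv_converged = (det < dtmin)
end function tv_converged
```

The latent heat flux criterion is unchanged in this experiment.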
In case it got missed earlier, I repeat that all the results that I have shown are from 5-day simulations.
Now something that helped more at the high end than the low end of ITLEF, so I don't even show the low end plot:
The algorithm constrains Tv to change by no more than 1 K per iteration. I modified the constraint to 10 K and got (look all the way right at ITLEF = 40)...
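In code form, the change amounts to something like the following sketch (delmax is my name for the limit; the actual clamping in CanopyFluxesMod may be written differently):

```fortran
! Sketch only: limit the magnitude of the per-iteration Tv update.
function clamp_dtveg(dt_veg_raw, delmax) result(dt_veg)
  implicit none
  integer, parameter :: r8 = selected_real_kind(12)
  real(r8), intent(in) :: dt_veg_raw  ! raw Newton update for Tv this iteration (K)
  real(r8), intent(in) :: delmax      ! 1 K by default; 10 K in this experiment
  real(r8)             :: dt_veg
  ! Keep the sign of the update but cap its magnitude at delmax
  dt_veg = sign(min(abs(dt_veg_raw), delmax), dt_veg_raw)
end function clamp_dtveg
```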
Timing of the last three tests added here to the bottom of the table that I showed before:

| itmax | walltotal | wallmax (proc thrd) | wallmin (proc thrd) | %WALLTOTAL GAIN | cases |
| --- | --- | --- | --- | --- | --- |
| 40 | 9.445482e+01 | 2.032 (21 0) | 0.841 (50 0) | 0% | DEFAULT |
| 10 | 9.240528e+01 | 2.005 (21 0) | 0.829 (50 0) | 2% gain | itmax=10 |
| 7 | 9.027068e+01 | 1.918 (21 0) | 0.815 (50 0) | 4% gain | itmax=7 |
| 4 | 7.997628e+01 | 1.690 (27 0) | 0.729 (50 0) | 15% gain | itmax=4 |
| 2 | 6.174658e+01 | 1.263 (14 0) | 0.579 (56 0) | 35% gain | itmax=2 |
| 40 | 9.015737e+01 | 1.977 (21 0) | 0.802 (63 0) | 5% gain | no reinitializing Ts qs |
| 40 | 8.677573e+01 | 1.821 (21 0) | 0.781 (63 0) | 8% gain | ...and no del2 |
| 40 | 8.726323e+01 | 1.747 (21 0) | 0.818 (63 0) | 8% gain | ...and 10 x delmax |
I thought the last test would contribute a bigger walltotal gain because it affected high ITLEFs; however, high ITLEFs appear with low relative frequency and low ITLEFs appear with high relative frequency, so affecting low ITLEFs results in greater walltotal gain. That seems reasonable to me now.
> The same issue could be causing BareGroundFluxes to take a long time.

In BareGroundFluxes I don't think that this is the case, because I see:

integer, parameter :: niters = 3 ! maximum number of iterations for surface temperature
My presentation at today's CLM meeting of the work this far: https://docs.google.com/presentation/d/1dN8u3NRdv9rOfYvsOp2ukygj8-C-5f-xEEfxaRTVWwU/edit?usp=sharing
Suggestions for follow-up work:
zetamaxstable = 2 and 100 made things worse compared to the default (= 0.5). In these figures, "just" means "this is the only change from the default case" and "same" means that changes were applied cumulatively.
That's what I would have expected. We changed the value of zetamax to keep the model from becoming too stable. It would be interesting to see the difference between day/night for the z=100 case.
Switching to single-point simulation 1x1pt_NR1 (Niwot Ridge) using input data provided by @swensosc. Ran two years and plotted every timestep from year 2. There are 16 cases of ITLEF = 40:
Plotting ITLEF and FSH together and zooming in.
i) 7 of the 16 cases of ITLEF = 40 occur before or as FSH begins to rise for the day, on days 149, 189, 191, 196, 203, 234, 327 -- so spanning May to November, though 5 of the 7 occur in summer.
ii) 5 of the 16 cases of ITLEF = 40 occur as FSH stops dropping at the end of the day, on days 31, 208, 215, 215, 248 -- so 4 of the 5 in summer.
iii) 3 of the 16 cases of ITLEF = 40 occur as FSH bottoms out during wild oscillations, on days 175, 214, 242 -- all in summer.
Plots:
As far as I can tell, the common element from these plots (summed up in the next one) is that FSH is near zero (0 W/m2) and not that FSH is oscillating wildly.
But something else might be going on at the same time, because not all cases of FSH near zero result in ITLEF = 40.
@billsacks @barlage In the context of the NWP configuration, where ITLEFmax = 3, I believe that the following change (quoting from an earlier post) would be beneficial:
[...] every time step [...] initializes the iterative solver with
Ts = (Tg + Theta_atm) / 2
qs = (qg + q_atm) / 2
[...] reduce the number of solver iterations if we stop re-initializing Ts and qs at every time step
In addition to the suggestions mentioned in the comment above, one more suggestion would be to focus on only a single timestep for which ITLEF is reaching ITLEFmax. Then, try to understand why the solution isn't converging fast enough. Is the solution oscillating between iterations, or just converging too slowly? If possible, it would be nice to plot equation (2.5.128) as a function of Tv and check whether that equation is a smooth function of Tv.
Next, plots from single-point simulation no_reinit_NR1: same as 1x1_NR1 but without reinitializing Ts and qs at every timestep.
I restarted the simulation in the middle of year 2 to write out the progression of variables taking part in the iterative solution of Tv. For example...
This shows the values of dele and det over the course of the 40 iterations in a time step. Convergence would have occurred only if dele < 0.1 W/m2 and det < 0.01 K. "Day 9" in the x-label refers to the 9th day since the restart on July 1st.
The graph is characteristic of most cases that never meet both criteria in this second half of yr 2:
In a few cases...
Same simulation as previous post but plotting Tv and Eq. (2.5.128) as suggested by @bishtgautam
The figures show that Eq. (2.5.128) oscillates around 0 when Tv doesn't converge and Eq. (2.5.128) gets close to zero when Tv converges:
For the timestep that corresponds to "part of Day 9 Year 2" mentioned in this comment, I would guess either eq. (2.5.128) has a discontinuity (in value or in slope) or the approximation of the derivative of eq. (2.5.128) is not accurate enough (the text below eq. (2.5.132) describes certain partial derivative terms that are neglected).
You may consider hacking the iteration loop at CanopyFluxesMod.F90#L804 by ignoring the `t_veg` iterate values provided by the Newton-Raphson method and letting `t_veg` go from 284.23 K to 284.18 K with a very fine `dt_veg`. Then, write out the values of eq. (2.5.128) and the denominator of eq. (2.5.129). Finally, make plots of the two output values against `t_veg`.
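For concreteness, inside the iteration loop the hack could look roughly like this, using the code forms of eq. (2.5.128) and the eq. (2.5.129) denominator that @slevisconsulting quotes just below. This is a sketch, not a drop-in patch; it assumes itlef runs from 1 to itmax = 40 and that the surrounding loop still recomputes efsh, efe, dc1, dc2, wtga, wtgaq, and qsatldT for the prescribed t_veg:

```fortran
! Sketch only: override the Newton-Raphson iterate with a prescribed fine sweep
! of t_veg and record the two quantities at each iteration.
t_veg(p) = 284.23_r8 + (284.18_r8 - 284.23_r8) * real(itlef - 1, r8) / 39._r8

eq128       = sabv(p) + air(p) + bir(p)*t_veg(p)**4 + cir(p)*lw_grnd - efsh - efe(p)
eq129_denom = -4._r8*bir(p)*t_veg(p)**3 + dc1*wtga + dc2*wtgaq*qsatldT(p)

write(iulog,*) itlef, t_veg(p), eq128, eq129_denom
```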
Thank you @bishtgautam. Let me make sure I didn't misunderstand your suggestion. I wrote out the denominator of Eq. 2.5.129 as

(- 4._r8*bir(p)*t_veg(p)**3 +dc1*wtga +dc2*wtgaq*qsatldT(p))

and I wrote out Eq. 2.5.128 (this time and previously) as

(sabv(p) + air(p) + bir(p)*t_veg(p)**4 + cir(p)*lw_grnd - efsh - efe(p))
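If I'm reading CanopyFluxesMod correctly, these two expressions are just the numerator and denominator of the Newton-Raphson update for Tv (Eq. 2.5.129), i.e. roughly the following (approximate, from my reading of the code, not verbatim):

```fortran
! Newton-Raphson update for Tv, as I understand it (approximate):
dt_veg(p) = (sabv(p) + air(p) + bir(p)*t_veg(p)**4 + cir(p)*lw_grnd - efsh - efe(p)) &
            / (-4._r8*bir(p)*t_veg(p)**3 + dc1*wtga + dc2*wtgaq*qsatldT(p))
t_veg(p)  = t_veg(p) + dt_veg(p)
```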
@slevisconsulting, Thanks for pursuing my suggestions and the plots. Here are a few answers/comments:
I ran the model up to the time step before the time step of interest to generate a restart file
That is exactly the way to go about it.
dt_veg = -0.011 where I set this greater than the 0.01 limit to avoid exiting the iterative solver if we happened to meet the other convergence criterion; were you suggesting a finer dt_veg?
The "part of Day 9 Year 2" plots mentioned in this comment showed eq. (2.5.128) is between 4 and -1 [Wm^{-2}] for Tv between 284.23 and 284.18 K. So, my suggestion was to pick a dt_veg such that, over the iteration loop, you would traverse the Tv range that allows eq. (2.5.128) to go from positive to negative values (i.e. dt_veg = (284.18-283.23)/40 ~ 0.02).
Do the plots in the above comment exactly correspond to the "part of Day 9 Year 2" plots mentioned in this comment? I ask because the values of eq. (2.5.128) are significantly different in the two comments.
I had to turn off error checking in BalanceCheckMod.F90 to prevent the model from stopping from an energy balance error
You can leave the BalanceCheck on and just let the model crash because you are analyzing a single timestep.
As an aside, plots of Tv vs eq. (2.5.128) and Tv vs the eq. (2.5.129) denominator would be more intuitive for figuring out any discontinuity.
First things first: I'm replacing the first of the two figures in the last post due to an error.
I also noticed that the values of eq. (2.5.128) and of Tv were much different, so I did two things:
1) I restarted without the code changes and confirmed bit-for-bit same answers as from the continuous run without the restart. This confirmed that I ran the test for the correct time step that reached ITLEF = 40.
2) I restarted the same time step once more, and this time left taf with its restart value instead of initializing it to 284.23. Also, I set dt_veg = -0.00075, similar to your suggestion but using instead (284.19-284.22)/40. The result is similar in that eq. (2.5.128) is about 60 W/m2 and Tv is about 282 K. This case crashes at ITLEF = 2 due to a balance error, so I did need to remove the call to endrun.
...
Here are the Tv versus Eq. 128 and Eq. 129_denom plots for the same case as the two previous plots:
If Gordon's statement about oscillating FSH & FLH from time step to time step really meant from iteration to iteration leading to ITLEF = 40, then I did find such behavior. Looking for the source of the oscillations that prevent det and dele from converging has led me to the circular dependencies between temperature/humidity and latent/sensible heat fluxes. Investigating with some of the above suggestions may eventually uncover something helpful, but I'm not sure how deep to dig into these circular dependencies, given their complexity.
The single-point simulations at Niwot Ridge (NR1) have shown that the changes I suggested in earlier posts offer no improvement in the number of ITLEF = 40 occurrences, so they may not be worth pursuing. ITLEF = 40 appeared in about 1 of every 1000 time steps in all my tests.
The NR1 simulations with dtime = 5 minutes and 15 minutes also showed no improvement in the number of ITLEF = 40 occurrences (again about 1 in 1000 time steps). This time I confirmed my earlier hypothesis that the atmospheric driver updates every 30 minutes, so the additional model time steps are constrained by repeating atm conditions. I think this evaluates the substepping suggestion (correct me if I'm wrong).
In most ITLEF = 40 cases, the last 30 iterations see little if any change in det and/or dele toward convergence. This leads me to suggest the following "fix":
Currently in ITLEF = 40 cases, the solver oscillates about the solution all the way to ITLEF = 40, at which point it leaves us with the most recent Tv and qv. Instead of this I suggest that at ITLEF > 15 (?) we force det and dele to change just enough to pass the convergence test and adjust all affected terms to maintain energy balance. I will post an update in the next day or two.
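As a rough sketch of what I mean (itlef_force, tv_prev_iter, dtmin, and dlemin are placeholders for however this would actually be expressed; the real change would also have to push the adjusted Tv back through the flux terms so energy balance is maintained):

```fortran
! Sketch only: once the solver has clearly stalled, force the Tv change to just
! pass the convergence test instead of letting it oscillate to ITLEF = 40.
if (itlef > itlef_force .and. (det(p) >= dtmin .or. dele(p) >= dlemin)) then
   ! Step Tv by just under the convergence threshold, toward the value it has
   ! been oscillating around, then recompute efsh, efe, etc. from this Tv.
   dt_veg(p) = sign(0.99_r8*dtmin, tv_prev_iter(p) - t_veg(p))
   t_veg(p)  = t_veg(p) + dt_veg(p)
end if
```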
I ran a new case with code modifications that enforce convergence. The code mods apply if convergence criteria are not met for ITLEF > 15. Convergence occurs by ITLEF = 17 due to ΔTv = max( |Tv(n+1) - Tv(n)| , |Tv(n) - Tv(n-1)| ).
The plot shows the new Tv relative to the default Tv solutions.
This plot shows the same but setting ITLEFmax = 17 in the namelist instead of enforcing convergence.
Not very different. No gains in mean(ITLEF) or number of cases of ITLEF > 15.
The convergence criteria focus on the change in leaf temperature and leaf latent heat flux, so this "fix" moves these two oscillating variables closer to the center without helping Eq. 128 get closer to zero:
Now back to the run without the fix... Same "Day-9" example as before. Here are the variables that make up Eq. 128 and Eq. 129 to show how they oscillate:
These terms are constant: sabv ~= 126.6, lw_grnd ~= 6.4e9, air ~= 285.3, bir ~= 0, cir ~= 0.
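Since Tv only moves by a few hundredths of a K between iterations in this example, the bir*Tv^4 and cir*lw_grnd terms are effectively constant as well, so from iteration to iteration the change in Eq. 128 reduces to approximately

Δ(Eq. 128) ≈ -Δefsh - Δefe

i.e. the oscillation in Eq. 128 has to be coming from the sensible and latent heat flux terms.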
@swensosc and I have worked on this issue for a little while:
@swensosc shared with me code that he had used in the past to loop t_veg values over the CanopyFluxes equations instead of iteratively solving for t_veg. I believe the concept is the same as what @bishtgautam recommended a few months back.
Unfortunately, we didn't learn anything definitive about the lack of convergence in the iterative solution:
a) @swensosc suspects the missing terms mentioned in the Technote Section 5.3.2: "The partial derivatives ∂r_ah/∂T_v and ∂r_aw/∂T_v, which cannot be determined analytically, are ignored for ∂H_v/∂T_v and ∂λE_v/∂T_v." Sean does not think it's a good use of my time to pursue a solution to these because they are messy.
b) @swensosc also pointed me to the immediately following line in the Technote: "...if ζ changes sign more than four [sic] times during the temperature iteration, ζ = -0.01. This helps prevent 'flip-flopping'..." I used [sic] to indicate that I think "four" should say "three" because the code says:

if (nmozsgn(p) >= 4) obu(p) = zldis(p)/(-0.01_r8)

So @swensosc suggested that we try exiting the solver instead of setting ζ = -0.01. I coded this as:

if (nmozsgn(p) >= 4) exit ITERATION

Unfortunately, I found no benefit, because there was little correlation between ITLEF == 40 (36 instances in 2 yrs at Niwot Ridge) and ζ changing sign more than three times (only 1 instance corresponded to an ITLEF == 40).
@swensosc points out that we often go through all 40 possible iterations in canopy fluxes. We should assess whether this can be reduced in order to improve performance.
The same issue could be causing BareGroundFluxes to take a long time.