optimizing photosynthesis

rgknox commented 6 years ago

@walkeranthonyp, Nathan Collier and I met up at the DOE Modeling PI meeting and had a fun discussion about improving the photosynthesis code. @walkeranthonyp has some more fundamental ways to improve how we go about this process regarding which schemes we use and how the equations themselves are cast, and Nate was interested in how we actual perform out solve for C_i (interstitial CO2), which is an iterative loop converging on a solution to non-linear algebraic equations.

See here where we perform the iteration: https://github.com/NGEET/fates/blob/master/biogeophys/FatesPlantRespPhotosynthMod.F90#L992

An important note here, is that the photosynthesis scheme calculates stomatal conductance (among other things), which in-turn effects latent-heat transfer from the surface, which effects land-atmospheric stability, which should tightly feed-back on these fluxes again. Thus, photosynthesis is called inside this broader iteration loop (CLM/ELM side of code) that seeks to converge on a stability parameter and land-fluxes with a low residual (change since last iteration). Note: These coupled iteration loops make the search for C_i (and the quadratic root finders) the most often called subroutines in the model. We cap this loop iteration at 5 steps currently.

I wanted to diagnose how efficient our code is currently performing the solve for C_I, and see if there is any low-hanging fruit that can be picked to help speed things up and make it more efficient.

The first thing I did was do some counting statistics to see the distributions of how many iterations are performed. The following plot shows results from a 1 year simulation at BCI, using a forest inventory initialization.

Top Left Panel: Histogram of counts needed to converge on the outer loop (stability parameter) Top Right Panel: Histogram of the final residuals in the C_i loop, using the existing formulation. The red-line is the residual error criteria for success Bottom Left Panel: Histogram of an ALTERNATIVE way to formulate the residual in the C_i loop, assuming that the residual is the change in C_i from previous time-point, normalized by the canopy CO2 partial pressure. Bottom Right Panel: Histogram of the number of inner loop counts for C_i (which is capped at 5. Spoiler alert, almost every single iteration attempt used up 5).

quad-bci-v1

So all iterations used up the max 5 attempts. Also, it seemed that the residual is strangely formed. Why is it the difference in C_i normalized by air-pressure and not CO2 partial pressure which should have the same units? I can't rectify the units on the current formulation, it is pascals of partial pressure divided by atmospheric pressure * 1 million? (maybe they thought that C_i was in units of PPM?) But it seems incredibly strict, the residuals at completion (due to step-count maxed) are all way above the error limit.

I was curious if there would be a significant shift in how many iterations were required, if I changed both the way the residual is formed, and gave it a lower success threshold of 1e-5. The residual that is evaluated here is the change in C_i divided by air CO2 partial pressure (which are the same units Pa, and similar magnitudes).

quad-bci-v2

My take home from this exercise is that by adjusting the form of the residual there is a very modest effect on how quickly it converges. Keep in mind here that the physics was not changed, nor was the algorithm changed that searches for a solution.

Nate Collier did some calculations, and thinks that with Newton-Raphson, a solution should be found within 2-3 steps (I think), so I think that would be a worthwhile undertaking, as this would speed up the code. I have to look at the code more thoroughly, to see what method we use now (bracketing?) but it is not Newton-Raphson.

walkeranthonyp commented 6 years ago

Nice one @rgknox , this is great! A few initial comments:

From talking to Nate, the solver in FATES is a fixed point iteration. The difference between the current guess and the old guess is evaluated against the convergence criteria. Once this difference is smaller than the convergence criteria the current guess is deemed the solution. A small note: apparently this is not a residual, residual in solver terminology refers to a function constructed for other numerical methods where the residual = f(x) - x or x - f(x) and x is what you are trying to solve.
The normalisation Ci / p * 1e6 is converting Ci in Pa to umol mol-1 (so solved values should be of the rough order 100-1000). The second normalisation Ci / Ca is the Ci:Ca ratio (so will give solved values of the rough order 0.5 - 0.9). So the convergence criteria of 1e-6 is extremely strict, especially when the difference is in units of umol mol-1.
Given that extremely strict criteria and the distribution of the difference I would say that the fact that the solution is not within the tolerance is probably not important. At least in this specific BCI test case. The size of this difference might change in other locations with different parameters or environmental variables.
Also I wonder what level of precision we need in the photosynthesis solve so that the canopy temperature solver will converge? Seems like that converges in 5 - 10ish loops. But again we have to be careful that this is at BCI, for this PFT, etc.
I'm working on a semi-analytical solver that doesn't need any iteration. It finds only an approximate solution but it's pretty damn close - see point above about what level of precision is needed. ELM and CLM4.5 and up use Brent's method, coded up by Jinyun I believe. We could use my method to come up with pretty tight brackets for the Brent method if people are not comfortable with an approximate solution (working on looking at how approximate right now). That would speed things up if Brent's method takes a substantial number of iterations to converge. Also Nate's Newton-Raphson method might work well. Or using previous solutions as the starting guess for the fixed-point iteration.
It would be interesting to know how many iterations Brent's method takes to converge in ELM or CLM and how many iterations the canopy temperature solve takes. Presumably about the same number as you show in your plots.

rgknox commented 6 years ago

if the convergence tolerance is 2e-6 micromols, that is 2 picomols / mol. I haven't heard anyone throw picomoles around in a while.

C_i, which is proportional to canopy co2 in partial pressure (pascals), hovers around the 10-50 depending on the century.

walkeranthonyp commented 6 years ago

@rgknox Haha, every picomole counts :) ... that is an extremely strict criterion. Probably a nanomole is strict enough but we can have that discussion.

More importantly is that this scheme might not converge everywhere (see Sun et al. 2012 JGR-Biogeoscience) and it would be good to understand what precision is need in the photosynthesis solve for the canopy temperature solve to converge. I know we discussed this over email but adding a couple of key points here for the record.

Sun, Y., Gu, L., Dickinson, R.E., 2012. A numerical issue in calculating the coupled carbon and water fluxes in a climate model. J. Geophys. Res. 117, D22103. https://doi.org/10.1029/2012JD018059

dlawrenncar commented 6 years ago

This is outside my area of expertise, but as a point of reference, we incorporated the recommendations from Sun et al., (2012) into CLM4.5.

On Tue, Nov 13, 2018 at 6:45 AM walkeranthonyp notifications@github.com wrote:

@rgknox https://github.com/rgknox Haha, every picomole counts :) ... that is an extremely strict criterion. Probably a nanomole is strict enough but we can have that discussion.

More importantly is that this scheme might not converge everywhere (see Sun et al. 2012 JGR-Biogeoscience) and it would be good to understand what precision is need in the photosynthesis solve for the canopy temperature solve to converge. I know we discussed this over email but adding a couple of key points here for the record.

Sun, Y., Gu, L., Dickinson, R.E., 2012. A numerical issue in calculating the coupled carbon and water fluxes in a climate model. J. Geophys. Res. 117, D22103. https://doi.org/10.1029/2012JD018059

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/NGEET/fates/issues/443#issuecomment-438270668, or mute the thread https://github.com/notifications/unsubscribe-auth/AUAcVLaT4iYU3H6F8IAp8SSmOkTE5Hw7ks5uus0GgaJpZM4YaS4Z .

walkeranthonyp commented 6 years ago

@dlawrenncar Right, so CLM4.5 and above don't have the problem of non-convergence. But not FATES as that uses mostly the CLM4 photosynthesis scheme.

The solver that we're working on should help to speed things up in FATES, CLM, ELM etc as it doesn't involve an iterative loop (for leaf scale photosynthesis at least). I'm still testing it to make sure it finds the correct solution under a wide range of conditions. So far, for Medlyn gs it seems very robust but with BB gs for some reason it fails to find an accurate solution in a few cases where vcmax is low and g0 is 0 (if I remember right) which is an unlikely situation. Will look at that again soon.

dlawrenncar commented 6 years ago

I wouldn't go so far as to say that CLM4.5 and CLM5 don't have problem of non-convergence. I just wanted to point out that maybe 'some' of the issues have been partially addressed in CLM4.5 so looking at that code may (or may not) be helpful. That said, we are also seeing that photosynthesis iterative calculations are one of the most expensive parts of our simulations. Here is the CTSM issue that we opened on this. Not much in there yet, but figured I might as well link to it.

https://github.com/ESCOMP/ctsm/issues/299

On Tue, Nov 13, 2018 at 1:01 PM walkeranthonyp notifications@github.com wrote:

@dlawrenncar https://github.com/dlawrenncar Right, so CLM4.5 and above don't have the problem of non-convergence. But not FATES as that uses mostly the CLM4 photosynthesis scheme.

The solver that we're working on should help to speed things up in FATES, CLM, ELM etc as it doesn't involve an iterative loop (for leaf scale photosynthesis at least). I'm still testing it to make sure it finds the correct solution under a wide range of conditions. So far, for Medlyn gs it seems very robust but with BB gs for some reason it fails to find an accurate solution in a few cases where vcmax is low and g0 is 0 (if I remember right) which is an unlikely situation. Will look at that again soon.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/NGEET/fates/issues/443#issuecomment-438416128, or mute the thread https://github.com/notifications/unsubscribe-auth/AUAcVCN4vNsm08HwdxwFMYsTDsMJ4T3Iks5uuyUqgaJpZM4YaS4Z .

walkeranthonyp commented 6 years ago

@dlawrenncar - right with you. Work is progressing this end and I will be presenting the beginnings of this research at AGU in Trevor Keenan and Nick Smith's canopy processes session. I think we will be able to get some speed up compared with the fixed-point iteration in FATES and hopefully the Brent solve in CLM too.

Do you have some numbers on what fraction of time CLM4.5 / 5 spends on the photosynthesis solve (and/or @rgknox for FATES)? If you wouldn't mind I'd like to quote them in my presentation at AGU.

dlawrenncar commented 6 years ago

@walkeranthonyp - The fraction of time spent on photosynthesis depends, of course, on the configuration of the model. For CLM5, we have timing for the canopy fluxes routine (which includes photosynthesis solution as well as the stability iterations that loop (max iteration 40!) around the photosynthesis calc.; canopy fluxes is finest available timing level in default output)

Prescribed vegetation mode (CLM5SP) - 26% Prognostic biogeochemistry (CLM5BGC-crop) - 8% Prognostic BGC with isotopes (CLM5BGC-crop-iso) - 4%

In other testing, as we reduce # of PFTs and other complex complexity (e.g., turn off PHS, reduce soil levels) to try to get a faster model for NWP applications, the cost of canopy fluxes starts to become almost dominant, and therefore priority candidate for improvement to get a faster model.

alistairrogers commented 6 years ago

Hi @dlawrenncar @walkeranthonyp,

Great to see those numbers. I think it would also be worth putting this in the context of the CESM too. If I remember rightly you once told me that the computational costs of the land model were about 6% of the total cost associated with full CESM. Is that right?

From: David Lawrence notifications@github.com Sent: Wednesday, November 28, 2018 4:19 PM To: NGEET/fates fates@noreply.github.com Cc: Subscribed subscribed@noreply.github.com Subject: Re: [NGEET/fates] optimizing photosynthesis (#443)

@walkeranthonyphttps://github.com/walkeranthonyp - The fraction of time spent on photosynthesis depends, of course, on the configuration of the model. For CLM5, we have timing for the canopy fluxes routine (which includes photosynthesis solution as well as the stability iterations that loop (max iteration 40!) around the photosynthesis calc.; canopy fluxes is finest available timing level in default output)

Prescribed vegetation mode (CLM5SP) - 26% Prognostic biogeochemistry (CLM5BGC-crop) - 8% Prognostic BGC with isotopes (CLM5BGC-crop-iso) - 4%

In other testing, as we reduce # of PFTs and other complex complexity (e.g., turn off PHS, reduce soil levels) to try to get a faster model for NWP applications, the cost of canopy fluxes starts to become almost dominant, and therefore priority candidate for improvement to get a faster model.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHubhttps://github.com/NGEET/fates/issues/443#issuecomment-442608428, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ANTfrZF9vN4byydcXyz56Db9GT7mMqR6ks5uzv3VgaJpZM4YaS4Z.

walkeranthonyp commented 6 years ago

@dlawrenncar Thanks!

rgknox commented 6 years ago

Note that FATES is very likely to spend a larger time-share on photosynthesis than non-fates CLM/ELM, due particularly to the fact that we run these calculations over a large number of leaf layers. I don't have timing statistics on me, but will likely start reporting them as we test out alternative solutions to whats been covered in this thread.

dlawrenncar commented 6 years ago

CLM5 cost has grown faster than other components (adding crops, isotopes, more soil layers, plant hydraulics), so in CESM2 which is running CLM5BGCcropiso, CLM is up to about 18-20% of total CESM2 cost (I think, the calculation is actually non-trivial due to PE layouts and concurrent versus sequential) at nominal 1deg resolution for land and atmosphere.

On Wed, Nov 28, 2018 at 2:51 PM alistairrogers notifications@github.com wrote:

Hi @dlawrenncar @walkeranthonyp,

Great to see those numbers. I think it would also be worth putting this in the context of the CESM too. If I remember rightly you once told me that the computational costs of the land model were about 6% of the total cost associated with full CESM. Is that right?

From: David Lawrence notifications@github.com Sent: Wednesday, November 28, 2018 4:19 PM To: NGEET/fates fates@noreply.github.com Cc: Subscribed subscribed@noreply.github.com Subject: Re: [NGEET/fates] optimizing photosynthesis (#443)

@walkeranthonyphttps://github.com/walkeranthonyp - The fraction of time spent on photosynthesis depends, of course, on the configuration of the model. For CLM5, we have timing for the canopy fluxes routine (which includes photosynthesis solution as well as the stability iterations that loop (max iteration 40!) around the photosynthesis calc.; canopy fluxes is finest available timing level in default output)

Prescribed vegetation mode (CLM5SP) - 26% Prognostic biogeochemistry (CLM5BGC-crop) - 8% Prognostic BGC with isotopes (CLM5BGC-crop-iso) - 4%

In other testing, as we reduce # of PFTs and other complex complexity (e.g., turn off PHS, reduce soil levels) to try to get a faster model for NWP applications, the cost of canopy fluxes starts to become almost dominant, and therefore priority candidate for improvement to get a faster model.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub< https://github.com/NGEET/fates/issues/443#issuecomment-442608428>, or mute the thread< https://github.com/notifications/unsubscribe-auth/ANTfrZF9vN4byydcXyz56Db9GT7mMqR6ks5uzv3VgaJpZM4YaS4Z>.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/NGEET/fates/issues/443#issuecomment-442618342, or mute the thread https://github.com/notifications/unsubscribe-auth/AUAcVFWHtfjjIVheYEiwHmD0jBya6Kamks5uzwVWgaJpZM4YaS4Z .

NGEET / fates

optimizing photosynthesis #443