@kpalmqui To be clear, the algorithm should look like:
1) Scale derived values to sum to the sum of the default values.
2) Find the largest derived value and set it to 1.
3) If any other derived values are greater than 1, they should be set to 1 as well.
Is that correct?
@chaukap Sorry for the poor documentation on my part.
Actually, I think the peak value needs to be fixed to 1.0 before the values are rescaled to sum to the sum of the default values. Otherwise, the values will no longer sum to the sum of the default values.
So:
1) Find the largest derived value and set it to 1 (regardless of whether it is below or above 1.0).
2) If any other derived values are greater than 1, they should be set to 1 as well (does this ever occur?).
3) Scale derived values to sum to the sum of the default values.
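A minimal R sketch of these three steps (the helper name and signature are hypothetical, not code from rSFSTEP2; `derived` and `default` are length-12 vectors of monthly biomass fractions):

```r
rescale_fractions <- function(derived, default) {
  derived[which.max(derived)] <- 1       # 1) fix the peak at exactly 1
  derived <- pmin(derived, 1)            # 2) cap any other values at 1
  derived * sum(default) / sum(derived)  # 3) rescale to the default sum
}
```

(Note, as discussed further below, that step 3 can pull the peak back below 1.)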
@dschlaep will 2) here pose any problems for SOILWAT2 biomass, i.e., if there are multiple months that have derived biomass fraction values of 1.0?
I believe that it is relevant to treat monthly live biomass fraction and monthly litter fraction separately from monthly biomass fraction, because they operate at different levels. Therefore:
- Values of monthly live biomass fraction and monthly litter fraction are always 0-1, and probably strictly smaller than 1 for most/all rgroups; these are passed to SOILWAT2.
- Values of monthly biomass fraction are also always 0-1, but at least one month should be exactly 1; otherwise, SOILWAT2 does not see the full/correct biomass amount of STEPWAT2. Monthly biomass fractions are not passed to SOILWAT2; instead, the resulting monthly biomass values are passed to SOILWAT2 (see the sketch below).
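A rough sketch of that distinction, with hypothetical names (`peak_biomass`, `monthly_biomass_fraction`); only the multiplication is the point here:

```r
# %live and litter fractions are handed to SOILWAT2 as fractions;
# biomass fractions are first turned into absolute monthly biomass.
peak_biomass <- 150  # hypothetical peak biomass, g/m^2
monthly_biomass_fraction <- c(0.1, 0.2, 0.5, 1, 0.7, 0.4, rep(0.1, 6))
monthly_biomass <- peak_biomass * monthly_biomass_fraction  # passed on
```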
Thus, I believe point 1 (scale so that max = 1) should only apply to monthly biomass fractions and not to the other two.
I am not sure that I understand point 2 (I don't see how values can be larger than the maximum).
Also, I am not sure that point 3 (new sum = old sum) is relevant for monthly biomass fraction; however, it seems that this is what we want for monthly live biomass fraction and monthly litter fraction (in addition to making sure that they are within 0-1). Basically, this would say that the overall relationships between live and total biomass and between litter and total biomass stay constant.
One example that illustrates why combining points 1 and 3 for monthly biomass fraction does not work as intended is having a longer season where biomass > 0. For concreteness, take the defaults as x = c(1, 0.5, 0, ..., 0), so that sum(x) = 1.5, and the derived values as x' = c(1.5, 1, 0.5, 0, ..., 0).

Applying point 1 to x' results in x'1 = c(1, 0.667, 0.333, 0, ..., 0), i.e., divide by max(x') = 1.5.
Applying point 3 to x'1 results in x'3 = c(0.75, 0.5, 0.25, 0, ..., 0), i.e., divide by sum(x'1) = 2 and multiply by sum(x) = 1.5.

Thus, we cannot simultaneously keep the maximum at 1 and the sum identical to sum(x).
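The same arithmetic in R, using the x and x' above:

```r
x   <- c(1, 0.5, rep(0, 10))      # defaults: peak = 1, sum(x) = 1.5
xp  <- c(1.5, 1, 0.5, rep(0, 9))  # derived values x'
xp1 <- xp / max(xp)               # point 1: c(1, 0.667, 0.333, 0, ...)
xp3 <- xp1 * sum(x) / sum(xp1)    # point 3: c(0.75, 0.5, 0.25, 0, ...)
max(xp3)                          # 0.75 -- the peak is no longer 1
```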
@dschlaep

> Thus, I believe point 1 (scale so that max = 1) should only apply to monthly biomass fractions and not to the other two.

This issue is specific to biomass fraction; it will not be implemented for %live or litter fractions.

> I am not sure that I understand point 2 (I don't see how values can be larger than the maximum).

There were previously a few cases where derived values under warmer future scenarios were > 1 coming out of `adj_phenology_by_temp`. This is why we capped biomass fraction, litter fraction, and %live fraction values to 1.0 in issue #194 (see Figures below for examples).

> Also, I am not sure that point 3 (new sum = old sum) is relevant for monthly biomass fraction;

We did this because, otherwise, there are large increases in biomass fraction under future conditions in response to warming, which I'm not sure are realistic. See here:
@kpalmqui My understanding is that we are trying to maintain three relationships with the biomass fractions:
1) The ratios between monthly values should be unaffected by any normalization of the values. Otherwise, we would obfuscate @dschlaep's scaling function.
2) The sum of all values should be equal to the sum of the original (default) values, as implemented in issue #197.
3) There should be at least one value equal to 1 and no values greater than 1, as stated in this issue.
Maintaining these three relationships is impossible in any non-trivial case. While trying to enforce 2, we will break 3; and if we then enforce 3, we will break 1 and 2.
For example, in this implementation:
1) Find the largest derived value and set it to 1 (regardless of whether it is below or above 1.0).
2) If any other derived values are greater than 1, they should be set to 1 as well.
3) Scale derived values to sum to the sum of the default values.
Step 3 will undo the work done in steps 1 and 2, such that there is no guarantee that any value will be equal to 1.
And in my proposed implementation:
1) Scale derived values to sum to the sum of the default values.
2) Find the largest derived value and set it to 1.
3) If any other derived values are greater than 1, they should be set to 1 as well.
Step 2 will undo the work done in step 1 such that the values no longer sum to the sum of the default values.
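A quick numeric check of why step 2 breaks the sum, reusing the example vectors from earlier in the thread:

```r
x  <- c(1, 0.5, rep(0, 10))      # defaults, sum(x) = 1.5
xp <- c(1.5, 1, 0.5, rep(0, 9))  # derived values
b <- xp * sum(x) / sum(xp)       # step 1: b now sums to 1.5
b[which.max(b)] <- 1             # step 2: force the peak to 1
b <- pmin(b, 1)                  # step 3: cap (a no-op here)
sum(b)                           # 1.75 -- no longer equal to sum(x)
```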
One of the original three relationships I mentioned must be emphasized over the other two. I'm assuming relationship 3 is the most important and we should emphasize it at the expense of relationships 1 and 2. I can still do my best to enforce an approximation of relationships 1 and 2, but they will not be exact.
@chaukap @dschlaep following up on this issue with an update.
@dschlaep and I have agreed that biomass fraction values should be scaled such that the peak monthly biomass fraction is fixed at 1.0.
Within STEPWAT2, we plan on updating biomass fraction values for most functional types. In some cases, the default values will have multiple months where biomass fraction = 1.0. We will need to evaluate whether this is preserved at all in the derived values and whether additional action is needed.
As @dschlaep suggested in a previous comment, it is also important to ensure that no biomass fractions go above 1.0. I do not think this will occur with the rescaling code, but we need to make sure it never does.
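One way to encode these two requirements as a defensive check (a hypothetical helper, not existing rSFSTEP2 code):

```r
check_biomass_fractions <- function(derived, tol = sqrt(.Machine$double.eps)) {
  stopifnot(all(derived <= 1 + tol))       # nothing above 1.0
  stopifnot(any(abs(derived - 1) <= tol))  # the peak month stays at 1.0
  invisible(derived)
}
```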
@chaukap we can discuss further during our call tomorrow.
@kpalmqui I just pushed an implementation for this issue, but I ran into a problem during testing: `adj_phenology_by_temp_v2` and `adj_phenology_by_temp` both ran into infinite loops. I'm not sure why it happened, but if you get a chance, could you try running my latest commit and see if either `adj_phenology_by_temp` or `adj_phenology_by_temp_v2` is able to complete? Thanks!
@chaukap Nope, it is getting stuck on my end too; it looks like it hangs during Markov weather generation.
I am guessing this has to do with the new changes to rSOILWAT2 and not your code. We could confirm that by checking out old commits of rSOILWAT2 and running with your latest commit on rSFSTEP2. I won't be able to get to any additional testing though until tomorrow.
@kpalmqui I'll look into this tomorrow as well.
I did walk through the code line by line, and the line that never terminates is the call to `adj_phenology_by_temp`. I'm guessing it's a minor loop problem, like not incrementing a counter.
That's weird, indeed!
I haven't made changes to `adj_phenology_by_temp` since Dec 19, 2019 (see blame: https://github.com/DrylandEcology/rSOILWAT2/blame/f82533853e872dcc23a8679b72d4d9950a038445/R/sw_Vegetation.R).
Have you run the examples in the documentation of `adj_phenology_by_temp`? Do they work for you as they do for me? If you email me the objects that you pass as arguments to `adj_phenology_by_temp` that cause it to get stuck, then I will take a closer look at it.
I spent some time experimenting today and I was wrong about the infinite loop. By adding some print statements I was able to see that the program is running as expected, just much slower than it used to.
My loop structure is:

```
for each climate scenario
  for each matrix (i.e., litter, pctlive, biomass, and phenology)
    for each row in the matrix
      adjusted <- adj_phenology_by_temp_v2(...)
```
Previously this took about 5 seconds to complete for 7 climate scenarios. Now, it takes about 15 minutes.
I timed each iteration of the `for each row` loop, and each iteration took about 3 seconds. Considering there are 10 rows in each of the pctlive, biomass, and phenology files, this adds up quickly; for 10 scenarios the estimated time would be 15 minutes.
I know that there are functions like `sapply` in R which can speed up function calls, so I think my next step is to try to make my function more similar to your example, @dschlaep. If I still can't get the time down, I'll let you know.
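For reference, the inner row loop could be expressed with `apply()` along these lines (a sketch with a placeholder in place of the real call, since `adj_phenology_by_temp_v2` takes additional arguments that are elided here); note that this mostly tidies the code rather than speeding it up, since the cost is inside the function being called:

```r
# `mat` stands in for one of the monthly-fraction matrices
# (rows = functional groups, columns = months); `adjust_row` is a
# placeholder for the call to adj_phenology_by_temp_v2.
mat <- matrix(runif(120), nrow = 10, ncol = 12)
adjust_row <- function(row) row / max(row)
adjusted <- t(apply(mat, 1, adjust_row))
```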
> `adj_phenology_by_temp_v2` is much slower than `adj_phenology_by_temp`
Thanks for the clarification, Chandler! This is actually not the end of the world, as this calculation is only done once for each site during the course of the simulations, so adding 15 minutes to the simulation run time is not that big of a problem.
Yes, I was bummed that `adj_phenology_by_temp_v2` was much slower, but then I had the same thought as Kyle: we only do this once. Still, faster would be nicer.
I think that `adj_phenology_by_temp_v2` is slow because it has to find many points of intersection between two functions, e.g., a radial line and a temperature spline. In my testing during development, the function `optim` often got stuck in a local minimum. This costs much time, as I then attempt different algorithms: first "BFGS", then "Nelder-Mead", then "SANN". If we come across an easily implementable global optimization routine, then we should replace the code with it!
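A minimal sketch of that fallback strategy (a hypothetical helper; the real code intersects a radial line with a temperature spline rather than two arbitrary curves):

```r
# Look for a point where f1 and f2 intersect by minimizing the squared
# difference, falling back through the sequence of optimizers named
# above whenever one gets stuck or fails to reach (near) zero.
find_intersection <- function(f1, f2, x0) {
  objective <- function(x) (f1(x) - f2(x))^2
  for (method in c("BFGS", "Nelder-Mead", "SANN")) {
    res <- optim(x0, objective, method = method)
    if (res$convergence == 0 && res$value < 1e-8) return(res$par)
  }
  NA  # no intersection found by any of the methods
}

# x^2 and x + 1 intersect at (1 +- sqrt(5)) / 2; starting from 0,
# BFGS finds the root near -0.618.
find_intersection(function(x) x^2, function(x) x + 1, x0 = 0)
```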
On a side note, loops in R are often slow not because loops are inherently slow, but because of hidden copying; see the example code illustrating the statement "This loop is surprisingly slow because each iteration of the loop copies the data frame [three times!]" in chapter "2.5.1 Objects with a single binding" of the excellent book Advanced R, 2nd edition, by Hadley Wickham: https://adv-r.hadley.nz/names-values.html#modify-in-place
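For reference, a condensed version of the book's example (sketched from the chapter linked above, not copied verbatim):

```r
x <- data.frame(matrix(runif(1e4), ncol = 100))
medians <- vapply(x, median, numeric(1))

# Slow: each subset-assignment copies the whole data frame.
for (i in seq_along(medians)) {
  x[[i]] <- x[[i]] - medians[[i]]
}

# Faster: a list element has a single binding, so the same loop
# modifies it in place instead of copying.
y <- as.list(x)
for (i in seq_along(medians)) {
  y[[i]] <- y[[i]] - medians[[i]]
}
```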
@dschlaep Thanks for the reading material, it was very helpful! I'll continue experimenting with my function to see if I can speed it up, but in the meantime 27e73ae resolves this issue. @kpalmqui If you have a chance to test it, let me know how it goes.
@chaukap I am closing this issue because its original purpose has been handled. I think we can always open a new issue for `adj_phenology_by_temp_v2` if we deem that necessary in the future (it is a separate issue). Thanks!
For the default values in `InputData_Biomass`, the peak biomass fraction is always 1 for a single month. This is important in ensuring that enough biomass accumulates in SOILWAT2 every year. We would like to retain the peak biomass fraction at 1 in all derived fractions that result from the function `adj_phen_bytemp` and associated code in rSFSTEP2.
Thus, this requires an additional constraint on biomass fraction values within rSFSTEP2. In addition to re-scaling derived values to sum to the sum of the default values (now resolved by issue #197), the peak monthly biomass fraction (in whatever month it occurs) should be fixed to 1.0.
@chaukap let me know if you have questions about this new issue.