Hypothesis 1: Height is a strong predictor of drought stress

mcgregorian1 commented 5 years ago

H1: Large trees suffer more during drought because of the greater biophysical challenge of lifting water to greater height (see discussion/refs in Bennett), and therefore height itself is a strong predictor.

P1: Drought response increases with height at time of drought (derived from dbh). Height will be a significant predictor both alone and in combination with canopy position and elevation.

Preliminary results: This is true; inclusion of dbh (and by extension height) strongly improves statistical models (#6).

Caveat: Height may also be just a strong covariate of canopy position or root water access. Support of this prediction doesn't exclusively support H1, but if height itself is less important, the other drivers should come out as significant and potentially stronger predictors.

Next step:

[x] confirm result holds when using height predicted based on dbh (with bark correction) instead of dbh (see issue #8).

Preliminary conclusion: Taller trees suffer more during drought, and testing of additional hypotheses will identify whether height itself is the most important driver, or whether a correlate of height is more important.

Originally posted by @teixeirak's writing in issue #7

mcgregorian1 commented 5 years ago

Hi @teixeirak

I've calculated the regression line for height, but I was wondering what to do for missing values. For example, I have several trees in the height csv that were dead in 2018 (dbh of 0) but were still given heights. I can give them their dbh in 2013 but that's 5 years' difference in a measurement.

In addition, I have a similar circumstance but with a tree that was considered dead in 2013 but also had a height. What do you suggest?

teixeirak commented 5 years ago

Please use the DBH taken closest to the date of height measurement (2013 for most?). I don't think there will be any more than a few years off, which is an acceptable difference.
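
A minimal sketch of that rule, assuming each tree's census records are held as a mapping from year to DBH, with 0 or a missing value marking a dead or unmeasured stem. The data layout and the function name `closest_dbh` are hypothetical and not taken from the project scripts:

```python
def closest_dbh(dbh_by_year, height_year):
    """Return (year, dbh) of the valid DBH record nearest to height_year.

    Records with a missing or zero DBH (e.g. a tree recorded as dead)
    are skipped. Ties are broken in favour of the earlier census.
    Returns None if the tree has no valid DBH at all.
    """
    valid = {yr: d for yr, d in dbh_by_year.items() if d is not None and d > 0}
    if not valid:
        return None
    year = min(valid, key=lambda yr: (abs(yr - height_year), yr))
    return year, valid[year]


# Hypothetical tree: alive in 2008 and 2013, dead (dbh 0) in 2018,
# with its height measured in 2018.
print(closest_dbh({2008: 301.0, 2013: 310.5, 2018: 0}, height_year=2018))
```

For the example tree the function falls back to the 2013 record, which is the "few years off" case described above.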

teixeirak commented 5 years ago

What is the functional form you are using? Please fit with one of the functions described in this paper, whichever works best.

mcgregorian1 commented 5 years ago

With the 2018 data added in, QUAL now has 6 data points, and there's a better relationship. I think it's fine to keep in, but it does have a much lower intercept than everything else. What do you think?

teixeirak commented 5 years ago

I still wouldn't use the QUAL relationship, particularly for trees beyond the observed range of sizes (that applies to any tree). That's the sort of thing that could really skew your analysis.

mcgregorian1 commented 5 years ago

@teixeirak

I read through the paper and looked up the equations they used, and I'm not sure those are going to work with the data I have.

Becky suggested a polynomial model to fit my data, and it produced the following (these are the highest R-squared values I've gotten). What do you think? Can we use this?

ValentineHerr commented 5 years ago

A polynomial allows the curve to go back down, which would mean that past a certain size/age, the older/bigger a tree gets, the shorter it becomes. I don't think you want that. A log fit doesn't have that problem.
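
To make the point concrete, here is a small sketch with made-up coefficients (not fitted to any of the data in this thread). A quadratic with a negative squared term reaches its maximum at dbh = -B1 / (2 * B2) and predicts shorter trees beyond it, while a log curve keeps increasing:

```python
import math

# Hypothetical coefficients, chosen only for illustration.
B0, B1, B2 = 5.0, 0.5, -0.003   # quadratic: height = B0 + B1*dbh + B2*dbh^2
A0, A1 = -10.0, 8.0             # log:       height = A0 + A1*ln(dbh)


def poly_height(dbh):
    return B0 + B1 * dbh + B2 * dbh ** 2


def log_height(dbh):
    return A0 + A1 * math.log(dbh)


# DBH at which the quadratic peaks; past this point predicted height declines.
peak_dbh = -B1 / (2 * B2)

for dbh in (40, 80, 120, 160):
    print(dbh, round(poly_height(dbh), 1), round(log_height(dbh), 1))
print("quadratic peaks at dbh =", round(peak_dbh, 1))
```

Any tree larger than the peak DBH would get a lower predicted height than a smaller tree, which is the behaviour to avoid when extrapolating.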

ValentineHerr commented 5 years ago

Also, R-squared tends to come out higher for the polynomial because it has 2 slope parameters, vs. 1 for log. Use AIC if you want to compare which is best in your particular case. Also, note that what is best in your particular case (given your data) is not necessarily what is best for predictions on new data.
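
A rough sketch of why AIC can disagree with R-squared, using the standard least-squares form AIC = n * ln(RSS / n) + 2k (valid up to a constant that is shared by all models fitted to the same response). The residual sums of squares below are hypothetical, and the function name is only for illustration:

```python
import math


def aic_least_squares(rss, n, n_coef):
    """AIC for a Gaussian least-squares fit, up to an additive constant.

    k counts the regression coefficients (including the intercept)
    plus one for the residual variance.
    """
    k = n_coef + 1
    return n * math.log(rss / n) + 2 * k


# Hypothetical fits to the same 30 trees: the polynomial has a slightly
# smaller residual sum of squares (so a higher R-squared) but one more
# coefficient to pay for.
n = 30
aic_log = aic_least_squares(rss=120.0, n=n, n_coef=2)    # intercept + ln(dbh)
aic_poly = aic_least_squares(rss=118.0, n=n, n_coef=3)   # intercept + dbh + dbh^2
print(round(aic_log, 2), round(aic_poly, 2))
```

With these numbers the polynomial fits marginally closer yet has the higher (worse) AIC. With a larger drop in RSS the polynomial would win on AIC too, so AIC alone does not rule out a biologically implausible shape.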

teixeirak commented 5 years ago

I agree.

mcgregorian1 commented 5 years ago

I'm confused by these outputs. Even with AIC the poly is still best, despite it ecologically not making any sense.

teixeirak commented 5 years ago

Please drop the polynomial and linear fits from consideration. Even though the polynomial fit may provide a closer fit to the particular set of data we have for some species, there's only one instance where the largest tree is shorter than smaller trees (FAGR), and that one is probably just a tree that's shorter than average for its size. We know that a decline in height with increasing size doesn't match the biology, and this functional form will be very problematic when extrapolating beyond the range of data.

Please do this analysis for a power fit. I still expect that that should work better than any of these options. You could also try a polynomial where both height terms are logged, as in Chave et al. 2014, eq. 6a.

mcgregorian1 commented 5 years ago

Using a power fit (ln(height) ~ ln(dbh)), I get the following:

Using Chave's equation might work, but they specifically define an environmental variable E, which in this case is coarse woody debris.

teixeirak commented 5 years ago

The power function looks good.

Regarding the Chave equation, I meant to use the basic functional form (minus the E term). That is, use a polynomial function with ln(D) - (ln(D))^2. If you want, you can try that for comparison, but I am satisfied with the power fit.
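
For reference, a minimal sketch of the power fit discussed here: ordinary least squares on ln(height) ~ ln(dbh), which corresponds to height = exp(a) * dbh^b on the original scale. The function names and the synthetic data are assumptions for illustration, not the project's code or measurements, and the back-transformation below ignores any log-bias correction:

```python
import math


def fit_power(dbh, height):
    """Fit ln(height) = a + b * ln(dbh) by ordinary least squares.

    Returns (a, b); on the original scale, height = exp(a) * dbh ** b.
    """
    x = [math.log(d) for d in dbh]
    y = [math.log(h) for h in height]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_height(a, b, dbh):
    """Back-transform the log-log fit to predict height from dbh."""
    return math.exp(a) * dbh ** b


# Synthetic, noise-free data generated from height = 3 * dbh^0.6.
dbh = [10.0, 20.0, 40.0, 80.0]
height = [3.0 * d ** 0.6 for d in dbh]
a, b = fit_power(dbh, height)
print(round(math.exp(a), 3), round(b, 3))
```

With a positive exponent b the predicted height always increases with dbh, so this form cannot curve back down the way the plain polynomial does. The logged polynomial would add a (ln(D))^2 term to the same regression.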

(By the way, in this case CWD means climatic water deficit.)

mcgregorian1 commented 5 years ago

It looks like with the polynomial the fit is a little worse than the power, at least according to the equations. I'll go ahead and stick with the power function for the full analysis.

For CWD, gotcha. That makes much more sense.

mcgregorian1 commented 5 years ago

Adding in height_ln to the model (with all covariates across all four drought years) definitely makes the model better. R-squared is 0.18 for the top two models.

teixeirak commented 5 years ago

Okay, so it looks like height does a bit better than DBH? Is this also true when we compare a model with just height (+year, random effects) to one with just DBH?

mcgregorian1 commented 5 years ago

Yes for height being better than dbh. When we compared just the two of them (plus year and random effect), we see the same overall effect hold up.

teixeirak commented 5 years ago

Okay, so height is an important predictor, and better than DBH. H1 holds.

mcgregorian1 commented 5 years ago

Hi @teixeirak

I know you've mentioned this before, so apologies for asking again. Now that I'm updating the table, I want to be sure I'm doing this correctly with respect to your table.

For 1.0 (ln[dbh]), the model run includes dbh_ln and year as fixed effects, and then the sp/tree for random. For all years, this is the correct dAIC, right? Or is it the one with only dbh and the random?

Similarly, for each year (1964-1966 in this example), my model only has dbh_ln and the random sp effect. Thus the dAIC I'm assuming is this one, correct?

teixeirak commented 5 years ago

Correct on both.
As you go, please be sure to confirm that the direction of each effect matches that listed in the table. As noted in a footnote to the table, when the response is opposite to the prediction, dAIC is listed as NA (there are no instances of this where dAIC>2; please confirm).
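
One way this check could be scripted, as a sketch only: compare the sign of the fitted coefficient with the predicted direction, report dAIC as NA (here `None`) when they disagree, and flag disagreements where dAIC>2 for a closer look. The function names, the +1/-1 encoding of the predicted direction, and the example values are all hypothetical:

```python
def sign(value):
    """Return +1, -1 or 0 according to the sign of value."""
    return (value > 0) - (value < 0)


def reported_daic(daic, coef, predicted_sign):
    """Return daic if the coefficient has the predicted sign, else None (NA)."""
    return daic if sign(coef) == predicted_sign else None


def needs_attention(daic, coef, predicted_sign, threshold=2.0):
    """True when the effect runs opposite to the prediction and dAIC > threshold."""
    return sign(coef) != predicted_sign and daic > threshold


# Hypothetical rows: (label, dAIC, fitted coefficient, predicted direction).
rows = [
    ("example A", 5.2, -0.31, -1),   # matches prediction, keep dAIC
    ("example B", 0.8, 0.05, -1),    # opposite direction, weak: NA
    ("example C", 3.4, 0.12, -1),    # opposite direction, dAIC > 2: flag
]
for label, daic, coef, predicted in rows:
    print(label, reported_daic(daic, coef, predicted), needs_attention(daic, coef, predicted))
```

This assumes dAIC is expressed so that larger values mean a stronger improvement over the null model. If the table uses the opposite sign, the threshold comparison would need to be flipped.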

mcgregorian1 commented 5 years ago

Ok.

This first one gives an example of the predicted response. The trend for all years and 1964-66 is that dbh_ln added to the model gives the best prediction, but for 1977 and 1999, dbh_ln makes the model worse. I've updated the table so you can see what I mean.

In this case and others like it, which model run am I taking to be the overall predicted interaction? Am I wrong in thinking I should be using the model that includes all years?

teixeirak commented 5 years ago

We'll obviously be presenting all. In terms of presentation, I'd emphasize the model with all years, but discuss anything noteworthy in individual years. In this case, we'd say that overall DBH had the expected effect, but it failed to significantly improve the model in 1977 and 1999.

teixeirak commented 5 years ago

The values in the table look like what we want.

mcgregorian1 commented 5 years ago

Perfect. I'll update the rest

teixeirak commented 5 years ago

Can you automatically generate the table so that you don't have to manually fix it if/when we (almost inevitably!) have to modify the analysis?

mcgregorian1 commented 5 years ago

I can try. It's contingent on me being able to isolate each specific AICc value from the different model runs

teixeirak commented 5 years ago

Okay. Don't lose tons of time on it, but it's probably worthwhile if you can get it fairly quickly. There always seem to be unanticipated needs to re-run analyses. (And we know we'll need to re-run most models when we implement the bark correction, unless you do that first.)

mcgregorian1 commented 5 years ago

I've created the table from a script - it is here. The important thing to note here is that if dAIC is negative, then that means the actual model is better than the null model. If dAIC is positive, then the null model is better.

teixeirak commented 5 years ago

Wonderful! I'll review it carefully soon.

teixeirak commented 5 years ago

A couple things:

[ ] Please be sure to confirm that the direction of each effect matches that listed in the table. As noted in a footnote to the table, when response is opposite prediction, dAIC is listed as NA (please check if there are any where dAIC>2). For example, 1.2c2 and 1.3b1 predict different directions of response. Assuming results remain similar to what I've seen before, dAIC for 1.2c2 should be NA.
[x] Isn't the convention that positive dAIC means the target model is better than the null model?

mcgregorian1 commented 5 years ago

For your second question, that's what I initially thought. But if you have a table where the target model is the best one, then its dAICc in the table is 0, and the null model would be some number >0 because it's worse. Therefore, for our equation of dAIC(results) = dAIC(target) - dAIC(null), it should be negative if the target is indeed the best one.
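
A small sketch of that arithmetic with hypothetical AICc values. Because each delta is measured from the best model in the table, dAIC(target) - dAIC(null) reduces to AICc(target) - AICc(null), so the sign depends only on which way round the subtraction is written:

```python
def delta_table(aicc_by_model):
    """Return each model's AICc minus the lowest AICc in the table."""
    best = min(aicc_by_model.values())
    return {name: value - best for name, value in aicc_by_model.items()}


# Hypothetical AICc values; lower is better.
aicc = {"null": 1510.4, "target": 1502.1}

deltas = delta_table(aicc)
result = deltas["target"] - deltas["null"]     # convention used in the script
flipped = deltas["null"] - deltas["target"]    # the opposite convention

print(deltas)
print(round(result, 1), round(flipped, 1))
```

Under the first convention a better target model gives a negative number; under the flipped one it gives a positive number of the same size. Either works as long as the table states which is used.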

Ok. I can make the table with those.

mcgregorian1 commented 5 years ago

Hi @teixeirak I've updated the table with the concept you were talking about. Note my note about ring/semi-ring - it's due to how the model interprets it (e.g. ring could be + but semi-ring could be -).

Another note: all these models are still being run with REML = FALSE. From what I've read, we set REML = TRUE only when fitting a single model on its own, rather than when comparing models.

mcgregorian1 commented 5 years ago

I've updated the outline with a table of where the dAIC values are >2 but were labeled NA due to interaction differences with the prediction. Are these cases that we want to make a special note of? Or does this require some model/code tweaking?

teixeirak commented 5 years ago

I think we're done with this issue.

mcgregorian1 commented 5 years ago

Hi @teixeirak I don't remember if you asked me to do this or if I had the idea, but I've made graphs comparing 2018 data (dbh and crown position) to the pointer-year data. Height here is extrapolated based on dbh using the regression equations that were determined previously.

They are here (current_dbh_height_all_years.pdf).

teixeirak commented 5 years ago

Good. I think that the 2018 graph for DBH and/or height would be a good addition to the manuscript.

SCBI-ForestGEO / McGregor_climate-sensitivity-variation