Choosing a model

mcgregorian1 commented 5 years ago

Hi @teixeirak

I've spent the morning reading up on AIC and R^2 values, and I have some updates before I move forward with #7.

First of all, I had been running the models with NA values included. Removing them, however, doesn't change the model ranking.
"year" should be a fixed effect, but I already mentioned this, and it doesn't really affect the model ranking.
I was reading a number of posts saying that AIC is great for comparing similar models, but that the "best" model by AICc isn't necessarily a good model in absolute terms. They suggested also looking at R^2 values. Doing this for the models gives me a range of 0.084-0.095. In other words, none of the models explains more than 10% of the variation in the resistance values.
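To make the AICc-versus-R^2 distinction concrete, here is a minimal sketch, assuming ordinary least squares on simulated data rather than the linear mixed models used in this analysis (for mixed models, a marginal or conditional R^2 would be the analogue). It shows that AICc can strongly prefer a model containing a real predictor while that model still explains only roughly 10% of the variance: AICc ranks candidate models against each other, whereas R^2 measures absolute fit.

```python
import math
import random

def ols_fit(x, y):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def gaussian_aicc(rss, n, k):
    """AICc for a Gaussian model with k estimated parameters (including the residual variance)."""
    log_lik = -0.5 * n * (math.log(2 * math.pi) + math.log(rss / n) + 1)  # maximized log-likelihood
    aic = 2 * k - 2 * log_lik
    return aic + 2 * k * (k + 1) / (n - k - 1)  # small-sample correction

# Hypothetical data: a real but weak predictor effect buried in noise.
rng = random.Random(42)
n = 2000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.33 * xi + rng.gauss(0, 1) for xi in x]

my = sum(y) / n
tss = sum((yi - my) ** 2 for yi in y)  # residual sum of squares of the intercept-only model
b0, b1 = ols_fit(x, y)
rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

r2 = 1 - rss / tss
aicc_null = gaussian_aicc(tss, n, k=2)   # intercept + variance
aicc_slope = gaussian_aicc(rss, n, k=3)  # intercept + slope + variance
print(f"R^2 = {r2:.3f}, delta AICc = {aicc_null - aicc_slope:.1f}")
```

The large AICc difference says the slope model is far better than the null; it does not say the slope model explains much.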

Here's what a plot with the kind of R^2 we were hoping for would look like:

I checked the plots to be sure, and it turns out there are indeed almost no trends. dbh_old comes closest to showing an effect, but even then the relationship is slight.

Here is another visualization showing the predicted effect of dbh_old (the line), based on the regression equation, plotted with the actual observations. The trend is there, but very slight.

Next steps

I'm not sure where to go from here. Valentine briefly tried fitting this to a generalized linear mixed model instead of a linear mixed model, but at this point I'm nervous about grasping at straws, given that I already failed to flag this lack of fit earlier.

In short, it seems I've been putting too much faith in AICc without fully understanding what it does and doesn't tell us, even though I thought I did.

teixeirak commented 5 years ago

What happens if you drop all trees with resistance > 2 in any drought, on the logic that something unusual happened to those (e.g., release from competition)? Are those the only thing driving the dbh effect?
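One way to implement this tree-level filter is sketched below, assuming the data are a list of records with hypothetical keys `tree`, `year`, and `resist` (the records themselves are toy values). It drops every observation of a tree whose resistance exceeds 2 in any drought, rather than only the offending observations, and then asserts that no value above the threshold remains, a cheap guard against such values slipping back in on a re-run.

```python
def drop_trees_with_high_resistance(records, threshold=2.0):
    """Drop every record of any tree whose resistance exceeds `threshold` in ANY drought.

    `records` is a list of dicts with hypothetical keys "tree", "year", "resist".
    """
    flagged = {r["tree"] for r in records if r["resist"] > threshold}
    return [r for r in records if r["tree"] not in flagged]

# Toy records for illustration only.
records = [
    {"tree": "A", "year": 1966, "resist": 0.8},
    {"tree": "A", "year": 1977, "resist": 2.4},  # tree A is flagged here...
    {"tree": "A", "year": 1999, "resist": 0.9},  # ...so this record is dropped too
    {"tree": "B", "year": 1966, "resist": 1.1},
    {"tree": "B", "year": 1999, "resist": 0.7},
]
kept = drop_trees_with_high_resistance(records)

# Sanity check: nothing above the threshold survives the filter.
assert all(r["resist"] <= 2.0 for r in kept)
print(sorted({r["tree"] for r in kept}))  # ['B']
```

Filtering only the observations instead would be `[r for r in records if r["resist"] <= threshold]`; which version applies depends on whether an unusual response in one drought is taken to make the whole tree suspect.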

mcgregorian1 commented 5 years ago

Dropping those observations does help somewhat: the top of the R^2 range rises to 0.1054 instead of 0.09.

In terms of the regression line for the dbh effect, though, I'm not sure it helps much.

teixeirak commented 5 years ago

I don't know that this will help the model any, but DBH should be ln-transformed.
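A minimal sketch of what ln-transforming DBH means for a single-predictor fit, assuming hypothetical (DBH, resistance) pairs and plain least squares instead of the full mixed model. The slope becomes the change in predicted resistance per unit of ln(DBH), so doubling DBH shifts the prediction by `b1 * ln(2)` regardless of starting size; evaluating the fitted equation over a DBH grid gives the regression line drawn over the observations.

```python
import math

def fit_line(x, y):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical (DBH in cm, resistance) pairs, for illustration only.
dbh = [12.0, 18.5, 25.0, 33.0, 41.0, 55.0, 68.0, 80.0]
resist = [0.82, 0.85, 0.90, 0.88, 0.93, 0.95, 0.97, 0.99]

# Fit on the log scale.
ln_dbh = [math.log(d) for d in dbh]
b0, b1 = fit_line(ln_dbh, resist)

# Prediction line for plotting: evaluate on ln(DBH), plot against raw DBH (10 to 80 cm).
grid = [10 + 5 * i for i in range(15)]
pred = [b0 + b1 * math.log(d) for d in grid]
```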

mcgregorian1 commented 5 years ago

Just barely. The R^2 goes from 10% to 11%.

Here is the plot with ln(dbh):

teixeirak commented 5 years ago

It looks like a few of the >2 resist values are still in there (although that probably doesn't change the big picture).

teixeirak commented 5 years ago

@mcgregorian1, could you please re-run this, dropping the >2 resist values (it seems a few snuck back in)?

Please post the updated model comparison as well.

mcgregorian1 commented 5 years ago

After double-checking that all resist.values > 2 are dropped, I get the following, where the best model is clearly the one that contains all effects. The R^2 increases to 12%.

Here is the graph comparing the dbh effect as well.

teixeirak commented 5 years ago

Great! Could you please send the ANOVA output for that full model (coefficients for each variable)?

mcgregorian1 commented 5 years ago

Here's the ANOVA. Let me know if you want more than that.

mcgregorian1 commented 5 years ago

For the record, I saw Neil's new paper stating that moisture availability was the top limiting factor for growth, so I tried adding "pre" and "PET minus PRE" from Valentine's analysis (using the CRU data). The model runs confirmed why we didn't consider these in the first place: adding them doesn't improve the model at all.

teixeirak commented 5 years ago

I wouldn’t expect those to come out in this analysis, given that we’re just looking at 3 drought periods, but I would expect them to come out as a predictor of growth across the full data set (as Ryan’s paper shows).

teixeirak commented 5 years ago

I somehow missed your earlier ANOVA post. Could you please post the coefficients as well?

mcgregorian1 commented 5 years ago

Hi @teixeirak

This is the ANOVA I posted on Monday.

Beyond that, I'm not sure what else you wanted. If I specifically ask for the coefficients by species, I get the following:

I looked at the Lloret et al. paper again and noticed they made graphs comparing the residuals of the PDSI values with the residuals of BAI. Since our R^2 values were so low, I decided to double-check (using the mean of each 12-month period) and got the following. It confirms that our drought years are what we thought, though 1991 shows up on both graphs more than I would have expected.

BAI 1950-2009

PDSI 1950-2009 (from NOAA)

teixeirak commented 5 years ago

We have a new related graph for Ryan's paper here. A lot of dendro people use BAI, but for theoretical reasons, I wouldn't expect it to work appropriately at our site.

mcgregorian1 commented 5 years ago

I see. But isn't basal area increment a standardized transformation of the data? So if we already have the ring widths, transforming them to BAI wouldn't change the overall distribution much. Neil (or Alan) mentioned in an email that they didn't think using one or the other (BAI or ring widths) would change the analyses much.

mcgregorian1 commented 5 years ago

I've re-run the model using only the strongest drought year (1966), and I get an R^2 of 0.26 for the best model, which is below:

teixeirak commented 5 years ago

Regarding BAI, it is a commonly used transformation, which to my understanding is generally seen as size-independent, based on the assumption that there is no intrinsic relationship between DBH and total new wood production (i.e., equal biomass production distributed around an ever-growing stem forms progressively smaller rings). I believe that this concept of a decrease in ring width with tree size stems from observations of old open-grown conifers and such, where you do (often) see a decrease with age. In contrast, at SCBI radial increment tends to increase with DBH, so we expect BAI to increase with tree size. This is gradual enough that BAI should give you a similar trend in terms of droughts, but I'd consider the ring width chronologies to be more reliable.
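For reference, BAI is usually computed from ring widths as the area between successive rings, BAI_t = pi * (R_t^2 - R_{t-1}^2), where R_t is the stem radius at the end of year t. The sketch below assumes widths ordered from pith outward and a known radius inside the first measured ring (a hypothetical parameter; real cores often miss the pith or need bark correction). With constant ring widths, BAI still rises with radius, so BAI is not a size-neutral rescaling of ring width, and any increase of ring width with DBH, as described for SCBI, is amplified in BAI.

```python
import math

def bai_from_ring_widths(widths_mm, inner_radius_mm=0.0):
    """Basal area increment (mm^2) for each ring, ordered from pith outward.

    BAI_t = pi * (R_t**2 - R_{t-1}**2), where R_t is the radius at the end of ring t.
    `inner_radius_mm` is the radius inside the first measured ring (0 if the core reaches pith).
    """
    bai = []
    r_prev = inner_radius_mm
    for w in widths_mm:
        r = r_prev + w  # radius after adding this year's ring
        bai.append(math.pi * (r ** 2 - r_prev ** 2))
        r_prev = r
    return bai

# Constant 2 mm rings: ring width is flat, but BAI grows with radius.
print([round(b, 1) for b in bai_from_ring_widths([2.0] * 4)])  # [12.6, 37.7, 62.8, 88.0]
```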

teixeirak commented 5 years ago

Regarding the ANOVA, could you please specify the sign conventions for position and ring porosity?

teixeirak commented 5 years ago

Regarding the analysis for 1966, I like that you did that, and it's good to see that higher R^2. Could you please post the coefficients for that one as well?

teixeirak commented 5 years ago

It would also be interesting to see how this looks for the most recent year. I note that (current) canopy position improved the model with all years but not the model for 1966.

mcgregorian1 commented 5 years ago

Sorry it's taken so long for me to get back.

Here are the (mean) coefficients when only 1966 is included as the drought year; this is from the best 1966 model above, which has an R^2 of 0.26.

Running only the most recent drought year (1999) gives similar models, but with slightly lower R^2 values (max 0.23). However, your hypothesis about canopy position for the most recent major drought year appears to hold up. These coefficients are also means.

Regarding your question about the sign conventions for position and ring porosity in the ANOVA: I'm not sure what you mean, since they're all positive in the ANOVA from last week.

mcgregorian1 commented 5 years ago

I'm wondering whether it would be worth changing how we define the drought year. Foster et al. specifically used an October-September window to calculate TMAX and other values, and that reinforces to me that while we identify 1966, 1977, and 1999 as drought years, the PDSI values don't neatly match those calendar years (e.g., the 1966 drought ran from 1965 into part of 1967). That is, taking the annual mean of PDSI values across all years reveals slight deviations in these three years, but that's not the same as taking the values for the drought itself.

I say this especially because 1977, I believe, has positive PDSI values for the first part of the year and then drops.
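A minimal sketch of how the choice of 12-month window changes the annual PDSI value, assuming a dictionary of monthly values keyed by (year, month). With `start_month=10`, each year runs from October of the previous year through September, similar to the October-September window described for Foster et al.; labeling each window by the year it ends in is an assumption. The monthly values are made up to mimic a year that starts wet and then turns dry; they are not real PDSI.

```python
from collections import defaultdict

def annual_means(monthly, start_month=1):
    """Average monthly values into 12-month years.

    `monthly` maps (year, month) -> value. With start_month=1 this is the calendar year;
    with start_month=10 each year runs October (previous year) through September and is
    labeled by the year it ends in. Only years with all 12 months present are returned.
    """
    buckets = defaultdict(list)
    for (year, month), value in monthly.items():
        # Months from start_month onward count toward the next year's label
        # (e.g., Oct-Dec 1976 belong to water year 1977).
        label = year + 1 if start_month > 1 and month >= start_month else year
        buckets[label].append(value)
    return {label: sum(vals) / len(vals)
            for label, vals in sorted(buckets.items()) if len(vals) == 12}

# Made-up monthly PDSI: neutral in 1976 and 1978, wet in early 1977, dry from July 1977.
monthly = {}
for m in range(1, 13):
    monthly[(1976, m)] = 0.0
    monthly[(1977, m)] = 1.0 if m <= 6 else -3.0
    monthly[(1978, m)] = 0.0

calendar = annual_means(monthly)               # {1976: 0.0, 1977: -1.0, 1978: 0.0}
water = annual_means(monthly, start_month=10)  # {1977: -0.25, 1978: -0.75}
```

In this toy case the calendar window puts the whole drought in 1977, while the October-September window splits it and assigns most of the dry signal to 1978.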

Foster et al. also consider potential future temperature changes using RCP 4.5 and RCP 8.5 from the IPCC report. I think that could be interesting, but I also wonder if that's going too far for this particular analysis.

Also of note, it seems many tree-ring analyses use temperature as a fixed effect.

mcgregorian1 commented 5 years ago

One more thing, because I was curious in case I had done something wrong in my code: I went back and determined exactly which species were affected in each drought year, and overall the representation isn't great.

We have a total of 26 core categories (each of the 14 species split by canopy position, excluding frni and caco in the canopy because each has only 1 core). Over the 3 drought years, only 12 of these are represented (so fewer than 50% of our core categories show significant growth anomalies in the drought years).
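A quick check of the category arithmetic, assuming two canopy positions per species (inferred from 26 = 14 * 2 - 2 rather than stated explicitly):

```python
n_species = 14
n_positions = 2   # assumed: two canopy positions per species
excluded = 2      # frni and caco in the canopy, each with only 1 core

total = n_species * n_positions - excluded
represented = 12  # categories with significant anomalies in any of the 3 drought years
share = represented / total
print(total, f"{share:.0%}")  # 26 46%
```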

Looking at the full picture, there's no trend in which canopy position is more affected, and the species distribution is interesting: litu, pist, and juni show up in 1966 and 1999, but of those only litu shows up in 1977.

I'm not sure what this means, but also note that litu accounts for about 30% of the total observations.

teixeirak commented 5 years ago

As mentioned in person, we don't need to worry about which species / canopy positions responded to the drought. That's part of our question. Rather, we just want to know when there were drought events that affected a large portion of trees.

teixeirak commented 5 years ago

I think we're done with this issue (it's now out of date and replaced by others).

SCBI-ForestGEO / McGregor_climate-sensitivity-variation

Choosing a model #9
