test trait effects individually

teixeirak commented 5 years ago

@mcgregorian1, I'm concerned that we have a lot of traits that may be interacting in funny ways in the full model. To test the effects of traits, let's have the null model include height, (canopy position), year (categorical), and a random effect (individual nested in species). Let's create a table with those results, and also pull out coefficients. If a coefficient switches sign between these pared-down models and the full model, that indicates a problem.
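
A minimal sketch of the null-versus-trait comparison described above, written in Python rather than taken from the project's analysis code. It assumes each fitted model reports a log-likelihood and a parameter count. The function names and the numbers are hypothetical, and dAIC is taken here as AIC(null) minus AIC(null + trait), so positive values favor the trait model.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def delta_aic(null_fit: tuple, trait_fit: tuple) -> float:
    """dAIC = AIC(null) - AIC(null + trait); positive favors the trait model."""
    return aic(*null_fit) - aic(*trait_fit)


# Hypothetical (log-likelihood, parameter count) pairs, not results from this study.
null_fit = (-120.0, 5)   # e.g. height + year + random effect
trait_fit = (-116.5, 6)  # null model plus one trait

d = delta_aic(null_fit, trait_fit)
print(f"dAIC = {d:.1f}; trait supported at dAIC > 2: {d > 2}")
```

The same two functions can be applied to every trait in turn to fill one row of the table per trait.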

mcgregorian1 commented 5 years ago

@teixeirak

I've finished the table for individually-tested traits. I made the table as a csv so it's easier to read (found here).

I've also compared those coefficients with the coefficients from the full models, shown here. The only one that's different is LMA. When you're free I can talk about this in person more.

teixeirak commented 5 years ago

Thanks! Note that WD is also different. LMA and WD are the two that were behaving contrary to theoretical expectations in the full model, and they are the two that are reversed. Good. More soon!

mcgregorian1 commented 5 years ago

This is the final result when we take out all variables with dAIC < 2. It seems ring porosity has now also been booted out of the top model, in addition to wood density.

teixeirak commented 5 years ago

Can you please provide the coefficients?

mcgregorian1 commented 5 years ago

Whoops sorry about that. Here they are. They look to be going the direction we'd think.

mcgregorian1 commented 5 years ago

I was thinking: since I have the first table with all the traits tested individually, and now the best model, should I make a separate table showing only the best model for each individual drought year?

teixeirak commented 5 years ago

Okay, this top model looks good and makes sense.

Yes, let's look at the best model for each drought year.

mcgregorian1 commented 5 years ago

@teixeirak

I've fully populated the table now, which is found here.

To clarify:

- If dAIC < 0, that means the null model (our top model) was better.
- The coefficients are for each variable whether it's in the null model or in the tested model.
- For the variable that has two coefficients (ring porosity), I said if either of those is negative then there is a "-" in the table. I did that because for all the times we've tested rp, I've never seen its coefficients be opposite each other, but I could be wrong. If you want, I could add a column similar to what I did to the first table.
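
A small sketch of the sign rule for variables with more than one coefficient, in Python. It is a variation on the rule above, not the code used for the table: it also returns "+/-" when the coefficients disagree, which would catch the case where the ring porosity coefficients point in opposite directions. The function name and the example values are hypothetical.

```python
def sign_summary(coefficients: list) -> str:
    """Summarize the direction of one or more coefficients for a variable."""
    has_negative = any(c < 0 for c in coefficients)
    has_positive = any(c > 0 for c in coefficients)
    if has_negative and has_positive:
        return "+/-"  # mixed directions: worth a closer look
    if has_negative:
        return "-"
    if has_positive:
        return "+"
    return "0"        # all coefficients exactly zero


# Hypothetical coefficients for a two-level factor and a single-coefficient trait.
print(sign_summary([-0.21, -0.08]))
print(sign_summary([0.15, -0.02]))
print(sign_summary([0.30]))
```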

I think what's interesting here is the same trend we saw when first comparing the overall drought years with the individual ones: no individual drought comes close to matching the overall drought trend. In addition to the dAIC becoming positive, the coefficients change as well. Would this be evidence of what we see generally in climate science, where trends are better represented only when you look at the long term?

It appears that for all the variables, only TWI and TLP are in the top model for >2 scenarios.

teixeirak commented 5 years ago

Thanks! Some comments:

- Given the very different variable sets for each drought, I wonder if the better way to define the top model would be to repeat the whole process of defining which variables to include as candidates in the top model separately for each year.
- If you don't do the above, I wouldn't present the analyses of WD and rp by individual years, as those generally aren't great predictors, and are very inconsistent (too many parameters?).
- If I remember right, 1999 was the weakest pointer year (???), so it's probably not surprising that the results are so different.
- It's interesting that TWI and TLP are the only two variables with consistent responses across all droughts, even though they're not always in the top model.
- It will be very interesting to look at these results in the context of the nature of each drought and think about mechanisms. For example, perhaps the prolonged 60s drought had a stronger impact on larger trees, whereas they weren't as bothered by the short drought in 1999?

mcgregorian1 commented 5 years ago

Three things on this:

I had messed up the calculation of dAIC earlier on, so actually rp and WD shouldn't have been in the best model anyways (that probably explains why they were contrary to the others).
Valentine made the suggestion and I agreed, that we'd include "year" as a tested variable to prove its importance. That is now included.
I've finished making a master table showing each variable tested individually against a null model for each of the drought scenarios plus all of them combined, which can be viewed here.
- Notice how the coefficients of position_all and rp are not constant in direction across all scenarios
- Subsetting this for each scenario for all variables with dAIC > 2 gives me the table below. There's some overlap between them, but no more than that.
- Based on this, do we take the best model then to be the one that has the most number of overlaps (in this case, only distance.ln.m and TWI make the cut), or do we present the best model for each separately, with the reasoning that each scenario is different (but then focus more on the overall scenario ["all"] for a longer-term trend)? I'm currently thinking the latter, what do you think?
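
To make the "overlap" option concrete, here is a short Python sketch that subsets each scenario to the variables with dAIC > 2 and then intersects the sets. The dAIC values and the placeholder names `var_a`, `var_b`, `var_c` are made up for illustration and are not taken from the master table.

```python
# Hypothetical dAIC values per scenario; placeholders, not the study's table.
daic = {
    "1966": {"var_a": 3.1, "var_b": 0.4, "var_c": 2.6},
    "1977": {"var_a": 2.5, "var_b": -0.6, "var_c": 1.2},
    "1999": {"var_a": 4.0, "var_b": 2.2, "var_c": -1.0},
    "all": {"var_a": 6.3, "var_b": 1.9, "var_c": 2.1},
}


def supported(table: dict, threshold: float = 2.0) -> dict:
    """Keep, for each scenario, the variables whose dAIC exceeds the threshold."""
    return {
        scenario: {name for name, value in values.items() if value > threshold}
        for scenario, values in table.items()
    }


top = supported(daic)
shared = set.intersection(*top.values())  # variables supported in every scenario

for scenario, names in top.items():
    print(scenario, sorted(names))
print("shared by all scenarios:", sorted(shared))
```

Presenting each scenario separately corresponds to reporting `top` as is, while the overlap option corresponds to `shared`.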

teixeirak commented 5 years ago

* Based on this, do we take the best model then to be the one that has the most number of overlaps (in this case, only distance.ln.m and TWI make the cut), or do we present the best model for each separately, with the reasoning that each scenario is different (but then focus more on the overall scenario ["all"] for a longer-term trend)? I'm currently thinking the latter, what do you think?

I agree.

teixeirak commented 5 years ago

I don't think we should include elevation and distance in any analysis. They're inferior to TWI, both ecologically and usually statistically, and they just complicate the interpretation.

teixeirak commented 5 years ago

It's interesting that position seems to come out mostly consistent (although rarely significant), with dominant always lower than codominant.

mcgregorian1 commented 5 years ago

I decided to test something. The first four models are the top model for each year, using only the variables that had dAIC > 2 from the all-year scenario in the table. The bottom four models are the top model for each year using all the variables from the table.

Top variables (model output for 1966, 1977, 1999, and all years)

Note: position_all did not come out as dAIC>2 for the combined years, yet it shows up in the top model here

All variables (model output for 1966, 1977, 1999, and all years)

I'm confused on how to interpret/present this. I guess in a way this makes sense, since we were thinking of prescribing a set of variables for the individual drought years based on a trend seen only at the long-time scale (e.g. LMA, WD, and rp all were nixed from the combined-year model). We can still present a different model for each scenario based on these bottom models here, but it means we'd have to rethink how we'd present the hypothesis-testing table.

For example, the presence of rp and TWI in the top 1966 model doesn't match the outcome of the individual trait testing for that year, and I think the issue lies in how we interpret the null model. When there's only height, adding in rp clearly makes the model worse, yet when you have rp with all these other variables, it comes out on top. TWI in the individual-tested traits table has a dAIC of -0.563 and yet it's in the top model when everything is included.
For what it's worth, I have noted in my code that for good stats, there should be no more total parameters than 1/10th the number of observations in the dataset. For the scenarios of 1966, 1977, 1999, and combined, this comes to 5, 5, 6, and 16, respectively. Including all the variables like I've done here (8) clearly goes beyond this threshold for the individual years.
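
As a rough check of that rule of thumb, the Python sketch below compares the 8 variables against the per-scenario limits quoted above. The helper `max_parameters` and the observation count passed to it are hypothetical. Counting variables rather than fitted parameters is a simplification, since a factor such as rp contributes more than one coefficient.

```python
def max_parameters(n_observations: int) -> int:
    """Rule of thumb: at most one parameter per ten observations."""
    return n_observations // 10


# Limits quoted above for each scenario, and the size of the "all variables" set.
limits = {"1966": 5, "1977": 5, "1999": 6, "combined": 16}
n_variables = 8

over_budget = [scenario for scenario, cap in limits.items() if n_variables > cap]
print("over the limit:", over_budget)

# 57 is a made-up observation count, used only to show the helper.
print(max_parameters(57))
```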

mcgregorian1 commented 5 years ago

The variables with differing coefficients compared to the master table are:

Top variables

1999: height.ln.m (negative, was positive)

All variables

1977: rp-semiring (negative, originally positive)

Combined years: LMA (negative, originally positive), WD (negative, originally positive)
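
A comparison like this one can be automated. The Python sketch below lists the variables whose coefficient changes sign between two models. The names `trait_a` through `trait_d` and all coefficient values are hypothetical placeholders, not values from the master table.

```python
def sign(x: float) -> int:
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def sign_flips(reference: dict, candidate: dict) -> list:
    """Names present in both models whose coefficients have opposite signs."""
    shared = reference.keys() & candidate.keys()
    return sorted(
        name for name in shared
        if sign(reference[name]) * sign(candidate[name]) < 0
    )


# Hypothetical coefficients: single-trait models versus a multivariate model.
single_trait = {"trait_a": 0.12, "trait_b": -0.30, "trait_c": 0.05}
multivariate = {"trait_a": -0.04, "trait_b": -0.22, "trait_d": 0.10}

print(sign_flips(single_trait, multivariate))
```

Variables that appear in only one of the two models are ignored, since there is nothing to compare them against.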

mcgregorian1 commented 5 years ago

@teixeirak when you get a chance, can I get your opinion on this please?

teixeirak commented 5 years ago

I don't trust these "all variables" models; I'm concerned that they're over-parameterized. Please go with the top variables models.

mcgregorian1 commented 5 years ago

Ok.

As I understand it this is where we stand:

We tested each trait individually for each drought scenario, using height as the null model, intending to use the traits with dAIC>2 to determine the best model for each one.
However, we noticed that in 1977 (e.g.), this would yield a model with only TWI.
Thus we decided to find the best model for each individual scenario using all variables that had dAIC>2 at some point, even though perhaps in the specific years they did not.

Is this correct? I think I kept getting caught up by how there are interactions we're not seeing, for example how position_all wasn't dAIC>2 for the combined-year scenario, yet when testing only these "top variables", it does come out in the top model. Same thing for ring porosity for the combined scenario, 1966, and 1977.

These are the variables that have dAIC>2:

teixeirak commented 5 years ago

I thought the description of our method would be this:

Considering all droughts combined and for each individual drought, we tested our predictions by comparing a model with the relevant variable against a null model (Table X). When the dAIC>2, we considered the prediction supported. ....

To determine the best multivariate model for all droughts combined and for each individual drought, we .... To avoid over-parameterization of the model, we included as candidate variables only those with dAIC>2 in the all droughts model.

teixeirak commented 5 years ago

Is that correct? I want to make sure I'm following correctly.

mcgregorian1 commented 5 years ago

Exactly, so that's the thing. I'm still having trouble justifying to myself prescribing what works best in the all-droughts model as being best for the individual years. Using that method, how do we justify the reality that when we include rp in the individual drought years, it always comes out as significant? Or do we ignore that because it's not part of this protocol we discussed?

teixeirak commented 5 years ago

Okay, how about this?

To determine the best multivariate model for all droughts combined and for each individual drought, we .... To avoid over-parameterization of the model, we included as candidate variables only those with dAIC>2 in one or more of the individual models.

Is that what you did for the "top variables" models above? I notice that canopy position is in there, when it's not dAIC>2 in the all droughts model.

teixeirak commented 5 years ago

I agree that it's not ideal to limit the set of variables to those in the all-drought scenario, but there does need to be some limitation. Wood density and SLA in particular seem to be very inconsistent--acting more as free parameters than as meaningful variables.

mcgregorian1 commented 5 years ago

Okay, how about this?

To determine the best multivariate model for all droughts combined and for each individual drought, we .... To avoid over-parameterization of the model, we included as candidate variables only those with dAIC>2 in one or more of the individual models.

This would allow us to include position_all and rp, definitely. I'm wondering what the justification would be on this if we were challenged on it? Would it simply be that since each individual drought is different we thought it best to take into account all possible top variables?

Is that what you did for the "top variables" models above? I notice that canopy position is in there, when it's not dAIC>2 in the all droughts model.

Yes, my mistake there. I initially included both rp and position_all and noticed they appeared in the best models, but then realized we had said not to include them, hence my hesitation at moving forward.

I agree that it's not ideal to limit the set of variables to those in the all-drought scenario, but there does need to be some limitation. Wood density and SLA in particular seem to be very inconsistent--acting more as free parameters than as meaningful variables.

Agreed. This is why I was hoping we could do something like what you've suggested (assuming we can ecologically justify it), because yes, I don't think WD and LMA should be represented in these last tests.

teixeirak commented 5 years ago

Okay, I think we have a plan then?

I do think we can justify this method by saying that since each individual drought is different we thought it best to take into account all possible top variables.
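
For reference, the agreed rule (candidates are the variables with dAIC > 2 in one or more of the individual models) could be sketched in Python as follows. The function name, the placeholder variable names, and the dAIC values are all hypothetical. Whether the combined scenario counts as one of the "individual models" is left to whoever supplies the table.

```python
def candidate_variables(daic_by_scenario: dict, threshold: float = 2.0) -> list:
    """Union of the variables whose dAIC exceeds the threshold in any scenario."""
    candidates = set()
    for values in daic_by_scenario.values():
        candidates |= {name for name, value in values.items() if value > threshold}
    return sorted(candidates)


# Hypothetical dAIC values per scenario; placeholders, not the study's table.
daic = {
    "1966": {"var_a": 3.1, "var_b": 0.4, "var_c": 1.6},
    "1977": {"var_a": 2.5, "var_b": -0.6, "var_c": 1.2},
    "1999": {"var_a": 1.0, "var_b": 2.2, "var_c": -1.0},
}

print(candidate_variables(daic))
```

In this made-up example `var_c` never exceeds the threshold, so it is left out of the candidate set for every scenario, which is the intended effect for inconsistent variables.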

mcgregorian1 commented 5 years ago

Ok perfect! I'm on board with this plan.

Thus, for my next steps:

I can now get the best model for each drought scenario and put those in a table, so then the models would be done.
The graphs are close to being done, what's mainly left will be formatting them all together.
Otherwise I believe I have the writing left.

Am I missing anything else here that you can think of?

teixeirak commented 5 years ago

Nothing offhand!

teixeirak commented 5 years ago

Closing (obsolete).

SCBI-ForestGEO / McGregor_climate-sensitivity-variation