multivariate analysis - Githubissues

beckybanbury commented 5 years ago

@teixeirak Now that I've run the AIC analysis on the models, if the best model is one with only a single climate term, how do you think we should present this? Do we want to just present the results for the fluxes where the best model is a multivariate model, or should we present the results for the multivariate model with the best AIC score for each flux (even if it isn't the best overall model)?

beckybanbury commented 5 years ago

response	mod	significant	p.value	sample.size	Rsq.R2m	Rsq.R2c	fixed1.coef	fixed2.coef	int.coef
GPP	mean ~ map*TempSeasonality+masl+(1\|geographic.area/plot.name)	TRUE	8.85E-17	226	0.7338	0.9635	0.0036	-0.0833	-1.00E-04
NPP	mean ~ mat*PreSeasonality+masl+(1\|geographic.area/plot.name)	TRUE	7.06E-10	97	0.5667	0.8179	0.3756	0.0364	-0.0031
BNPP_root	mean ~ SolarRadiation*PreSeasonality+masl+(1\|geographic.area/plot.name)	TRUE	4.52E-05	86	0.3786	0.8282	0	0.0375	0
BNPP_root_fine	mean ~ map*VapourPressureDeficit+masl+(1\|geographic.area/plot.name)	TRUE	0.007995	84	0.1912	0.735	0.0015	4.6968	-0.0022
ANPP	mean ~ mat+PotentialEvapotranspiration+masl+(1\|geographic.area/plot.name)	TRUE	2.51E-17	198	0.5048	0.843	0.239	-0.002	NA
ANPP_foliage	mean ~ mat+PotentialEvapotranspiration+masl+(1\|geographic.area/plot.name)	TRUE	1.61E-11	68	0.6567	0.8141	0.1328	-0.0011	NA
ANPP_woody_stem	mean ~ mat+masl+(1\|geographic.area/plot.name)	TRUE	7.65E-10	229	0.2418	0.8766	0.0713	-0.1326	NA
R_auto	mean ~ mat+masl+(1\|geographic.area/plot.name)	TRUE	1.07E-05	14	0.8321	0.9493	0.502	-10.5145	NA
R_auto_root	mean ~ map*TempSeasonality+masl+(1\|geographic.area/plot.name)	TRUE	0.002024	46	0.4256	0.8511	7.00E-04	-0.0063	0

This table shows the best model for each flux; you can see that there are two for which the best model is a single climate term, and two for which the best model is additive, rather than interactive.

teixeirak commented 5 years ago

Let's keep the same table you showed me yesterday. The change will be that some variables will have a coefficient for only one climate variable, and often no interaction term.

beckybanbury commented 5 years ago

	Sample size	R-squared	Mean annual temperature	Temperature seasonality	Mean annual precipitation	Precipitation seasonality	Potential evapotranspiration	Vapour pressure deficit	Solar radiation	Interactive effect
GPP	226	0.73	-	-0.083	0.004	-	-	-	-	-0.0001
NPP	97	0.57	0.376	-	-	0.036	-	-	-	-0.003
ANPP	198	0.5	0.239	-	-	-	-0.002	-	-	NA
ANPP_foliage	68	0.66	0.133	-	-	-	-0.001	-	-	NA
ANPP_woody_stem	229	0.24	0.07	-	-	-	-	-	-	NA
BNPP_root	86	0.38	-	-	-	0.038	-	-	<0.0001	<0.0001
BNPP_root_fine	84	0.19	-	-	0.002	-	-	4.697	-	-0.002
R_auto	14	0.83	0.5	-	-	-	-	-	-	NA
R_auto_root	46	0.43	-	-0.006	0.0007	-	-	-	-	<0.0001

Sounds good - the table will look similar to this.

teixeirak commented 5 years ago

Looks good. Could you please transform the units on solar radiation such that the terms aren't so small (or use scientific notation)?

teixeirak commented 5 years ago

What exactly is VPD? Is this the average of monthly means?

beckybanbury commented 5 years ago

What exactly is VPD? Is this the average of monthly means?

Yes, it is. I can't find information on exactly how it was derived, but it is derived from WorldClim data, and the values used for analysis are an average of monthly means between the years 1958 - 2017.

teixeirak commented 5 years ago

I actually wonder if something like maximum monthly VPD would be more informative. A lot of forests would experience only a short dry season, with low VPD during the wet season and/or winter.

beckybanbury commented 5 years ago

Would you calculate this by finding the highest VPD value recorded for each year, and then taking an average of these values?

teixeirak commented 5 years ago

The other option would be to use the month that has on average the highest VPD, but I think what you suggest makes sense (probably more sense).

beckybanbury commented 5 years ago

Effect_of_MaxVPD1_MATURE_only_poly_all Effect_of_MaxVPD2_MATURE_only_poly_all

I derived max monthly VPD, and it's not a good predictor in the main model (graphs above). It comes out in the multivariate models for BNPP_root and R_auto_root, even though it isn't a significant predictor for either of these individually. Do you think it's worth including it still? In the multivariate model I have so far only included predictors that were significant to minimise the range of variables included.

teixeirak commented 5 years ago

I think it could go either way. Are the models that include it much better than those that you'd get if you eliminate it entirely?

beckybanbury commented 5 years ago

They're not very different. As I understand it, models where the AIC values have a difference of less than two from the model with the lowest AIC value (the one my code is identifying as the 'best' model, can be considered "competitive", and then other methods can be used to select the best model from these. For R_root, the difference between the best model and the next best is 0.14 (so very little difference) and for BNPP_root, it is 2.14 (so this one is a little more significant).

teixeirak commented 5 years ago

maybe just remove it completely, but if you have strong feelings about including it, I'd be fine with either way.

beckybanbury commented 5 years ago

@teixeirak are you in the office at all this week? I'd find it helpful to chat through the multivariate model selection methods I'm using; there's a couple of concerns I have that I'd like your opinion on!

beckybanbury commented 5 years ago

Effect_of_WaterStressMonths_MATURE_only_poly_all

I derived a measure of water stress months (months where PET > MAP), and above are the results for that. Again, the results aren't very significant; do you still think we should use this as one of the four variables to include in the multivariate model?

Part of the reason they might not be significant is that the data I used for PET and MAP are from different datasets (CRU and WorldClim respectively). I did this because this was the data I already had downloaded; however CRU does have it's own MAP dataset, so I could download that and derive water stress using that if you think it would be worthwhile.

teixeirak commented 5 years ago

Thanks for trying this!

I woudn't necessarily expect it to be a great predictor on its own, as the major axis of variation will always be latitude. Is it very useful as a second/ interactive term?

I think that the current approach is okay for getting a sense of how well it will do as a variable. If we decide its worth including, we should get the CRU MAP data because (1) the preferred order of operations would be to calculate n months for every year, then take the mean, (2) better to use the same data source.

teixeirak commented 5 years ago

You might compare this to a couple other variables that get at aridity (e.g., max VPD) to see which explains most variation.

beckybanbury commented 5 years ago

response	mod	significant	p.value	sample.size	Rsq.R2m	Rsq.R2c
GPP	mean ~ map*AnnualFrostDays+(1\|geographic.area/plot.name)	TRUE	1.13E-14	226	0.6579	0.9701
NPP	mean ~ mat*WaterStressMonths+(1\|geographic.area/plot.name)	TRUE	2.14E-12	95	0.6268	0.8194
BNPP_root	mean ~ AnnualFrostDays+(1\|geographic.area/plot.name)	TRUE	2.92E-05	84	0.2834	0.7789
BNPP_root_fine	mean ~ AnnualFrostDays+(1\|geographic.area/plot.name)	TRUE	0.00713	78	0.1355	0.6918
ANPP	mean ~ mat+map+(1\|geographic.area/plot.name)	TRUE	5.92E-20	191	0.5137	0.8564
ANPP_foliage	mean ~ mat+(1\|geographic.area/plot.name)	TRUE	2.60E-13	67	0.6692	0.8175
ANPP_woody_stem	mean ~ mat+WaterStressMonths+(1\|geographic.area/plot.name)	TRUE	2.35E-11	221	0.273	0.899
R_auto	mean ~ mat+(1\|geographic.area/plot.name)	TRUE	2.51E-05	14	0.7686	0.9532
R_auto_root	mean ~ mat*map+(1\|geographic.area/plot.name)	TRUE	0.000964	46	0.4297	0.8532

It comes out fairly well as a second predictor (see above). I tried various combinations of variables, and the above table is just running the models with MAP, MAT, annual frost days, and water stress months. There was nothing significant for BNPP using any of the combinations. For ANPP_foliage map+mat is approximately as good as mat alone.

beckybanbury commented 5 years ago

mat_map_interaction

@teixeirak This is just a rough graph of the results, but I ran the MAT + MAP interaction models and graphed them and the results are potentially interesting. The graphs show productivity plotted against MAT, with the different lines representing different levels of MAP (see legend for MAP in mm). It looks like BNPP + BNPP_root show a different interaction to GPP/ANPP/ANPP_woody_stem. NPP and ANPP_foliage don't show an interactive effect.

teixeirak commented 5 years ago

It comes out fairly well as a second predictor (see above). I tried various combinations of variables, and the above table is just running the models with MAP, MAT, annual frost days, and water stress months. There was nothing significant for BNPP using any of the combinations. For ANPP_foliage map+mat is approximately as good as mat alone.

I think this set of variables works. Are you happy with it?

teixeirak commented 5 years ago

@teixeirak This is just a rough graph of the results, but I ran the MAT + MAP interaction models and graphed them and the results are potentially interesting. The graphs show productivity plotted against MAT, with the different lines representing different levels of MAP (see legend for MAP in mm). It looks like BNPP + BNPP_root show a different interaction to GPP/ANPP/ANPP_woody_stem. NPP and ANPP_foliage don't show an interactive effect.

I do find this interesting, although we wouldn't want to plot lines outside the ranges for which we have data. Cold climates would typically get <1000mm/yr, so showing lines for 3000mm/yr there is misleading/confusing.

teixeirak commented 5 years ago

My other concern with the plot above is that we want to keep the message simple and consistent. If we're presenting the four variables identified above as the drivers of interest, perhaps we would want this plot to show the best model for each variable?

beckybanbury commented 5 years ago

I do find this interesting, although we wouldn't want to plot lines outside the ranges for which we have data. Cold climates would typically get <1000mm/yr, so showing lines for 3000mm/yr there is misleading/confusing.

This is a good point; what do you think would be the best way to present this? If we present the best model for each variable, as you suggest, we would still run into this problem if we want to be able to present a visualisation of the interaction. Otherwise we could just present the coefficients.

I like the set of variables above, and I'm happy with them. I do also think it's interesting to just take MAP and MAT as the interacting variables, because it allows easier comparison across carbon fluxes. E.g. looking at the coefficients for the MAT*MAP interaction, the coefficient for MAP is negative for all fluxes, with the exception of BNPP + BNPP_root (as indicated in the graph above).

I agree that we want to keep it simple though! I think it depends what question we're trying to answer, so perhaps we need a clearer hypothesis for this section. I'll have a think about this!

teixeirak commented 5 years ago

This is a good point; what do you think would be the best way to present this? If we present the best model for each variable, as you suggest, we would still run into this problem if we want to be able to present a visualisation of the interaction

I'd restrict each line to where there are data. For example, stop the 3000 mm precip line at the temperature below which there are no sites with precip ≥3000. Of course, that's easy to say but more complicated to code.

beckybanbury commented 5 years ago

mat_map_interaction

Fyi, this is the graph output when I restrict the lines to where we have data (still for MAP + MAT; I'm working on MAT and WaterStressMonths now)

beckybanbury commented 5 years ago

mat_map_interaction

Here it is in colour

teixeirak commented 5 years ago

Nice! I think it would be nice to show the dots on a continuous scale. Alternatively, we'll want to be sure to specify how they are categorized. I take it the color is the upper limit of a precip bin?

beckybanbury commented 5 years ago

I take it the color is the upper limit of a precip bin?

Yes!

teixeirak commented 5 years ago

If a continuous scale is complicated, I think it would be more intuitive to have each color be the midpoint of a bin.

beckybanbury commented 5 years ago

The outputs for comparing the multivariate models are saved here.

This method runs all multivariate models which are MAT + a second climate variable, on the basis that MAT is our best predictor. The table includes the model for MAT*MAP for all carbon fluxes, so that this can be compared against the 8 best combinations for each flux.

For most carbon fluxes, there aren't any other combinations of variables that significantly improve on MATMAP. The exceptions are NPP (best model is MATwater stress months), and ANPP (best model is MAT+/*potential evapotranspiration). Have a look and see what you think!

beckybanbury commented 5 years ago

Worth noting that for NPP MAT*water stress months is the only model that improves on MAT alone as a baseline model

teixeirak commented 5 years ago

Overall, looks good and I think this is a reasonable way to go.

I'm confused about NPP MAT*water stress months, though. From your table it looks like that's only a small improvement ∆AIC=0.5 over MAT alone.

beckybanbury commented 5 years ago

I'm confused about NPP MAT*water stress months, though. From your table it looks like that's only a small improvement ∆AIC=0.5 over MAT alone.

You're correct, I had overlooked that!

So I think we're concluding that MAT*MAP shows a significant interaction for the majority of fluxes, and, while there are multiple other climate variables that also show a significant interaction with MAT, none of them improve significantly on this. Therefore we just present results for MAT x MAP interaction, as these are our highest quality datasets, and they make strong ecological sense as predictors?

teixeirak commented 5 years ago

Agreed.

beckybanbury commented 5 years ago

mat_map_interaction

In terms of explaining the above graph, is the following along the right lines?:

For GPP, increases in precipitation increase productivity at all temperatures. At higher levels of precipitation, increases in temperature increase productivity at a higher rate than at lower levels of precipitation, suggesting that GPP is water limited? For ANPP, there is a positive relationship with temperature at low levels of precipitation. At high levels of precipitation, there is a stronger positive relationship, and productivity increases more rapidly per unit temperature. At lower temperatures, increases in precipitation decrease productivity (not water limited at low temperatures), but at higher temperatures, increases in precipitation increase productivity (water limited at high temperatures). For BNPP, there is a strong positive relationship with temperature at low levels of precipitation, and a weaker positive/no relationship at high levels of precipitation. At low temperatures, increases in precipitation increase productivity, but at high temperatures increases in precipitation decrease productivity.

teixeirak commented 5 years ago

For GPP, increases in precipitation increase productivity at all temperatures. At higher levels of precipitation, increases in temperature increase productivity at a higher rate than at lower levels of precipitation, suggesting that GPP is water limited?

Hmmm... yes, that seems to be the case --at least in this data yet. I've tended to think of GPP as more light-limited at the higher end of the precip spectrum, and there is evidence that this is the case at least sometimes, but at least with this data set it appears to increase even at higher precipitation (as opposed to other fluxes which decline). Something worth including in the discussion.

teixeirak commented 5 years ago

For ANPP, there is a positive relationship with temperature at low levels of precipitation. At high levels of precipitation, there is a stronger positive relationship, and productivity increases more rapidly per unit temperature. At lower temperatures, increases in precipitation decrease productivity (not water limited at low temperatures), but at higher temperatures, increases in precipitation increase productivity (water limited at high temperatures).

Mostly. I'd point out that the ANPP interaction is driven by woody productivity, which has the opposite interaction from BNPP (back to that later). Precip seems to have little impact on ANPP at warmer temperatures (and higher precip). Actually, looking at combined_plots.jpg, it looks like it becomes negative. This is not true of ANPP_stem, which does seem to be positively affected by precip in the tropics (although its hard to tell--that one is really all over the place!).

teixeirak commented 5 years ago

For BNPP, there is a strong positive relationship with temperature at low levels of precipitation, and a weaker positive/no relationship at high levels of precipitation. At low temperatures, increases in precipitation increase productivity, but at high temperatures increases in precipitation decrease productivity.

One important thing to note is that BNPP_root is linked to ANPP_woody in that BNPP_root_coarse would typically be estimated using an allometry, as with ANPP_woody_stem. Thus, we'd expect BNPP_root to show a pattern intermediate between those of ANPP_woody and BNPP_root_fine, which in fact we do see. It's closer to BNPP_root_fine. Looking at our ForC shiny app, it looks like BNPP_root_fine tends to be quite a bit larger than BNPP_root_coarse, so it makes sense that the pattern is closer to BNPP_root_fine.

I'm not so familiar with the literature on fine root productivity and climate. That's something we should look into. It does appear true (and reasonable) that fine root productivity decreases with precipitation at warmer temperatures. Keep in mind that warmer temperatures are also associated with the highest precipitation levels, so it may be that the extra-tropical sites are not generally hitting a limit where additional precipitation would have a negative effect on productivity, whereas tropical sites are. Also keep in mind that root productivity interacts with nutrients. Again, I feel a need to get a better handle on the literature.

teixeirak commented 4 years ago

There's some content here that's potentially worthwhile for discussion, but I think our discussion is pretty close to where we want it, so closing this.

forc-db / Global_Productivity

multivariate analysis #31