Could artefacts be influencing the comparisons of r2 among different independent variables, or among different dependent variables? Potential tests to check for this.

hmullerlandau commented 4 years ago

One of the key types of results interpreted here is the relative explanatory value (r2) compared among independent variables for a given dependent variable, or among dependent variables for a given independent variable. I’m concerned about several possible artefactual sorts of reasons that could influence the relative r2 values for different independent variables.

First, models with more parameters always have higher r2 values, so polynomial models may outperform linear models for that reason alone. Are the r2 values here (especially table S2) all “adjusted r2” values? They should be.

Second, the issue that some climate variables are real values for variables and sites and some are from Worldclim seems problematic when comparing fits among independent variables. I recommend repeating analyses using worldclim data for all variables in all sites, to check that rankings are qualitatively the same. If so, this can simply be stated briefly in the text, and another version of Table S2 using the all-world-clim data can be included for comparison. If the results differ in important ways, then some additional figures in SI may be warranted as well, the implications for interpreting the results need to be carefully considered, and this issue needs to be acknowledged and discussed at appropriate length within the main text.

Third, I kept wondering about whether differences in explanatory value (r2) for relationships for different carbon fluxes (e.g., ANPP vs. NPP) related to differences in which sites were included, rather than differences in the biological processes. After all, it’s clear from the figures that different numbers and types of data points are available for different relationships (e.g., the x axis range varies across FACF). So, is the r2 better for GPP because there are few flux sites in the tropics, for example? I would like to see this addressed, at the least with text in the discussion, and preferably also with some analyses.

One approach would be to repeat the analyses using only sites that have data for ALL the focal fluxes. But I expect that there are very few sites that have data for all these fluxes, and of course fewer data mean less strong conclusions as well. Still, it would provide a simple means to test if the ranking of explanatory variables remains roughly the same. This could mean including, a version of table S2 based on using just sites with all variables, and comparison of this table with the current S2.

A potentially better approach, but one that would be more work, would be to evaluate the robustness of the rankings for every pair of dependent variables (or every pair whose comparison we care about), by repeating the analyses just for sites that have values for both those variables. This would require a new table or tables to present the results. Perhaps one table per independent variable of interest and one row per combination of dependent variable, with columns for results under the analyses with all sites for each dependent variable (different sets of sites), columns with results for analyses with just sites in common for both dependent variables). To keep things manageable (and digestible by readers), perhaps this could be done just for independent variables latitude and MAT, and for comparisons of fluxes with their components (e.g., GPP and NPP, NPP and ANPP, ANPP and ANPPstem, ANPP and ANPPfoliage, NPP and BPP, etc.)

teixeirak commented 4 years ago

First, models with more parameters always have higher r2 values, so polynomial models may outperform linear models for that reason alone.

I believe the criteria for evaluating model performance was always ∆AIC, correct @beckybanbury ?

teixeirak commented 4 years ago

Second, the issue that some climate variables are real values for variables and sites and some are from Worldclim seems problematic when comparing fits among independent variables. I recommend repeating analyses using worldclim data for all variables in all sites, to check that rankings are qualitatively the same. If so, this can simply be stated briefly in the text, and another version of Table S2 using the all-world-clim data can be included for comparison. If the results differ in important ways, then some additional figures in SI may be warranted as well, the implications for interpreting the results need to be carefully considered, and this issue needs to be acknowledged and discussed at appropriate length within the main text.

We did try a comparison with WorldClim T and PPT (although I think we've dropped those variables from the analysis at this point), and the relationships can look pretty different because WorldClim/ CRU underestimate high precip. In the discussion we currently say this:

To me, this is probably sufficient. It's important to acknowledge the limitations of the climate data, but I don't see a different analysis doing much to change the messages in the paper-- except that local precip data may be more likely to give a polynomial fit, vs lognormal for WorldClim (issue #76).

I don't see a deep dive into ranking of the climate variables as explanatory variables as very useful, given that all of them are just crude metrics representing complex variation in climate across sites, and we therefore don't expect any of them to really capture the influence of climate on FACF.

teixeirak commented 4 years ago

Third, I kept wondering about whether differences in explanatory value (r2) for relationships for different carbon fluxes (e.g., ANPP vs. NPP) related to differences in which sites were included, rather than differences in the biological processes.

This undoubtedly has an influence.

One approach would be to repeat the analyses using only sites that have data for ALL the focal fluxes.

Unfortunately, there would be very few sites with all the focal fluxes.

Thus, I think this is just a limitation we have to live with-- be sure to acknowledge somewhere in the discussion.

teixeirak commented 4 years ago

Registering @bpbond's comment here:

Finally, I’d echo her notes that using R2 as a comparison metric should be done with extreme caution. The ‘relaimpo’ package in R is one alternative.

beckybanbury commented 4 years ago

I believe the criteria for evaluating model performance was always ∆AIC, correct?

Yes, we've been using deltaAIC to identify the best model, and then testing for significance between the best model and null model using an ANOVA test

The way I have been testing for the best model is to specify the null, linear and polynomial models, and then rank these based on AIC criteria. If the polynomial model ranks best, I test that it is 'significantly' better than the linear model, using deltaAIC. If the difference is <2, I select the linear model as the best model. Then I repeat this to test the deltaAIC between the linear and null model. This gives a stepwise method of considering the AIC values which should make sure the best model is best relative to all other models. I did it stepwise so that it means that each time we add a parameter, it's testing whether adding that parameter is really an improvement on the previous one.

I don't know if it's possible to calculate adjusted R2 for mixed models? The reading I did indicated that calculating R2 for mixed models is different to that of linear models because of the random effects - I used the MuMIn package which contains a function for estimating R2 of mixed models, and gives the marginal and conditional R2. These values aren't used to decide the best model though (that's just done using AIC), they're just reported to give an estimate of variation explained.

beckybanbury commented 4 years ago

I've just realised that I've misunderstood one of the problems here - we do use just the R2 values to distinguish between which climate variables are the most important predictors of a flux, and don't include any AIC analysis in that part.

I've had a very quick look at doing some pairwise comparisons - graphs are saved here, and have the R2 values on the graphs. The biggest difference seems to be the large increase in R2 values for NPP relative to GPP.

teixeirak commented 4 years ago

I'm confused... we always use AIC when we're looking at a single flux, correct?

The R2-only analysis is comparing across fluxes, but the same variable, right? That's what your new analysis addresses.

teixeirak commented 4 years ago

Looking at this new analysis, it seems fair to say that BNPP and ANPP_stem tend to be less well-explained by climate than the other flux variables. That's somewhat interesting, but I wonder if it's worth keeping it at all, considering the need for a bit of new analysis, presentation of more results, plus making the reader digest more info. Should we just drop the comparison of how much variation is explained for each flux? I just removed a statement on that from the abstract because @bpbond didn't think it belonged (and I agreed).

The main change (besides shortening the results) would be removing the following paragraph from the discussion:

One interesting observation was that climate tends to explain more variation in the major fluxes ($GPP$, $NPP$, $R{auto}$) than in subsidiary fluxes ($BNPP{fine.root}$, $R{root}$, $ANPP{stem}$) (Fig. 2; Table S2). There are two, non-exclusive, potential explanations for this. First, it may be that methodological variation is larger relative to flux magnitude for some of the subsidiary fluxes. Belowground fluxes in particular are difficult to quantify, and measurement methods for the belowground fluxes considered here may be measured through fundamentally different approaches (e.g., minirhizotrons, ingrowth cores, or sequential coring for $BNPP{fine.root}$; root exclusion, stable isotope tracking, or gas exchange of excised roots for $R{root}$), and sampling depth is variable and often insufficient to capture the full soil profile. $ANPP_{stem}$, which is also poorly explained by latitude or climate, is more straightforward to measure but is subject to variability introduced by differences such as biomass allometries applied and minimum plant size sampled (Clark et al. 2001 Ecological Applications). However, methodological variation and uncertainty affect all of fluxes considered here, and some of the larger fluxes that vary more strongly with respect to climate ($ANPP$, $NPP$) are estimated by summing uncertain component fluxes. Second, differences among variables in the proportion of variation explained by climate may be attributable to more direct climatic control over $GPP$ than subsidiary fluxes. That is, subsidiary fluxes may be shaped by climate both indirectly through its influence on $GPP$ and respiration and directly through any climatic influence on C allocation, as well as many other local- and regional-scale factors (REFS).

Again, this is somewhat interesting, but maybe not worth inclusion. What do you think, @beckybanbury , @hmullerlandau , @bpbond ?

hmullerlandau commented 4 years ago

In principle I really like the idea of doing these comparisons. The current result the climate explains more of the total fluxes than the component fluxes make sense and align with some previous thinking,e.g., pubs by Yadvinder Malhi. In practice, it seems like all these comparisons have to be done carefully, though. In addition to what is mentioned in the cited paragraph, I'm concerned that the comparisons could be influenced by (a) different degrees of freedom in best fit models for different fluxes (polynomial vs. linear models, for example), and (b) different sets of sites included for different fluxes. I think it would be possible to address these concerns, but I recognize that this is more work and that analysis fatigue may have set in. For (b), this could be done by running comparisons for every pair of response variables, using only sites in common to those two variables, and putting together a matrix of < or > based on that, as well as based on the current analyses, to check consistency. For (a), the simplest would be to just stick with models having the same number of parameters - linear and logarithmic, dropping polynomial for this purpose. I would say that if there is energy to address these two issues, then I'd be in favor of keeping them in the manuscript. But if such additional analyses aren't in the cards, then I would favor dropping these comparisons entirely.

teixeirak commented 4 years ago

sounds like your call, @beckybanbury. (Analysis fatigue would be totally understandable...)

beckybanbury commented 4 years ago

	GPP	NPP	larger Rsq	NPP	ANPP	larger Rsq	NPP	BNPP_root	larger Rsq	ANPP	ANPP_foliage	larger Rsq	ANPP	ANPP_woody_stem	larger Rsq
lat	0.6223	0.6645	NPP	0.5195	0.4807	NPP	0.4863	0.339	NPP	0.3226	0.453	ANPP_foliage	0.3525	0.1259	ANPP
mat	0.6165	0.7024	NPP	0.3024	0.4429	ANPP	0.409	0.2206	NPP	0.3638	0.495	ANPP_foliage	0.4238	0.168	ANPP
map	0.5236	0.242	GPP	0.009595	0.07615	ANPP	0.003022	0.1259	BNPP_root	0.02772	0.08815	ANPP_foliage	0.0535	0.0242	ANPP
PotentialEvapotranspiration	0.03422	0.08861	NPP	0.1456	0.2185	ANPP	0.1467	0.206	BNPP_root	0.1984	0.2541	ANPP_foliage	0.1654	0.1414	ANPP
VapourPressureDeficit	0.09224	0.1802	NPP	0.06643	0.1826	ANPP	0.1103	0.09471	NPP	0.2198	0.2764	ANPP_foliage	0.135	0.1011	ANPP
TempSeasonality	0.6478	0.6985	NPP	0.4661	0.4295	NPP	0.4901	0.4146	NPP	0.266	0.4233	ANPP_foliage	0.2943	0.0886	ANPP
length_growing_season	0.6502	0.4867	GPP	0.4122	0.3755	NPP	0.4039	0.2399	NPP	0.2251	0.3936	ANPP_foliage	0.2923	0.08331	ANPP

So a table something like this? (see: pairwise_comparisons.xlsx ). This compares R2 values for each pair of fluxes for sites only where both fluxes are measured, and using only null, linear and log models (polynomial removed). I wouldn't want to do too much more on this analysis than this, but if you think a table like this is worthwhile, it would be easy to tidy this up and include it.

beckybanbury commented 4 years ago

I'm confused... we always use AIC when we're looking at a single flux, correct? The R2-only analysis is comparing across fluxes, but the same variable, right? That's what your new analysis addresses.

When we're looking at a single flux I had used AIC to compare between null, linear, log and polynomial models for each flux and each climate variable, to identify the best model fit for each climate variable. However, I realised that when I'm comparing climate variables across a flux, I have just taken the R2 for the best model for each flux, and said e.g. 'R2 is highest for temperature, so temperature is the most important climate variable in explaining GPP', rather than comparing AIC values.

I think that this won't actually affect our results too much, because temperature is generally coming out best, which is usually a linear model, and so shouldn't be biased by the effect of different parameters, but does this mean I should run a second set of AIC comparisons, once I have identified the best model for each climate variable, to then rank the explanatory power of the climate variables for each flux? (Let me know if this doesn't make sense - there's several levels of analysis here which is somewhat confusing!)

teixeirak commented 4 years ago

So a table something like this? (see: pairwise_comparisons.xlsx ). This compares R2 values for each pair of fluxes for sites only where both fluxes are measured, and using only null, linear and log models (polynomial removed). I wouldn't want to do too much more on this analysis than this, but if you think a table like this is worthwhile, it would be easy to tidy this up and include it.

Sure, I think it would be valuable to put a table like this in the SI. Could we add R_root and R_auto?

As you code these, please pay attention to issue #55. This is a quick fix, but one that will need to be applied to all figures/tables.

teixeirak commented 4 years ago

I think that this won't actually affect our results too much, because temperature is generally coming out best, which is usually a linear model, and so shouldn't be biased by the effect of different parameters, but does this mean I should run a second set of AIC comparisons, once I have identified the best model for each climate variable, to then rank the explanatory power of the climate variables for each flux? (Let me know if this doesn't make sense - there's several levels of analysis here which is somewhat confusing!)

Yes, I think it would be good to do that.

beckybanbury commented 4 years ago

Could we add R_root and R_auto?

Sure, do you mean in comparison with each other, or in comparison with GPP?

teixeirak commented 4 years ago

R_auto in comparison with GPP, R_root in comparison with R_auto(?) and BNPP (or whatever makes sense to you).

hmullerlandau commented 4 years ago

Regarding the table - yes, something like that could work. It might be best to have a separate table for each type of comparison (e.g., GPP vs. NPP), in part to show the clear separation, and in part because it would be good to include a bit more information. In particular, I suggest including number of sites that have the data for both, and best type of model fitted for each case. You could drop a significant digit of the r2 for easier readability. And you might consider just highlighting the better r2 in bold rather than having a separate column for this, although I guess for automatically generating tables it may be easier to have the separate column. If you want to keep them in one wide table, then add more white space or double lines or something to make the division clear.

teixeirak commented 4 years ago

Another option could be to have a table like this (example data are made up):

C flux variable1	C flux variable 2	climate variable	n plots	model type v1	model type v2	R2 v1	R2 v2	variable with higher R2
GPP	NPP	MAT	60	linear	linear	.45	.5	NPP

beckybanbury commented 4 years ago

Yes, I think it would be good to do that.

@teixeirak AIC comparisons for the best model for each climate flux are here

teixeirak commented 4 years ago

So, interestingly, TempSeasonality comes out on top for 6, MAT for 3. (This relates to issue #89--need to be clear exactly what this is and get the units right.)

beckybanbury commented 4 years ago

R_auto in comparison with GPP, R_root in comparison with R_auto(?) and BNPP (or whatever makes sense to you).

There aren't enough points to do R auto and R root together (2 sites) but I'm doing R auto and GPP

teixeirak commented 4 years ago

Okay. I'd try to find some variable with which you could pair R_root --whatever has enough site overlap, preferably BNPP or GPP

beckybanbury commented 4 years ago

We have 7 sites when combined with GPP and 9 sites combined with BNPP - not a great sample size!

teixeirak commented 4 years ago

True. Perhaps list the BNPP-R_root pair in the table (without R2 results) just to show that we didn't forget the variable.

beckybanbury commented 4 years ago

some individual tables, as @hmullerlandau suggested, with the extra information, are saved here: https://github.com/forc-db/Global_Productivity/tree/master/results/tables/best_model_outputs/pairwise_comparisons

I think individual tables are more readable than a big table (a large table would be ~50 rows to get all the information in), unless we limited it to e.g. the three main climate variables for each pair of fluxes.

I'm a little worried that this is becoming a lot of SI tables, especially if we're including the AIC tables for climate model comparisons.

teixeirak commented 4 years ago

I'd limit it to lat, MAT, and T seasonality, and keep it all in one table, even if that table has to be continued across >1page. I agree that presenting all of these would be getting to be a lot. Another option would be to just state the results in the main text and mention that full results may be viewed on GitHub.

beckybanbury commented 4 years ago

Here's an updated shorter table suitable for SI - I think you're right that we can just state full results are available on GitHub if needed. Note that the number of plots is so low for respiration variables that the models aren't significant.

C flux variable 1	C flux variable 2	Climate variable	Rsq variable 1	Rsq variable 2	Model type variable 1	Model type variable 2	Number of plots	Variable with higher Rsq
GPP	NPP	Latitude	0.62	0.66	Linear	Linear	37	NPP
GPP	NPP	MAT	0.62	0.7	Logarithmic	Linear	37	NPP
GPP	NPP	T Seas	0.65	0.7	Logarithmic	Logarithmic	37	NPP
NPP	ANPP	Latitude	0.52	0.48	Logarithmic	Logarithmic	158	NPP
NPP	ANPP	MAT	0.3	0.44	Logarithmic	Linear	158	ANPP
NPP	ANPP	T Seas	0.47	0.43	Linear	Linear	158	NPP
NPP	BNPP	Latitude	0.49	0.34	Logarithmic	Linear	116	NPP
NPP	BNPP	MAT	0.41	0.22	Logarithmic	Logarithmic	116	NPP
NPP	BNPP	T Seas	0.49	0.41	Logarithmic	Logarithmic	116	NPP
ANPP	ANPP foliage	Latitude	0.32	0.45	Logarithmic	Logarithmic	96	ANPP foliage
ANPP	ANPP foliage	MAT	0.36	0.5	Linear	Linear	96	ANPP foliage
ANPP	ANPP foliage	T Seas	0.27	0.42	Linear	Linear	96	ANPP foliage
ANPP	ANPP stem	Latitude	0.35	0.13	Linear	Linear	176	ANPP
ANPP	ANPP stem	MAT	0.42	0.17	Linear	Linear	176	ANPP
ANPP	ANPP stem	T Seas	0.29	0.089	Linear	Linear	176	ANPP
GPP	R auto	Latitude	0.64	0.34	Null	Null	11	GPP
GPP	R auto	MAT	0.69	0.34	Null	Null	11	GPP
GPP	R auto	T Seas	0.64	0.32	Null	Null	11	GPP
BNPP	R root	Latitude	0.013	0.39	Null	Null	9	R root
BNPP	R root	MAT	0.083	0.35	Null	Null	9	R root
BNPP	R root	T Seas	0.0093	0.63	Null	Null	9	R root

beckybanbury commented 4 years ago

Another option would be to just state the results in the main text and mention that full results may be viewed on GitHub.

@teixeirak this would most likely be easiest for the AIC model comparison tables, unless we wanted a table just of the best model for each carbon flux

teixeirak commented 4 years ago

Here's an updated shorter table suitable for SI - I think you're right that we can just state full results are available on GitHub if needed. Note that the number of plots is so low for respiration variables that the models aren't significant.

Looks good! Did you save it as a csv? Let's put it in the figs/ tables file with the manuscript for easy integration into SI.

teixeirak commented 4 years ago

@teixeirak this would most likely be easiest for the AIC model comparison tables, unless we wanted a table just of the best model for each carbon flux

I think it would be good to present the best model for each C flux, then refer to full results on GitHub.

beckybanbury commented 4 years ago

@teixeirak table presenting the best model (or models, where there are multiple with dAIC <=2 is here, and I have also knitted it into the appendix file. I'll work on adding the pairwise comparison table next

beckybanbury commented 4 years ago

Here's an updated shorter table suitable for SI - I think you're right that we can just state full results are available on GitHub if needed.

Pairwise comparison table is now also added to SI. Hopefully these additions have now addressed this issue - would be really helpful if you'd be able to have a look over at some point and check that you're happy with how we've dealt with this.

teixeirak commented 4 years ago

Thanks, @beckybanbury ! I think this solves the issue and will close it (but @hmullerlandau , feel free to re-open in case you have any concerns).

hmullerlandau commented 4 years ago

Looks good to me! Great work Becky.

forc-db / Global_Productivity

Could artefacts be influencing the comparisons of r2 among different independent variables, or among different dependent variables? Potential tests to check for this. #78