forc-db / Global_Productivity

Creative Commons Attribution 4.0 International

Add PCA analysis results #29

Closed teixeirak closed 5 years ago

teixeirak commented 5 years ago

@beckybanbury, as discussed, let's include a table listing the variables, R² for PC1, and loading factors/coefficients for PC1.

beckybanbury commented 5 years ago

@teixeirak I've been looking at the PCA more closely, and I wanted to clarify what you were hoping we could conclude from it. PC1 is the component that explains the most variation in the climate variables, but because it isn't computed in relation to the response variable, it isn't necessarily the best predictor of the response and doesn't necessarily explain the most variation in it. It's just a simplification of the climate variables. For example, the R² of GPP against PC1 is 0.59, which isn't higher than the R² of our best individual climate predictor. Is this still something you would want to include?
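To illustrate the point about PC1, here is a minimal sketch on synthetic data, not the FoRC data. The variable names, the numbers, and the power-iteration PCA are all assumptions made for the example; the thread does not say how the PCA was computed. The response is driven by one climate variable only, so PC1, which blends several variables, ends up with a lower R² than the best single predictor.

```python
import random
import statistics


def standardize(col):
    """Center a column and scale it to unit (population) standard deviation."""
    mu = statistics.fmean(col)
    sd = statistics.pstdev(col)
    return [(v - mu) / sd for v in col]


def r_squared(x, y):
    """Squared Pearson correlation, i.e. R² of a simple linear regression of y on x."""
    zx, zy = standardize(x), standardize(y)
    r = sum(a * b for a, b in zip(zx, zy)) / len(x)
    return r * r


def first_pc(columns, iters=500):
    """Return (loadings, scores) for PC1 of the standardized columns.

    Uses power iteration on the correlation matrix; the response is never seen.
    """
    z = [standardize(c) for c in columns]
    n, k = len(z[0]), len(z)
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / n for j in range(k)]
            for i in range(k)]
    v = [1.0] * k
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    scores = [sum(v[j] * z[j][i] for j in range(k)) for i in range(n)]
    return v, scores


# Synthetic, hypothetical data: the response depends on temperature only.
random.seed(0)
n = 200
temperature = [random.gauss(0, 1) for _ in range(n)]
precipitation = [0.5 * t + random.gauss(0, 0.87) for t in temperature]
seasonality = [random.gauss(0, 1) for _ in range(n)]
response = [2.0 * t + random.gauss(0, 0.5) for t in temperature]

climate = {
    "temperature": temperature,
    "precipitation": precipitation,
    "seasonality": seasonality,
}
loadings, pc1 = first_pc(list(climate.values()))
r2_pc1 = r_squared(pc1, response)
r2_single = {name: r_squared(col, response) for name, col in climate.items()}
best_name = max(r2_single, key=r2_single.get)

print(f"R² against PC1: {r2_pc1:.2f}")
print(f"R² against best single variable ({best_name}): {r2_single[best_name]:.2f}")
```

On real data the gap may be smaller or reversed; the sketch only shows that nothing in PCA guarantees PC1 is the strongest predictor.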

teixeirak commented 5 years ago

Right... I wasn't thinking carefully yesterday.

I think it would be valuable to include the joint effect of multiple climate variables, and the PCA analysis is one way to get at this. The challenge would be presenting it in a reasonably concise way. We would need to present both PC1 and PC2, and that's a lot of info. I think it could best be summarized in a table like this (where L and PC are loading and percent contribution):

[Image: Screen Shot 2019-07-16 at 7 14 19 AM]
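For reference, a small sketch of how the two quantities in such a table relate. It assumes "percent contribution" means the squared loading as a share of the summed squared loadings for that component, which is a common convention but is not defined in this thread. The loadings below are made up for illustration and are not results from the analysis.

```python
def percent_contribution(loadings):
    """Percent contribution of each variable to one component.

    Assumed convention: 100 * loading² / sum of squared loadings.
    """
    total = sum(v * v for v in loadings)
    return [100.0 * v * v / total for v in loadings]


# Made-up loadings for three hypothetical climate variables (illustration only).
example_loadings = {"temperature": 0.70, "precipitation": 0.60, "seasonality": -0.39}
contrib = dict(zip(example_loadings,
                   percent_contribution(list(example_loadings.values()))))

for name, loading in example_loadings.items():
    print(f"{name:15s} L = {loading:+.2f}  PC = {contrib[name]:5.1f}%")
```

Under this convention the sign of a loading does not affect its contribution, which is why a table would need both columns.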

Another option would be to present the best multivariate model for each variable. That would be more straightforward.

Which option would you prefer?

beckybanbury commented 5 years ago

| response.variable | PC | significant | p.value | Rsq.R2m | Rsq.R2c |
|---|---|---|---|---|---|
| GPP | 1 | TRUE | 2.76E-11 | 0.5929 | 0.9651 |
| GPP | 2 | FALSE | 0.2769 | 0.02701 | 0.9696 |
| NPP | 1 | TRUE | 7.97E-07 | 0.3257 | 0.8183 |
| NPP | 2 | FALSE | 0.4881 | 0.01416 | 0.8334 |
| BNPP_root | 1 | TRUE | 0.000474 | 0.2443 | 0.7778 |
| BNPP_root | 2 | FALSE | 0.7206 | 0.01088 | 0.79 |
| BNPP_root_fine | 1 | TRUE | 0.01604 | 0.1477 | 0.6787 |
| BNPP_root_fine | 2 | FALSE | 0.6655 | 0.01425 | 0.6992 |
| ANPP | 1 | TRUE | 3.54E-12 | 0.3635 | 0.8339 |
| ANPP | 2 | TRUE | 0.000367 | 0.1118 | 0.88 |
| ANPP_foliage | 1 | TRUE | 1.73E-09 | 0.5771 | 0.7752 |
| ANPP_foliage | 2 | FALSE | 0.2435 | 0.05071 | 0.8243 |
| ANPP_woody_stem | 1 | TRUE | 8.38E-09 | 0.2272 | 0.8772 |
| ANPP_woody_stem | 2 | TRUE | 0.000312 | 0.08717 | 0.8954 |
| R_auto | 1 | TRUE | 1.10E-05 | 0.8611 | 0.9577 |
| R_auto | 2 | TRUE | 0.01609 | 0.5422 | 0.9131 |
| R_auto_root | 1 | TRUE | 0.01884 | 0.2493 | 0.8372 |
| R_auto_root | 2 | FALSE | 0.6976 | 0.02759 | 0.8459 |

@teixeirak here are the r-squared and p-values for PC1 and PC2 for each variable. PC2 generally isn't a great predictor and is mostly not significant.

We could present the best multivariate model; the issue with that is that each variable has a different set of best predictors (and again, they aren't necessarily better than a single climate variable), so it could get complicated to present. The table output of running multivariate models is saved here, if you wanted to have a quick look; it's interesting to see which combinations are coming up as the best predictors (e.g. this is one way of capturing the importance of water availability, despite those variables not being good predictors by themselves). Note that only mod.int (interactive model) and mod.add (additive model) use both fixed terms; if the model is mod.linear or mod.poly, it only includes fixed.1.
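As a rough sketch of the difference between the additive and interactive forms, the following fits both to synthetic data with ordinary least squares. This is a simplification: the R2m/R2c columns above suggest the thread's models are mixed-effects models, and no random effects are included here. The names `fixed_1`, `fixed_2`, `mod_add` and `mod_int` mirror the thread's labels but the data and coefficients are invented for the example.

```python
import random


def solve(a, b):
    """Solve the linear system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols_r2(design, y):
    """Least-squares fit via the normal equations; returns (coefficients, R²)."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


# Synthetic, scaled predictors and a response with a true interaction (all invented).
random.seed(1)
n = 150
fixed_1 = [random.uniform(-1, 1) for _ in range(n)]
fixed_2 = [random.uniform(-1, 1) for _ in range(n)]
y = [1.0 + 2.0 * a + 0.5 * b + 1.5 * a * b + random.gauss(0, 0.3)
     for a, b in zip(fixed_1, fixed_2)]

# Additive form: intercept + fixed_1 + fixed_2
mod_add, r2_add = ols_r2([[1.0, a, b] for a, b in zip(fixed_1, fixed_2)], y)
# Interactive form: additive terms plus the product fixed_1 * fixed_2
mod_int, r2_int = ols_r2([[1.0, a, b, a * b] for a, b in zip(fixed_1, fixed_2)], y)

print(f"additive R² = {r2_add:.3f}, interactive R² = {r2_int:.3f}")
print(f"interaction coefficient = {mod_int[3]:.3f}")
```

The last coefficient of the interactive fit corresponds to an "interactive effect" term; an additive model has no such term.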

beckybanbury commented 5 years ago

| Response variable | R-squared | Mean annual temperature | Mean annual precipitation | Aridity | Annual wet days | Potential evapotranspiration | Solar radiation | Temperature seasonality | Vapour pressure deficit | Interactive effect |
|---|---|---|---|---|---|---|---|---|---|---|
| GPP | 0.74 | - | 0.004 | - | - | - | - | -0.086 | - | -0.0001 |
| NPP | 0.56 | 0.098 | - | -0.0001 | - | - | - | - | - | <0.0001 |
| ANPP | 0.49 | 0.238 | - | - | - | -0.002 | - | - | - | NA |
| ANPP foliage | 0.66 | 0.133 | - | - | - | -0.001 | - | - | - | NA |
| ANPP woody stem | 0.25 | - | - | <0.0001 | - | - | - | - | 0.76 | 0.0001 |
| BNPP | 0.36 | - | - | - | - | - | - | -0.046 | -1.347 | 0.056 |
| BNPP fine root | 0.19 | - | 0.002 | - | - | - | - | - | 4.722 | -0.002 |
| R auto | 0.95 | 3.943 | - | - | - | - | 0.0001 | - | - | <0.0001 |
| R root | 0.47 | - | 0.0007 | - | - | - | - | -0.006 | - | <0.0001 |

@teixeirak is a table like this what you had in mind?

teixeirak commented 5 years ago

Conceptually, this is great. Some minor adjustments:

At first I was confused about interactive effect, but then realized that each variable had at most two climate variables in the model.