forc-db / Global_Productivity

Creative Commons Attribution 4.0 International

which model should we consider the "master"? #22

teixeirak closed this issue 5 years ago

teixeirak commented 5 years ago

The current methods are: "Data from the temperate regions was heavily skewed towards studies from the old-growth forests of the Pacific Northwest. These forests have very high productivity, and so to reduce any bias from over-sampling of this region, the proportion of global forest cover contributed by each Koeppen climate zone was calculated, and the models weighted according to these proportions."
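The methods text does not say exactly how the proportions become per-observation weights. As a minimal sketch of one possible reading, the snippet below splits each zone's share of forest cover equally among that zone's observations, so that a zone's total weight equals its area share. The zone shares, site records, and the function name `observation_weights` are all hypothetical and are not taken from the project code or data.

```python
from collections import Counter

# Hypothetical shares of global forest cover per climate zone (made-up numbers).
zone_share = {"tropical": 0.5, "temperate": 0.2, "boreal": 0.3}

# Hypothetical site records: (site_id, climate zone).
sites = [
    ("s1", "temperate"),
    ("s2", "temperate"),
    ("s3", "temperate"),
    ("s4", "tropical"),
    ("s5", "boreal"),
]


def observation_weights(sites, zone_share):
    """Split each zone's area share equally among its observations.

    Assumption: this is only one way to read "weighted according to these
    proportions"; the source does not specify the exact scheme.
    """
    counts = Counter(zone for _, zone in sites)
    return {site: zone_share[zone] / counts[zone] for site, zone in sites}


weights = observation_weights(sites, zone_share)
for site, w in weights.items():
    print(site, round(w, 4))
```

Under this reading, the over-sampled zone's three sites share its weight, while a single site in a large zone carries that zone's entire share.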

Helene (@hmullerlandau) previously expressed concern about this kind of weighting: "I can see the motivation for weighting by forested area in a biome, but in terms of the mechanism and the statistical interpretation, I’m not sure it is the best approach. It is also a bit problematic in terms of interpreting the r2 value. In general r2 value is about explaining variability in the data, but if you are weighting by area represented, then what does it mean, really? More data means less uncertainty, and arguably values with less uncertainty should have a higher weighting, and that is all thrown out the window here – even if there are few data points for a particular region, if that region is large, those data points will have a lot of weight. At the very least I would recommend also doing the analysis with equal weighting across data points. If the results are qualitatively different between equal weighting and this weighting by forested area, then we would need to understand how and why they are different."
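To make the comparison Helene recommends concrete, here is a minimal sketch of a simple linear regression fitted with and without observation weights, reporting the slope, intercept, and a weighted r-squared. It is plain weighted least squares on made-up numbers, not the project's actual model; the function name `weighted_fit` and all data values are hypothetical.

```python
def weighted_fit(x, y, w=None):
    """Fit y = intercept + slope * x by weighted least squares.

    With w=None every point gets equal weight (ordinary least squares).
    Returns (slope, intercept, r2), where r2 is computed from the
    weighted residual and total sums of squares.
    """
    if w is None:
        w = [1.0] * len(x)
    sw = sum(w)
    # Weighted means of x and y.
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    # Weighted sums of squares and cross-products.
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    ss_res = sum(
        wi * (yi - (intercept + slope * xi)) ** 2 for wi, xi, yi in zip(w, x, y)
    )
    ss_tot = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, y))
    return slope, intercept, 1.0 - ss_res / ss_tot


# Made-up example data: a climate predictor and a productivity response.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
# Made-up weights in which the last point dominates, as a lone site in a
# large region would under area weighting.
area_w = [0.1, 0.1, 0.1, 0.2, 0.5]

print("equal weights:", weighted_fit(x, y))
print("area weights: ", weighted_fit(x, y, area_w))
```

The weighted r-squared describes variance explained after rescaling each point by its weight, so it is not directly comparable to the equal-weight value. That is one reason to inspect both fits side by side before choosing one.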

I think we should run and probably present both methods (whichever we determine to be preferable in the main paper, the other in the SI).

beckybanbury commented 5 years ago

@teixeirak figures from the unweighted model are saved here. When we don't weight the model, it reduces the r-squared values, but doesn't appear to have a significant effect on the trends that we've been seeing. All the results are still significant.

The climate variables which are the best predictors have changed a little with the unweighted model, but it's broadly the same set of predictors that keep coming up.

If the r-squared values are the only major changes, and they are partly what Helene finds problematic about the weighted model, perhaps we should just take the unweighted model?

Have a look at the graph outputs and let me know what you think!

teixeirak commented 5 years ago

Could you please remind me which are the parallel weighted results?

In any case, based on your description I do think it makes sense to use the unweighted model. Let's treat that as the primary (probably only) model.

The variable most affected by geographical sampling bias seems to be ANPP_woody, correct? I think it may make sense to just drop that one from the paper.

beckybanbury commented 5 years ago

Weighted model results are here.

ANPP_woody certainly has the smallest sample size and the lowest number of samples in temperate regions outside of the Pacific NW. It isn't included on any of the combined plots with latitude/climate variables, so it probably won't hurt to drop it. I did use ANPP_woody in the boxplot analyses, looking at whether allocation varies with climate, but I could also drop it there in favour of ANPP_woody_stem.

teixeirak commented 5 years ago

Let's drop ANPP_woody. For the boxplot, it's mostly redundant with ANPP_woody_stem, so I don't feel that it adds much.

teixeirak commented 5 years ago

Also, let's go with the unweighted model results.

beckybanbury commented 5 years ago

@teixeirak should I still describe the weighted model in the methods, and explain why it was rejected, or shall we just drop it entirely?

teixeirak commented 5 years ago

I'd envision saying something like, "To ensure that results were not unduly influenced by geographical sampling bias, we tried a version of the model where data were weighted according to forested land area within each Koeppen climate zone (Appendix S#). Results were similar and are not presented here." (Or you could show a couple of key figures/tables in the appendix.)