forc-db / Global_Productivity

Creative Commons Attribution 4.0 International

Ratio regression analyses #27

Closed beckybanbury closed 5 years ago

beckybanbury commented 5 years ago

@teixeirak I've run my regression model on three sets of ratios (each calculated by pairing measurements taken at the same site): GPP:NPP, ANPP:BNPP, and ANPP foliage:ANPP woody. The results are here. Do you think these are interesting and worth including? If so, what do you think about the outliers in the graphs for foliage:woody and ANPP2:BNPP? I've double-checked the measurements against the citation source and they're correct, but they're pretty large ratio outliers.
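For readers following along, here is a minimal sketch of the site-pairing step described above. It is not the code used in the repo: the record layout `(site, variable, value)` and the function name `site_ratios` are assumptions made for illustration, and it assumes a single value per site and variable.

```python
from collections import defaultdict


def site_ratios(records, numerator, denominator):
    """Pair measurements by site and return {site: numerator / denominator}."""
    by_site = defaultdict(dict)
    for site, variable, value in records:
        # Assumption: one value per site and variable (later values overwrite).
        by_site[site][variable] = value

    ratios = {}
    for site, values in by_site.items():
        # A ratio exists only where both variables were measured at the site.
        if numerator in values and denominator in values and values[denominator] != 0:
            ratios[site] = values[numerator] / values[denominator]
    return ratios


# Made-up numbers, for illustration only.
records = [
    ("site_A", "GPP", 30.0),
    ("site_A", "NPP", 12.0),
    ("site_B", "GPP", 20.0),
    ("site_B", "NPP", 10.0),
    ("site_C", "GPP", 15.0),  # no NPP at this site, so no ratio is produced
]
gpp_npp = site_ratios(records, "GPP", "NPP")
```

Sites with several measurements of the same variable would need an explicit matching rule (for example by date), which this sketch does not attempt.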

teixeirak commented 5 years ago

I do think some of them are interesting, particularly ANPP:BNPP. For ANPP foliage:ANPP woody, there are a couple large outliers. Did you look into those?

What are the sites of the ANPP2:BNPP outliers?

beckybanbury commented 5 years ago

For ANPP foliage:woody, the outliers are all from the same paper (Moser 2011). I checked the paper and the measurements were entered correctly. They're all tropical montane sites in Ecuador.

The ANPP2:BNPP outlier site is Kuusamo in Finland. The data came from the ORNL DAAC NPP database.

beckybanbury commented 5 years ago

@teixeirak I think that for this we should probably use the ANPP1:BNPP ratio instead of ANPP2:BNPP. ANPP1 = ANPP foliage + ANPP woody stem, whereas ANPP2 also includes woody branch, making it equivalent to ANPP foliage + ANPP woody. Given that we've excluded ANPP woody elsewhere in the ratio comparisons in favour of woody stem, I think it makes more sense to focus on the ANPP:BNPP ratio that also does that.
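The difference between the two definitions can be written out as a small sketch. The function names and all numbers below are hypothetical and serve only to show how including woody branch changes the ratio.

```python
def anpp_1(foliage, woody_stem):
    """ANPP1 = ANPP foliage + ANPP woody stem."""
    return foliage + woody_stem


def anpp_2(foliage, woody_stem, woody_branch):
    """ANPP2 = ANPP foliage + ANPP woody, where woody = stem + branch."""
    return foliage + woody_stem + woody_branch


# Made-up component values, for illustration only.
foliage, woody_stem, woody_branch = 3.0, 2.0, 1.0
bnpp = 2.5

ratio_1 = anpp_1(foliage, woody_stem) / bnpp                # ANPP1:BNPP
ratio_2 = anpp_2(foliage, woody_stem, woody_branch) / bnpp  # ANPP2:BNPP
```

Whenever woody branch production is non-zero, ANPP2:BNPP is larger than ANPP1:BNPP for the same site.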

teixeirak commented 5 years ago

I agree.

teixeirak commented 5 years ago

> For ANPP foliage: woody, the outliers are all from the same paper (Moser 2011). I checked the paper and the measurements were entered correctly. They're all tropical montane sites in Ecuador.

I looked at the paper. It looks like those very high ratios are all the high elevation sites. I think it would make sense to (a) include elevation as an independent variable or (b) limit to lowland forests?

I don't think we'll see a significant relationship here once those are accounted for.

teixeirak commented 5 years ago

I do think you should probably include this analysis (but see new issue #28), although I don't think the results will warrant a figure (maybe a small table). Two of the three (excluding ANPP2:BNPP) do not come out significant, and ANPP:BNPP has a fairly low R². It may make sense to report results in a small table.

Please stick to only linear regression here.

beckybanbury commented 5 years ago

@teixeirak in terms of the ANPP foliage:woody outliers, I've tried removing them and re-running the analyses and when they're removed the regressions are all non-significant. The boxplot t-test comparisons remain significant with outliers removed. If I limit the data to elevation below 2000m, ANPP:BNPP and foliage:woody come out significant on the regression analyses but not the boxplots.

The regression model already includes altitude as a fixed effect and so should go some way to accounting for the effects of elevation.

I think overall it looks like there is no change in GPP:NPP, but BNPP allocation and foliage allocation are highest in the tropics, although the significance of this result varies depending on the analysis (e.g. regression vs. boxplots), so this is a somewhat tenuous result. I can put a paragraph or two in the results describing this, and we can leave out the figures if you think that's best? I was going to include a boxplot figure but I can take it out.

teixeirak commented 5 years ago

> I think overall it looks like there is no change in GPP:NPP, but BNPP allocation and foliage allocation are highest in the tropics

By extension, we'd expect ANPP_woody_stem:GPP and ANPP_woody_stem:NPP to be lowest in the tropics. Have you tested these?

teixeirak commented 5 years ago

> I've tried removing them and re-running the analyses and when they're removed the regressions are all non-significant. The boxplot t-test comparisons remain significant with outliers removed. If I limit the data to elevation below 2000m, ANPP:BNPP and foliage:woody come out significant on the regression analyses but not the boxplots.

It sounds like the results are being strongly influenced by higher elevations. Is it the case that the regressions tend to be driven by high-elevation sites (including but not limited to the outliers)? Then when outliers are removed, the altitude term in the model accounts for most of that variation, whereas the boxplot test remains significant?

I believe one of those outliers was at elevation <2000m, correct? What happens if you use 1000 or 1500 as a cutoff?

teixeirak commented 5 years ago

> I can put a paragraph or two in the results describing this, and we can leave out the figures if you think that's best? I was going to include a boxplot figure but I can take it out.

Let's finalize results and see how the figures look before deciding on this.

beckybanbury commented 5 years ago

> It sounds like the results are being strongly influenced by higher elevations. Is it the case that the regressions tend to be driven by high-elevation sites (including but not limited to the outliers)? Then when outliers are removed, the altitude term in the model accounts for most of that variation, whereas the boxplot test remains significant?

> I believe one of those outliers was at elevation <2000m, correct? What happens if you use 1000 or 1500 as a cutoff?

I think this is probably exactly what's happening. If I cut off at 1000/1500 m then ANPP:BNPP regression remains significant but the foliage:woody doesn't.

I don't find any significant relationship in woody_stem:GPP in any of the models, but where I cut off the data at 1000m there is a significant relationship between woody_stem:NPP + latitude, with woody allocation lowest in the tropics. There's no relationship between foliage:NPP and latitude.
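As an illustration of why the elevation cutoff can change the outcome, here is a simplified sketch: it subsets rows below a cutoff and fits an ordinary least-squares line of ratio against absolute latitude. It deliberately omits the altitude term and any significance test from the actual model; the field names (`abs_lat`, `altitude_m`, `ratio`) and all values are invented.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def below_cutoff(rows, cutoff_m):
    """Keep only rows whose altitude is below the cutoff (in metres)."""
    return [row for row in rows if row["altitude_m"] < cutoff_m]


def latitude_slope(rows):
    slope, _ = fit_line([r["abs_lat"] for r in rows], [r["ratio"] for r in rows])
    return slope


# Invented data: four lowland sites plus one high-elevation tropical site.
rows = [
    {"abs_lat": 0.0, "altitude_m": 100, "ratio": 0.20},
    {"abs_lat": 10.0, "altitude_m": 300, "ratio": 0.25},
    {"abs_lat": 30.0, "altitude_m": 500, "ratio": 0.35},
    {"abs_lat": 50.0, "altitude_m": 200, "ratio": 0.45},
    {"abs_lat": 4.0, "altitude_m": 3000, "ratio": 0.90},
]

lowland = below_cutoff(rows, 1000)
slope_all = latitude_slope(rows)         # fitted on every site
slope_lowland = latitude_slope(lowland)  # fitted on sites below 1000 m only
```

In this toy data a single high-elevation tropical site is enough to reverse the sign of the latitude slope, which is the kind of sensitivity discussed in this thread.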

teixeirak commented 5 years ago

Okay, thanks. I think it makes most sense to report on the relationships with latitude for all ratios with sufficient data. It probably makes most sense to put that in a table, as opposed to figures. If you think it adds anything interesting, we could also report relationships to climate variables in such a table, but I doubt it would add a whole lot.

By the way, I didn't see the figure for woody_stem:NPP + latitude in the repo. Did you save it?

beckybanbury commented 5 years ago

@teixeirak do you know of any papers that look at carbon allocation with climate/across larger scales? All the papers I've found have been looking at local variation in allocation, and often focused on elevation/nutrients/stand age rather than climate. I wonder if allocation is so variable on smaller scales that it's hard to pick out trends at larger scales without accounting for all those factors?

teixeirak commented 5 years ago

The only one I can think of offhand that looks at allocation in forests worldwide is Litton et al. 2007, but from a quick review I don't see anything about relation to climate. That said, no one has used a database like ForC! In theory, we're accounting for elevation and stand age (somewhat), but global climate gradients clearly explain only a small portion of the variation (and we should be sure to note this in the paper).

beckybanbury commented 5 years ago

> Okay, thanks. I think it makes most sense to report on the relationships with latitude for all ratios with sufficient data. It probably makes most sense to put that in a table, as opposed to figures. If you think it adds anything interesting, we could also report relationships to climate variables in such a table, but I doubt it would add a whole lot.

> By the way, I didn't see the figure for woody_stem:NPP + latitude in the repo. did you save it?

Do you think I should use the regressions with altitude < 1000m then, in preference to the code that removes the extreme outliers? I don't hugely like removing outliers. Yesterday I also tried subsetting out outliers as determined by Cook's distance (points that have a disproportionate effect on the regression line), so that's also an option (this is currently included in the code).

I haven't saved that figure yet; I didn't want to push until we'd decided which model we wanted to go with to avoid confusion in the repo, as I've been running through so many iterations.

teixeirak commented 5 years ago

Yes, I think that makes sense. I don't see any reason to remove the points in question as erroneous, but we also don't want them having a disproportionate effect on an analysis that's focused on global-scale gradients.

beckybanbury commented 5 years ago

I've run the analysis with sites above 1000 m excluded, both with and without subsetting outliers as determined by Cook's distance. It looks like removing those outliers doesn't have a huge impact on the overall relationship, so Valentine suggests that we should keep them in (there's no reason to remove them). I've now pushed all the results to the repo so you can have a look. For latitude, the only significant relationships are ANPP:BNPP and ANPP:NPP.

teixeirak commented 5 years ago

There's one point where the ratio of ANPP_foliage/NPP >1. Is that an error in the data?

image

teixeirak commented 5 years ago

> where I cut off the data at 1000m there is a significant relationship between woody_stem:NPP + latitude, with woody allocation lowest in the tropics

What happened to this relationship? You mentioned earlier it was significant, but didn't mention again, and I don't see the plot.

beckybanbury commented 5 years ago

> What happened to this relationship? You mentioned earlier it was significant, but didn't mention again, and I don't see the plot.

So that relationship is significant if I run the Cook's distance analysis on the data <1000m and then remove points with Cook's distance > 4 times the mean (considered influential). I talked with Valentine and she suggested it would be better to not remove the points because removing them doesn't alter the shape of the relationships very much; if I don't remove them then the relationship isn't significant.

I can push the graphs with and without those Cook's distance outliers removed if you like, so you can compare?
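For reference, the "more than 4 times the mean" rule can be sketched for a one-predictor regression as follows. This is a simplified stand-in rather than the repo's code (the thread's model has more terms); the function names and the toy data are assumptions.

```python
def cooks_distances(xs, ys):
    """Cook's distance for each point of a simple linear regression y ~ x."""
    n = len(xs)
    p = 2  # fitted parameters: intercept and slope
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x

    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    mse = sum(e * e for e in residuals) / (n - p)

    distances = []
    for x, e in zip(xs, residuals):
        leverage = 1 / n + (x - mean_x) ** 2 / sxx
        distances.append((e * e / (p * mse)) * leverage / (1 - leverage) ** 2)
    return distances


def influential_points(xs, ys, multiple=4.0):
    """Indices of points whose Cook's distance exceeds `multiple` times the mean."""
    distances = cooks_distances(xs, ys)
    threshold = multiple * sum(distances) / len(distances)
    return [i for i, d in enumerate(distances) if d > threshold]


# Toy data: points on y = 1 + 0.5x, except the last one, which is far too high.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 12.0]
flagged = influential_points(xs, ys)
```

Here only the last point is flagged. Whether flagged points should actually be dropped is a separate judgement call.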

> There's one point where the ratio of ANPP_foliage/NPP >1. Is that an error in the data?

I can't source the data for this point; it isn't in the paper recorded in the loaded.from, and the citation.id paper is in Chinese. It definitely seems wrong though! There are several other data points that seem to have been uploaded from this paper, so I wonder if we need to remove all of them?

teixeirak commented 5 years ago

> I can't source the data for this point; it isn't in the paper recorded in the loaded.from, and the citation.id paper is in Chinese. It definitely seems wrong though! There's several other data points that seem to have been uploaded from this paper, so I wonder if we need to remove all of them?

Yes, please flag as suspicious (column at end) and add a note saying why.
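A check like the one that surfaced this point could be sketched as below: a component of NPP should not exceed NPP itself, so any record with a component:total ratio above 1 gets flagged with a note. The field names `suspicious` and `note` are hypothetical stand-ins for the database's actual columns, and the records are invented.

```python
def flag_impossible_ratios(rows, component, total):
    """Return copies of rows, flagging those where component / total > 1."""
    flagged = []
    for row in rows:
        updated = dict(row)  # copy so the input records are left untouched
        if row[total] > 0 and row[component] / row[total] > 1:
            updated["suspicious"] = True
            updated["note"] = f"{component}:{total} ratio > 1"
        flagged.append(updated)
    return flagged


# Invented records, for illustration only.
rows = [
    {"site": "site_A", "ANPP_foliage": 3.0, "NPP": 10.0},
    {"site": "site_B", "ANPP_foliage": 12.0, "NPP": 9.0},  # foliage exceeds NPP
]
checked = flag_impossible_ratios(rows, "ANPP_foliage", "NPP")
```

Flagging rather than deleting keeps the record available for later review against the original source.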

teixeirak commented 5 years ago

> So that relationship is significant if I run the cook's distance analysis on the data <1000m and then remove points with cook's distance > 4 times the mean (considered influential). I talked with Valentine and she suggested it would be better to not remove the points because removing them doesn't alter the shape of the relationships very much; if I don't remove them then the relationship isn't significant.

> I can push the graphs with and without those cook's distance outliers removed if you like, so you can compare?

That would be great, or just post them here if you don't want to clutter the results.

beckybanbury commented 5 years ago

> That would be great, or just post them here if you don't want to clutter the results.

Plots are saved here.

teixeirak commented 5 years ago

It may not matter, as it looks like the outlier that was removed is from the same study (at least the same latitude) as the suspicious value in question above.

If there are others removed because of high Cook's D, it would be good to know what they are and check the data. If we're confident in the data, we can keep them in.

beckybanbury commented 5 years ago

@teixeirak following on from what we were discussing earlier, I've tried running these regressions in a few different ways: with ANPP_1 alone, with ANPP_2 alone, and with ANPP_1 and ANPP_2 combined. I've also tried matching additionally by date, in order to try to deal with the multiple matching problem. I'm getting very mixed results in terms of whether the regressions are significant or not.

I get significant results for ANPP_1:BNPP, but not for anything else.

This makes me think that the significance is very contingent on the specific dataset we're using, which makes me uncomfortable drawing any real conclusions from it.

teixeirak commented 5 years ago

Resolved.