fix height calculations

teixeirak commented 5 years ago

@mcgregorian1, there's some bug in your height calculations. This graph shows max heights of ~2m! I suspect that the problem lies in units conversion (e.g., use of cm vs mm).

mcgregorian1 commented 5 years ago

Hmm. Part of this, I believe, is me being confused about how the regression handles units. When I get regression equations for height from dbh in cm:

log(height) = intercept + slope * log(DBH)

And then I take exp() of the predicted log(height), is the resulting height automatically in meters? Or is it in cm?

The numbers on the graph are the raw numbers I get after exponentiating, so I'm assuming those are automatically meters (contradicting my label).

teixeirak commented 5 years ago

This equation is based on your regression, correct? I thought you were using a nonlinear fit, right? The units on predicted height will be the same as measured height, and you need to make sure that you feed in DBH in the same units that the equation is based on (i.e., cm vs mm).

mcgregorian1 commented 5 years ago

My regression equations have been determined from log(dbh) plotted against log(height). But in those graphs, DBH was in mm before taking the log, and height was in meters before taking the log. Are you saying they both need to be in the same units before plotting to get the regression?

teixeirak commented 5 years ago

They do not need to be the same units; you simply need to be consistent with units when you are later using the equation to predict.

It would be more accurate to fit a nonlinear power function than a linear model on log-log data. I can explain why in person.
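To make the units point concrete, here is a minimal Python sketch. The data are made up for illustration and the helper names `fit_loglog` and `predict_height` are hypothetical, not taken from the project's code. It fits log(height) against log(DBH) with DBH in mm and height in m, then predicts for the same tree with DBH supplied in mm and, wrongly, in cm.

```python
import math

def fit_loglog(dbh, height):
    """Ordinary least squares fit of log(height) = intercept + slope * log(dbh)."""
    x = [math.log(d) for d in dbh]
    y = [math.log(h) for h in height]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict_height(intercept, slope, dbh):
    """Back-transform; the result is in the units of the heights used in the fit."""
    return math.exp(intercept + slope * math.log(dbh))

# Made-up training data: DBH in mm, height in m.
dbh_mm = [100, 200, 400, 800]
height_m = [10.0, 15.0, 22.0, 33.0]

intercept, slope = fit_loglog(dbh_mm, height_m)

right = predict_height(intercept, slope, 500)  # 50 cm tree, DBH given in mm
wrong = predict_height(intercept, slope, 50)   # same tree, DBH wrongly given in cm
print(f"DBH in mm: {right:.1f} m; DBH in cm by mistake: {wrong:.1f} m")
```

Feeding centimeters into an equation fitted on millimeters shrinks the prediction by a factor of 10^slope, so the predicted heights stay in meters but come out far too small.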

mcgregorian1 commented 5 years ago

Hm, ok. Will you be in tomorrow (Friday) to talk about this?

teixeirak commented 5 years ago

I expect to be. You can also ask Valentine.

mcgregorian1 commented 5 years ago

I spoke with Valentine, and I think we had talked about this already? The picture below shows the regression equations as we all determined would be best, after trying the power function plus quadratic and others (see #12, especially about 10 April).

I did have a problem with the units somewhere, which I'm in the process of fixing. I'll update the graph later today.

Here's the picture

mcgregorian1 commented 5 years ago

Hi @teixeirak

I've updated the height graph with the correct units, but there is a discrepancy I've noticed where our heights are mostly above 30m (for codominant and dominant) and an outlier at about 54m. This doesn't make sense to me, and I suspect the heights might be biased upward. Eastern deciduous forests in North America I believe uncommonly get above 30m, and even when looking at our plot from the top of the road going to Leach, you don't really see anomalously tall trees sticking up.

I've checked the data, and it seems that the heights.csv has heights going up to 47.7m, or 156.5 feet.

teixeirak commented 5 years ago

There are two issues here:

(1) How much do we trust our height measurements? Height can be a tricky measurement to make, but I'm not aware of any reason why our data would be wrong. Looking back at the original data, there are 3 different researchers who independently measured heights >40m. Thus, I'd say we have pretty strong evidence that trees can get that tall in this plot.

(2) Do we have the right allometry? This is where the problem probably lies. What is the species of the tree predicted to be 54 m? My suspicion is that (a) the power fit doesn't quite capture the tendency for increases in height to slow down as dbh increases, and (b) the outlying individual has a dbh quite a bit bigger than that of the largest tree of that species for which we have height data.

Solutions could include:

(1) Was the outlier tree ever measured in height? If so, you can solve the problem by applying a rule that if predicted height exceeds height measured on the individual in question, we use the measured height.
(2) Try a different equation for these fits (explore the literature and find a different functional form).
(3) Only apply the species-specific equations within the observed dbh range, and for larger trees apply a general equation for all species combined.
(4) Cap predicted height at max observed for the species or the whole community (for the latter, if the prediction is greater than 47.7, assign that value).
(5) Go out and measure height on some large individuals to constrain the allometries for large tree sizes. (Ask Erika to show you how, ideally calibrate your measurements against instruments at known height on NEON tower.)

I think any of these would be acceptable. If you have time to get more data, #5 would help improve the outcome of any of the above potential solutions.
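Solutions (1) and (4) above amount to simple post-processing rules on the predicted heights. A minimal Python sketch follows, assuming hypothetical names (`adjust_predicted_height`, `species_max_m`) and using 47.7 m, the community maximum mentioned in solution (4), as the fallback cap.

```python
COMMUNITY_MAX_M = 47.7  # community-wide maximum cited in solution (4)

def adjust_predicted_height(predicted_m, measured_m=None, species_max_m=None):
    """Apply rule (1), then rule (4), to an allometric height prediction.

    measured_m:    height measured on this individual, if any (rule 1)
    species_max_m: tallest measured height for the species, if known (rule 4)
    """
    # Rule (1): a prediction above the tree's own measured height is replaced
    # by the measurement.
    if measured_m is not None and predicted_m > measured_m:
        return measured_m
    # Rule (4): otherwise cap at the species maximum when available,
    # falling back to the community-wide maximum.
    cap = species_max_m if species_max_m is not None else COMMUNITY_MAX_M
    return min(predicted_m, cap)

print(adjust_predicted_height(54.0))                      # capped at community max
print(adjust_predicted_height(54.0, species_max_m=43.1))  # capped at species max
print(adjust_predicted_height(54.0, measured_m=38.0))     # measured height wins
print(adjust_predicted_height(30.0))                      # unchanged
```

The order of the two rules is a design choice made for this sketch; the thread does not say which should take precedence when both apply.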

mcgregorian1 commented 5 years ago

Sounds good. I'm still nervous about fully trusting the height measurements. It's true we have 3 researchers getting heights >40m, but looking at online stats for these species,

cato = average 20-30m, we have one labeled as 40.6
cagl = 20-30m, in "extremely favorable sites" up to 40m, we have it labeled as 43.1
quru = 20-30m average
caovl = 15-25m average, we have one labeled as 39.6
litu = 20-30m average, sometimes larger

It's worth mentioning that most of the time when websites cite these larger trees, they talk about DBH of >100cm or DBH in feet, and we have very few individuals that size in the plot.

I'll look through this more later

mcgregorian1 commented 5 years ago

I've tried looking at the power function (Height = intercept*(diameter^slope)) from the log-log graph, and altering the overall regression equation (for all species), but neither of these is producing heights more in line with what I'd expect to see. I've also spoken with Sarah and she mentioned that their methodology in the field was a little difficult just because of how much growth there is (e.g. in deer exclosure).
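For reference, the power form follows from the log-log fit by exponentiating both sides. Writing a for the fitted intercept and b for the fitted slope (symbols introduced here only for illustration):

$$
\log(H) = a + b\,\log(D)
\quad\Longrightarrow\quad
H = e^{a}\,D^{b}
$$

So if the intercept in the power function above is the coefficient reported by the log-log regression, the multiplier would be exp(intercept) rather than the intercept itself. Whether that applies to the calculation described here cannot be told from the thread.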

I've made a list of the species from the SCBI_tree_heights.csv that are still alive plus their locations, so I think one day next week or a half day or something I can go check on some of these heights.

teixeirak commented 5 years ago

looking at online stats for these species,

what is the source?

teixeirak commented 5 years ago

I've made a list of the species from the SCBI_tree_heights.csv that are still alive plus their locations, so I think one day next week or a half day or something I can go check on some of these heights.

I like the idea of checking/ potentially getting some more measurements. Of course, you'll face the same challenges that others have. It would help to calibrate against a point of known height on the NEON tower, but trees are more difficult, and it would take a lot to override measurements of 3 previous researchers.

A better way to check, if possible, would be to find trees on which height was measured that have died and fallen within the past few years / aren't yet very decomposed. You could get very accurate measurements of their height. Actually, regardless of whether such trees were previously measured, this approach would give us some really good additional data.

mcgregorian1 commented 5 years ago

That's true. I know of a few fallen trees in the plot that fell recently, so potentially I can get some heights on those. And yes, overriding three researchers would be hard. I'm just nervous about this height data because, for example, Jonathan Thompson has recorded that a havi in quad 110 in 2012 had a height of almost 30 feet. As I remember the plot (and other havi's I've seen in Shenandoah), I've never seen a single specimen over 20 feet. I could be wrong, but this is what's making me want to double-check.

mcgregorian1 commented 5 years ago

looking at online stats for these species,

what is the source?

sorry about that. I made a small csv showing sources and common heights for trees plus DBH if they were mentioned. Most of these come from the USFS

mcgregorian1 commented 5 years ago

The original equation I had for all points was giving me 54m as a predicted height for a quve. This equation was based off the conglomerate regression using all the species for which we have enough data to make sp-specific equations.

After including all points (not just the species that we have enough data for), I got a different equation for the conglomerate, which then yields a high predicted height of 47m. This is slightly better, but still very tall.

This is just for me to remember.

Thompson did collect heights in plot, using an Impulse LR laser (different to current rangefinder).
Stovall et al 2018 did not ground-truth height measurements, only lidar and computer modelling, at least from what it says in paper.
Sarah Macey did ground-truthing plus lidar

teixeirak commented 5 years ago

Glad to hear it. I think you should go with these values, although of course it will be a good thing to go out and check the data.

Note: all the values in the file you're using are measured using traditional ground-based laser. Atticus Stovall made those measurements to compare with tLidar. Only Sarah's ground-based measurements are in the file.

mcgregorian1 commented 5 years ago

Ok, so Alyssa and I have gone out and gotten some heights, and the result is mixed.

We sampled a recently fallen tree that looked large, and got only about 31m.
The tree measured at 44.7m in 2015 (Stovall) came out at 37.2m in my measurements.
Similarly, a tree measured as 44.3m in 2015 (Stovall) was ~38m (I couldn't get an accurate reading on it, but a nearby tree looked to be roughly the same height, and measuring that one I got 38m).

Other trees I was able to get heights for:

tag	sp	height.m
52056	quru	27.8
102348	qual	31.6
60551	qual	34
92238	quve	31
102319	quve	31 (dead on ground)
42090	litu	36.6
102332	litu	31.6
62224	litu	28.2
92466	litu	37.2
92518	litu	38

Sarah mentioned that she and Daniel tested the accuracy of the rangefinder on the NEON tower, and it was pretty spot-on, so I trust my measurements with it. It's worth noting Jonathan used something different in 2012 and I'm not sure what Atticus used in 2015.

Thoughts

Overall, it seems the site is favorable for tall trees, but these tend to occur in the upland areas (from what we saw). This is how I see things

our current height data lists 9 trees >40m tall (not all of them litu). I've just disproven 2 of these.
we have 28 trees 35<x<40 m. tall with a fair mix of species. I'd say this is probable but I can't fully put weight behind it.
my issue now, then, is I'm not sure which measurements I can trust and which ones I can't. As I put in the comment from yesterday, including more points in the regression plot gave me a better reading, but it still gives me 47m for a quve. Even though I've obtained common tree heights in that csv I made, some of those numbers are already disproven for our site from what I got for quve and qual.

Next Steps

Either

since I don't know which data points I can trust and which ones I can't, I should go get more heights data and use that as my only data. This would take a bit because it would be just me and more data collection.
or I stay with this data but somehow put a cap on tree heights. At this point, I don't know how to assign a number as the cap because of my point in # 6 above.
I'm going to make notes of this in a readme for the new manuals folder I made yesterday, also explaining how to use the Nikon ForestryPro since the manual itself isn't too helpful.
I think ideally, maybe an idea for the future, is that height data is collected concurrently with the ForestGEO census, and focusing on all trees >30cm dbh or something. This way, you have consistent height-dbh measurements and we could more reliably get accurate allometries.

teixeirak commented 5 years ago

Sarah mentioned that she and Daniel tested the accuracy of the rangefinder on the NEON tower, and it was pretty spot-on, so I trust my measurements with it. It's worth noting Jonathan used something different in 2012 and I'm not sure what Atticus used in 2015.

A note on this-- there's probably far larger uncertainty associated with the user than with the instrument. Jonathan's group and I think Atticus also used laser rangefinders, and error introduced by the instrument would probably be trivial next to the uncertainty introduced by how we're using them.

mcgregorian1 commented 5 years ago

An update on this, after updating the main SCBI heights.csv

According to Helene in her paper, the sine method is best using a laser rangefinder,
Thompson collected heights using the tangent method without laser, using angles/guesstimating where the base and tops of trees were if there wasn't a direct line of sight. (230 records)
Stovall collected heights using clinometer and tape, also noted they were pressed for time. (48 records)
Macey and I collected heights using sine method with laser. (28 records)

teixeirak commented 5 years ago

Where did you get the info on Thompson’s method? I’m pretty sure that Jenny McGarvey made those measurements with a laser, which she later used to make a few additional measurements for me (paper cited in that spreadsheet).

My recommendation is that you throw out of the analysis and/or remeasure any that look like unreasonable outliers to you. You might create an extra column in that spreadsheet indicating suspicious values.

mcgregorian1 commented 5 years ago

Thompson's method I got because in the supplementary information for your paper, it says that heights were obtained from an Impulse LR. I looked up Impulse, and the only one that calculates heights for you is the Impulse 200LR. I then checked the user manual, and noted the tangent method was used.

In that case, Jenny did use a laser but according to the user manual, the laser is only used to determine the distance between her and the tree on a horizontal plane. When getting the heights of the canopy and the base, you still point the rangefinder in that direction, but it doesn't actually use a laser for that; instead, it only measures the angle. If it did use the laser for the canopy and base, then it would have used the sine method.
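The difference between the two methods described above can be sketched in a few lines of Python. This uses the standard trigonometric definitions rather than any instrument's firmware; the function names and the example geometry are assumptions made for illustration.

```python
import math

def height_tangent(horizontal_dist_m, angle_top_deg, angle_base_deg):
    """Tangent method: one horizontal distance to the trunk plus two angles.

    Angles are measured from horizontal; an angle below eye level is negative.
    Assumes the treetop is directly above the base.
    """
    top = math.tan(math.radians(angle_top_deg))
    base = math.tan(math.radians(angle_base_deg))
    return horizontal_dist_m * (top - base)

def height_sine(dist_top_m, angle_top_deg, dist_base_m, angle_base_deg):
    """Sine method: a laser distance and an angle for both the top and the base."""
    top = dist_top_m * math.sin(math.radians(angle_top_deg))
    base = dist_base_m * math.sin(math.radians(angle_base_deg))
    return top - base

# Hypothetical vertical tree, 30 m tall, observer's eye 20 m away and 1.5 m up.
d = 20.0
angle_top = math.degrees(math.atan2(28.5, d))
angle_base = math.degrees(math.atan2(-1.5, d))
dist_top = math.hypot(d, 28.5)
dist_base = math.hypot(d, 1.5)

tangent_result = height_tangent(d, angle_top, angle_base)
sine_result = height_sine(dist_top, angle_top, dist_base, angle_base)
print(round(tangent_result, 2), round(sine_result, 2))
```

For an idealized vertical tree the two methods agree. They diverge under field conditions: the tangent method depends on the top being directly above the base, while a sine-method reading depends on what the laser actually hits in the crown.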

I think my problem is I don't know which values look suspicious. So, for Atticus' very tall measurements, for example, in 2015 he measured them as 47.7 and 44.3. If you assume a couple meters growth from 2015 to 2019, then they should be 50m and 46m (in theory). From those, I got measurements of 37.2 and 38m, respectively, which are 13m and 8m difference, also respectively. 38 and 37 are in range with other litu, but I have no idea at what point Atticus' other measurements are reliable enough from 4 years ago that they would still represent heights today. Same thing with Thompson's.

I say this because in my mind, my regression equations are based on DBH's relationship with height, so if we're randomly overestimating some trees' heights but those numbers are within the general range, then the equations will be affected and I won't catch it. At this point I'd say anything above 40m is suspicious, but I simply can't say anything about everything below that.

Does that make sense?

I can go get some heights corresponding to Jonathan's measurements to see if they make sense (in his case from 7 years' difference).

teixeirak commented 5 years ago

It’s time to wrap this up. Let’s do this:

add Atticus’s TLS data to the analysis, favoring this over his manual measurements.
throw out the 3 values from Atticus that you’ve been mentioning as suspicious (if they’re not overridden by TLS)
generally trust the rest of the data
add your new measurements to the file

These measurements do have error, but I don’t see any reason to fundamentally distrust them as systematically biased or less reliable than is typical of height measurements. Note that errors can go both ways, and height of tall trees may also be underestimated, so systematically checking just the tallest trees may create a downward bias. If you want to check more, these should be selected based on dbh, not height.

mcgregorian1 commented 5 years ago

Hi @teixeirak

I went out to the field this morning before the all-staff to check some of Jonathan's trees (and Sarah's). The results are below. In general I noticed I was getting lower values than him despite the 7 year difference, so either I'm completely doing this wrong or I don't know what's happening.

Either way, though, I like your idea and I'll add those TLS measurements in. I'll also change the source of the 2012 measurements to be Jenny/Chris instead of Jonathan - is that ok? I'd still keep Jonathan's name maybe in the notes

teixeirak commented 5 years ago

I'll also change the source of the 2012 measurements to be Jenny/Chris instead of Jonathan - is that ok? I'd still keep Jonathan's name maybe in the notes

Please do. Use their full names and put in parentheses Jonathan Thompson lab.

teixeirak commented 5 years ago

Thanks for getting these measurements!

So, the bottom line is that your measurements using the sine method are almost always lower than previous estimates.

I just looked at Helene's paper and noted this:

Unfortunately, this further reduced the clarity on this issue, and raises problems for combining the two methods in the same allometry, particularly with your focus on re-measuring the larger trees.

Let me think about what to do here, and I'd like to hear your ideas as well.

teixeirak commented 5 years ago

How many trees do we have with both TLS and manual measurements? It would be good to do some comparisons, treating TLS as the master.

Also, could you please make a plot of your measurements vs previous measurements and calculate average % difference?

mcgregorian1 commented 5 years ago

Oof I didn't remember that part from Helene's paper. I'll think about this more on Tuesday.

Yep I can do that.

Two questions:

Did you want all the height data sheets (Jonathan's, Atticus') in the ForestGEO repo? Currently I just put them in my own folder.
I don't have a file data source for Jenny's height measurements on the sap cluster. Do you know where that is or what it's called?

teixeirak commented 5 years ago

Yes, let's put those in the ForestGEO repo. Originally I wasn't going to do that, but I now see how being able to go back to them is useful.
Those should be in the Dryad file from my Functional Ecology paper

teixeirak commented 5 years ago

Oof I didn't remember that part from Helene's paper. I'll think about this more on Monday.

It would be good if we can do those comparisons.

We should also consider what our conclusions would be if we assume 20% underestimation. Are you still getting systematically lower heights?
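One way to read "assume 20% underestimation" is that each sine-method reading is 80% of the true height. That interpretation, and the helper name `correct_underestimate`, are assumptions made for this short Python sketch.

```python
def correct_underestimate(measured_m, underestimate_frac=0.20):
    """Return the height implied if measured = true * (1 - underestimate_frac)."""
    return measured_m / (1.0 - underestimate_frac)

for measured in (37.2, 38.0):  # the two re-measured heights quoted earlier
    print(measured, "->", round(correct_underestimate(measured), 1))
```

Under this reading, re-measured heights of 37.2 m and 38 m would correspond to 46.5 m and 47.5 m.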

mcgregorian1 commented 5 years ago

Just from today, yes, most of my heights were systematically lower. I should note they were lower even than the 2012 measurements, which means they're lower still relative to what the heights would be now.

mcgregorian1 commented 5 years ago

How many trees do we have with both TLS and manual measurements? It would be good to do some comparisons, treating TLS as the master. Also, could you please make a plot of your measurements vs previous measurements and calculate average % difference?

For the measurements I have from Atticus, there are no individuals he got both manual and TLS measurements for.

Regarding my measurements, here is a basic graph showing the difference btwn different measurements for the same trees.

Here is a graph with just the measurements I've taken compared to others:

Here is a table showing the % difference for each stem I measured compared with other measurements (if 3 measurements, then the % diff is btwn mine and the most recent), and the difference in years. If height.dir = -1, that means my measurement was lower than the previous measurement, and vice versa.

the mean % difference of all my measurement comparisons together is 13.01
the mean % diff of the measurements where I measured lower than previous is 15.51
the mean % diff of measurements where I measured higher than previous is 6.58
the mean % diff only compared to Sarah's (from 2018, thus most like mine) is 13.41. However, Sarah also used the sine method, so really we should be getting the same measurements.
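The summary statistics above could be computed along these lines. This Python sketch makes two assumptions the thread does not confirm: that % difference is taken relative to the previous measurement, and that `height.dir` is coded as in `height_dir` below. The first two pairs echo values quoted earlier in the thread; the other two are invented.

```python
def percent_diff(mine_m, previous_m):
    """Absolute difference as a percentage of the previous measurement."""
    return abs(mine_m - previous_m) / previous_m * 100.0

def height_dir(mine_m, previous_m):
    """-1 if my measurement is lower than the previous one, 1 if higher, 0 if equal."""
    return (mine_m > previous_m) - (mine_m < previous_m)

# (mine, previous) pairs in meters; not the real comparison table.
pairs = [(37.2, 44.7), (38.0, 44.3), (31.0, 29.5), (27.8, 30.1)]

rows = [(percent_diff(m, p), height_dir(m, p)) for m, p in pairs]
mean_all = sum(pd for pd, _ in rows) / len(rows)
lower = [pd for pd, d in rows if d == -1]
higher = [pd for pd, d in rows if d == 1]
print(f"mean % diff, all: {mean_all:.2f}")
print(f"mean % diff, mine lower: {sum(lower) / len(lower):.2f}")
print(f"mean % diff, mine higher: {sum(higher) / len(higher):.2f}")
```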

mcgregorian1 commented 5 years ago

Based on my graphs and the average % difference, I'm not sure how much I can take away from this considering most of the measurements I'm comparing against are 4, 6, or 7 years old. I'm not sure what the height growth per year is for these trees, but it's safe to say that my average % difference is definitely higher than 6.99 if we were using recent measurements.

I will do a regression analysis for all points combined (the whole heights.csv) and see what it looks like.

mcgregorian1 commented 5 years ago

I've run the regression equations. They provide better estimates than before, and taking out the manual measurements from Atticus doesn't change things much. However, I'm still getting one height >50m, plus a couple outliers still in the mid-40s range that shouldn't be there. I'm not sure of the scientific integrity of doing something like from before (see below).

Alternatively, this is where I could just not include the outliers.

The original equation I had for all points was giving me 54m as a predicted height for a quve. This equation was based off the conglomerate regression using all the species for which we have enough data to make sp-specific equations.

After including all points (not just the species that we have enough data for), I got a different equation for the conglomerate, which then yields a high predicted height of 47m. This is slightly better.

teixeirak commented 5 years ago

For species for which you do not have enough data to construct a species-specific allometry, you should definitely use an equation that includes all species.

teixeirak commented 5 years ago

@mcgregorian1, here's what I see as the best path forward here:

1. Accept that trees at this site tend to be taller than is typical. We now have multiple lines of evidence for this. You should still be critical of potential outliers, but we have several independent lines of evidence (J. Thompson's group, Atticus manual, TLS) that trees here tend to be tall. Your measurements are also consistent with this, assuming some bias in the measurement method (as documented by Helene).
2. Handling bias in the sine measurements: we expect that sine measurements have bias, but we don’t know the amount. We could potentially apply a correction factor based on Helene's paper, but the bias may be different here. The best approach is probably to estimate the bias for our site. I'll open a new issue on this.
3. Set a bound on your predictions, where if the allometrically predicted height exceeds tallest measured tree, adjust down to tallest observed. That is, we won’t extrapolate beyond the range of trusted height observations.
4. Do a sensitivity analysis to determine whether conclusions are sensitive to decisions about the height allometries. That is, run the main analysis using 2-3 different methods for determining height (e.g., treating as error any measurements >40m, excluding sine method, using just manual measurements, using just TLS; whatever you feel is potentially influential or decisions about which you feel uncertain). Do different data inclusion rules for developing height allometries result in fundamentally different conclusions (i.e., different sets of variables coming out as significant)? If not, we don’t worry so much about this issue.
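The first half of the sensitivity analysis described above (refitting the height allometry under different data inclusion rules) might look like the Python sketch below. The records are invented, the field names are placeholders, and the main analysis that would be rerun for each rule is not shown.

```python
import math

def fit_loglog(records):
    """Least-squares fit of log(height_m) on log(dbh_cm); returns (intercept, slope)."""
    x = [math.log(r["dbh_cm"]) for r in records]
    y = [math.log(r["height_m"]) for r in records]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

# Invented records; "method" values mirror the ones discussed in this thread.
records = [
    {"dbh_cm": 20, "height_m": 18.0, "method": "tangent"},
    {"dbh_cm": 35, "height_m": 26.0, "method": "tangent"},
    {"dbh_cm": 60, "height_m": 36.0, "method": "tangent"},
    {"dbh_cm": 90, "height_m": 44.0, "method": "tangent"},
    {"dbh_cm": 25, "height_m": 19.0, "method": "sine"},
    {"dbh_cm": 55, "height_m": 31.0, "method": "sine"},
    {"dbh_cm": 95, "height_m": 37.0, "method": "sine"},
]

# Each rule decides which records feed the allometry.
rules = {
    "all data": lambda r: True,
    "exclude >40 m": lambda r: r["height_m"] <= 40.0,
    "exclude sine": lambda r: r["method"] != "sine",
}

predictions = {}
for name, keep in rules.items():
    intercept, slope = fit_loglog([r for r in records if keep(r)])
    # Compare rules by the height each one predicts for a 100 cm DBH tree.
    predictions[name] = math.exp(intercept + slope * math.log(100))
    print(f"{name}: {predictions[name]:.1f} m at 100 cm DBH")
```

Comparing predictions at one large DBH is only a quick diagnostic. The real test proposed in item 4 is whether the downstream analysis selects the same variables under each rule.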

mcgregorian1 commented 5 years ago

@teixeirak sounds good. I was thinking along the lines of # 3 earlier today. I can run the tests of # 4 soon.

I brought in some NEON data that I'd like to get your opinion on. These are height data from NEON plots at SCBI, from 11 surveys since 2015. I added the species, dbh, and height measurements to our data. As you can see, it actually makes the regressions worse in terms of R2 (there are many subcanopy heights), but overall it is more data than I had before.

The only downside is I can't seem to find exactly how they calculated height.

What do you think about including it given this?

teixeirak commented 5 years ago

At least in theory, it's great to include the NEON height data. However,

There are some values there that are clearly either errors or broken/fallen trees (e.g., ~50cm DBH QUVE with height ~3m), and we don't want to include those. I'm not sure if there's anything in the data to indicate abnormal status of a tree?
I'd only include one value per tree (this goes for our measurements too). You could either average all or select just one (e.g., most recent)

There should be documentation on the methods used to collect these. Have you looked here? If you can't find it online, please contact one of the NEON folks. Talking to someone may also be helpful for figuring out those outliers.
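The "one value per tree" rule above, in its most-recent variant, can be sketched in Python as follows. The records, tags, and field names are invented for illustration.

```python
def most_recent_per_tree(records):
    """Keep one record per tag: the one with the latest year."""
    latest = {}
    for rec in records:
        tag = rec["tag"]
        if tag not in latest or rec["year"] > latest[tag]["year"]:
            latest[tag] = rec
    return list(latest.values())

# Invented records; tags and field names are placeholders.
records = [
    {"tag": "A1", "year": 2012, "height_m": 29.0},
    {"tag": "A1", "year": 2019, "height_m": 31.0},
    {"tag": "B2", "year": 2015, "height_m": 44.3},
    {"tag": "B2", "year": 2013, "height_m": 41.0},
    {"tag": "C3", "year": 2018, "height_m": 27.8},
]

kept = most_recent_per_tree(records)
for rec in kept:
    print(rec["tag"], rec["year"], rec["height_m"])
```

Averaging all measurements per tree, the other option mentioned, would replace the comparison on `year` with a running mean of `height_m`.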

mcgregorian1 commented 5 years ago

Ah, you're right. I hadn't caught all that before. And thank you for finding that documentation for getting heights!

I checked through the NEON data again to see if there were duplicates, and it turns out there are scattered duplicates: records from the same day with the same height but different DBH measurements. There's no qualifier I can use to separate the two.

Based on this I'm wondering if it's worth it to keep the NEON data in at all (given the sporadicness of data errors). I took out the duplicates for SCBI measurements (using the most recent measurements).

With just our measurements (one per tree), the plots look like this:

With NEON's added (excluding dead), we get this. It's slightly better than before, but not as neat as our own.

mcgregorian1 commented 5 years ago

I also went out this morning to verify heights of the tallest predicted trees (the ones for which the regression equations, rather than actual measurements, were predicting heights of 45m and 50.87m based on large DBH). Of the four trees I measured, I got heights of 23, 28.8, 29.6, and 32.8. I think basically, the regression equations don't capture that when a tree reaches a certain height in our forest, it starts to expand more laterally without necessarily matching that vertically.

teixeirak commented 5 years ago

Regarding the NEON data, it would be nice to use, but discarding major outliers.

Regarding the allometries, I agree that they tend to overestimate the largest trees. Having more data for the largest trees (which you've done) should help to constrain that.

teixeirak commented 5 years ago

Which method does NEON use?

mcgregorian1 commented 5 years ago

That's true, but if we choose to not use the sine measurements then my constraints wouldn't be as useful. I'll check on recent papers for that.

NEON uses a rangefinder for heights, but I don't know which rangefinder they use. Ok, I can keep NEON data in. I checked with one NEON person about the duplicate data, but she said I needed to contact someone else, so I can do that today. neon protocol

teixeirak commented 5 years ago

Sounds good-- keep the NEON data in unless discussions with NEON folks reveal some reason why it can't be used.

mcgregorian1 commented 5 years ago

Ok, I got word back from NEON. There were several duplicates because of multi-stemmed individuals (almost all shrubs like libe). I got rid of the duplicates, then kept only one measurement per tree (the most recent year, in keeping with what I did for our measurements), and removed a couple of obvious outliers.

With all that, I get regression equations like below. I think this is the best I can do, so I'm going to call the height calculations done and use these equations for the model.

The only final fix I can think of is to not include fram or juni in this due to their sporadic measurements. Originally with just our data, we didn't include fram, juni, and quve because of lack of data. With the NEON data, now, we have enough for quve but fram and juni are questionable. What do you think?

mcgregorian1 commented 5 years ago

My final question is whether NEON data should be added to our repo? I don't think so, but I think I want to add a section to the table for it.

NEON gets height data using tangent method for taller trees. For smaller shrubs and "low stature vegetation", heights are obtained from a collapsible measurement rod.

teixeirak commented 5 years ago

I'd include FRAM and JUNI, but use an all-species regression for any individuals that fall outside the observed size range (e.g., a small JUNI).
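One reading of this suggestion is a fallback rule: use the species equation only when the tree's DBH lies within the range that equation was fitted on, and the all-species equation otherwise. The Python sketch below illustrates that reading; the coefficients, DBH ranges, and function names are hypothetical.

```python
import math

# Hypothetical coefficients for log(height_m) = intercept + slope * log(dbh_cm),
# with the DBH range of the trees each equation was fitted on.
EQUATIONS = {
    "juni": {"intercept": 1.0, "slope": 0.55, "dbh_range_cm": (15.0, 60.0)},
    "all":  {"intercept": 1.2, "slope": 0.50, "dbh_range_cm": (1.0, 150.0)},
}

def choose_equation(species, dbh_cm):
    """Species equation inside its fitted DBH range, all-species otherwise."""
    eq = EQUATIONS.get(species)
    if eq is not None:
        low, high = eq["dbh_range_cm"]
        if low <= dbh_cm <= high:
            return species
    return "all"

def predict_height(species, dbh_cm):
    """Predict height in meters with whichever equation choose_equation selects."""
    eq = EQUATIONS[choose_equation(species, dbh_cm)]
    return math.exp(eq["intercept"] + eq["slope"] * math.log(dbh_cm))

print(choose_equation("juni", 30.0))  # inside the juni range
print(choose_equation("juni", 8.0))   # a small juni, outside the range
print(choose_equation("fagr", 30.0))  # no species-specific equation in this table
```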

teixeirak commented 5 years ago

My final question is whether NEON data should be added to our repo? I don't think so, but I think I want to add a section to the table for it.

Please do add it, including your flags for exclusion.

mcgregorian1 commented 5 years ago

My final question is whether NEON data should be added to our repo? I don't think so, but I think I want to add a section to the table for it.

Please do add it, including your flags for exclusion.

My follow up question then is do you only want to add the data I'm using (~20 MB), or the full data that I subset? NEON's full data is in a number of different csv files with necessary supplementary data in other csv files (including species information). If I add this to our repo, I'd have the NEON data all be in a separate folder.

mcgregorian1 commented 5 years ago

I'd include FRAM and JUNI, but use an all-species regression for any individuals that fall outside the observed size range (e.g., a small JUNI).

Sorry, I get the first part, but I'm confused what you mean by the second. The way I understand what I'm doing is I'm applying the juni equation to our cored juni, and the all-species regression to our species that don't have a specific equation.

Are you saying that when I apply the juni equation, if any of our cored juni have dbh and height outside of the range seen from the graphed ones, then they should be given the all-species equation?

Update: Using the new equations with the NEON data (and for juni, just using the juni equation without doing anything else) gives a graph like this, which looks much more like I'd expect.

SCBI-ForestGEO / McGregor_climate-sensitivity-variation

fix height calculations #25
