teixeirak closed this issue 5 years ago
Hmm. Part of this, I believe, is me being confused about how the regression works with respect to units. When I get regression equations for height from dbh in cm:
log(height) = intercept + slope * log(DBH)
And then I take exp(height), is height automatically in meters? Or is it in cm?
The numbers on the graph are the raw numbers I get when I take exp(height), so I'm assuming those are automatically meters (contradicting my label).
This equation is based on your regression, correct? I thought you were using a nonlinear fit, right? The units on predicted height will be the same as measured height, and you need to make sure that you feed in DBH in the same units that the equation is based on (i.e., cm vs. mm).
My regression equations have been determined from log(dbh) plotted against log(height). But in those graphs, DBH was in mm before taking the log, and height was in meters before taking the log. Are you saying they both need to be in the same units before plotting to get the regression?
They do not need to be the same units; you simply need to be consistent with units when you are later using the equation to predict.
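The point about unit consistency can be illustrated with a minimal sketch. The data below are hypothetical (not from heights.csv), and the function names `fit_loglog` and `predict_height` are made up for this example. It fits log(height) on log(DBH) with DBH in mm and height in m, then shows what happens if the same tree's DBH is later fed in as cm.

```python
import math

def fit_loglog(dbh, height):
    """Ordinary least squares of log(height) on log(dbh); returns (intercept, slope)."""
    x = [math.log(d) for d in dbh]
    y = [math.log(h) for h in height]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict_height(dbh, intercept, slope):
    """Back-transform the predicted log(height); output units match the fitted heights."""
    return math.exp(intercept + slope * math.log(dbh))

# Hypothetical training data: DBH in mm, height in m.
dbh_mm = [100.0, 200.0, 400.0, 800.0]
height_m = [10.0, 15.0, 22.0, 30.0]
intercept, slope = fit_loglog(dbh_mm, height_m)

h_right = predict_height(400.0, intercept, slope)  # DBH supplied in mm, as fitted
h_wrong = predict_height(40.0, intercept, slope)   # same tree, DBH supplied in cm
```

With these assumptions, `h_right` comes out in meters because the fitted heights were in meters, and `h_wrong` is too small by a factor of `10 ** slope`, which is the kind of error a cm/mm mix-up would produce.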
It would be more accurate to fit a nonlinear power function than a linear model on log-log data. I can explain why in person.
Hm, ok. Will you be in tomorrow (Friday) to talk about this?
I expect to be. You can also ask Valentine.
I spoke with Valentine, and I think we had talked about this already? The picture below shows the regression equations as we all determined would be best, after trying the power function plus quadratic and others (see #12, especially about 10 April).
I did have a problem with the units somewhere, which I'm in the process of fixing. I'll update the graph later today.
Here's the picture
Hi @teixeirak
I've updated the height graph with the correct units, but there is a discrepancy I've noticed where our heights are mostly above 30m (for codominant and dominant) and an outlier at about 54m. This doesn't make sense to me, and I suspect the heights might be biased upward. Eastern deciduous forests in North America I believe uncommonly get above 30m, and even when looking at our plot from the top of the road going to Leach, you don't really see anomalously tall trees sticking up.
I've checked the data, and it seems that the heights.csv has heights going up to 47.7m, or 156.5 feet.
There are two issues here:
(1) How much do we trust our height measurements? Height can be a tricky measurement to make, but I'm not aware of any reason why our data would be wrong. Looking back at the original data, there are 3 different researchers who independently measured heights >40m. Thus, I'd say we have pretty strong evidence that trees can get that tall in this plot.
(2) Do we have the right allometry? This is where the problem probably lies. What is the species of the tree predicted to be 54 m? My suspicion is that (a) the power fit doesn't quite capture the tendency for increases in height to slow down as dbh increases, and (b) the outlying individual has a dbh quite a bit bigger than that of the largest tree of that species for which we have height data.
Solutions could include:
1. Was the outlier tree ever measured in height? If so, you can solve the problem by applying a rule that if predicted height exceeds height measured on the individual in question, we use the measured height.
2. Try a different equation for these fits (explore the literature and find a different functional form).
3. Only apply the species-specific equations within the observed dbh range, and for larger trees apply a general equation for all species combined.
4. Cap predicted height at max observed for the species or the whole community (for the latter, if the prediction is greater than 47.7, assign that value).
5. Go out and measure height on some large individuals to constrain the allometries for large tree sizes. (Ask Erika to show you how; ideally, calibrate your measurements against instruments at known height on the NEON tower.)
I think any of these would be acceptable. If you have time to get more data, #5 would help improve the outcome of any of the above potential solutions.
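Two of the suggested rules (use the measured height when the prediction exceeds it, and cap at the maximum observed height) can be sketched as below. This is only an illustration: the function name and arguments are hypothetical, and the only value taken from the thread is the community maximum of 47.7 m.

```python
def constrained_height(predicted_m, measured_m=None, species_max_m=None,
                       community_max_m=47.7):
    """Apply the measured-height rule and the max-observed cap to one prediction."""
    height = predicted_m
    # If this individual was measured and the prediction exceeds that
    # measurement, fall back to the measured height.
    if measured_m is not None:
        height = min(height, measured_m)
    # Cap at the species maximum when one is known, otherwise at the
    # community-wide maximum observed height.
    cap = species_max_m if species_max_m is not None else community_max_m
    return min(height, cap)

capped = constrained_height(54.0)                        # community cap applies
measured = constrained_height(54.0, measured_m=31.0)     # measured height wins
unchanged = constrained_height(25.0)                     # prediction is kept
species_capped = constrained_height(54.0, species_max_m=40.0)
```

Whether to cap by species or by community is a choice the thread leaves open; the sketch prefers the species cap only when one is supplied.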
Sounds good. I'm still nervous about fully trusting the height measurements. It's true we have 3 researchers getting heights >40m, but looking at online stats for these species,
- cato = average 20-30m, we have one labeled as 40.6
- cagl = 20-30m, in "extremely favorable sites" up to 40m, we have it labeled as 43.1
- quru = 20-30m average
- caovl = 15-25m average, we have one labeled as 39.6
- litu = 20-30m average, sometimes larger
It's worth mentioning that most of the time on websites when these larger trees are cited, they talk about DBH of >100cm or DBH in feet, and we have very few individuals that size in the plot.
I'll look through this more later
I've tried looking at the power function (Height = intercept * diameter^slope) from the log-log graph, and altering the overall regression equation (for all species), but neither of these is producing heights more in line with what I'd expect to see. I've also spoken with Sarah and she mentioned that their methodology in the field was a little difficult just because of how much growth there is (e.g. in deer exclosure).
I've made a list of the species from the SCBI_tree_heights.csv that are still alive plus their locations, so I think one day next week or a half day or something I can go check on some of these heights.
> looking at online stats for these species,
what is the source?
> I've made a list of the species from the SCBI_tree_heights.csv that are still alive plus their locations, so I think one day next week or a half day or something I can go check on some of these heights.
I like the idea of checking/ potentially getting some more measurements. Of course, you'll face the same challenges that others have. It would help to calibrate against a point of known height on the NEON tower, but trees are more difficult, and it would take a lot to override measurements of 3 previous researchers.
A better way to check, if possible, would be to find trees on which height was measured that have died and fallen within the past few years / aren't yet very decomposed. You could get very accurate measurements of their height. Actually, regardless of whether such trees were previously measured, this approach would give us some really good additional data.
That's true. I know of a few fallen trees in the plot that fell recently, so potentially I can get some heights on those. And yes, overriding three researchers would be hard. I'm just nervous about this height data because, for example, Jonathan Thompson has recorded that a havi in quad 110 in 2012 had a height of almost 30 feet. As I remember the plot (and other havi's I've seen in Shenandoah), I've never seen a single specimen over 20 feet. I could be wrong, but this is what's making me want to double-check.
> > looking at online stats for these species,
>
> what is the source?
Sorry about that. I made a small csv showing sources and common heights for trees, plus DBH if it was mentioned. Most of these come from the USFS.
The original equation I had for all points was giving me 54m as a predicted height for a quve. This equation was based off the conglomerate regression using all the species for which we have enough data to make sp-specific equations.
After including all points (not just the species that we have enough data for), I got a different equation for the conglomerate, which then yields a high predicted height of 47m. This is slightly better, but still very tall.
This is just for me to remember.
Glad to hear it. I think you should go with these values, although of course it will be a good thing to go out and check the data.
Note: all the values in the file you're using are measured using traditional ground-based laser. Atticus Stovall made those measurements to compare with tLidar. Only Sarah's ground-based measurements are in the file.
Ok, so Alyssa and I have gone out and gotten some heights, and the result is mixed.
Other trees I was able to get heights for:
| tag | sp | height.m | notes |
|---|---|---|---|
| 52056 | quru | 27.8 | |
| 102348 | qual | 31.6 | |
| 60551 | qual | 34 | |
| 92238 | quve | 31 | |
| 102319 | quve | 31 | dead on ground |
| 42090 | litu | 36.6 | |
| 102332 | litu | 31.6 | |
| 62224 | litu | 28.2 | |
| 92466 | litu | 37.2 | |
| 92518 | litu | 38 | |
Sarah mentioned that she and Daniel tested the accuracy of the rangefinder on the NEON tower, and it was pretty spot-on, so I trust my measurements with it. It's worth noting Jonathan used something different in 2012 and I'm not sure what Atticus used in 2015.
Overall, it seems the site is favorable for tall trees, but these tend to occur in the upland areas (from what we saw). This is how I see things
Either
> Sarah mentioned that she and Daniel tested the accuracy of the rangefinder on the NEON tower, and it was pretty spot-on, so I trust my measurements with it. It's worth noting Jonathan used something different in 2012 and I'm not sure what Atticus used in 2015.
A note on this-- there's probably far larger uncertainty associated with the user than with the instrument. Jonathan's group and I think Atticus also used laser rangefinders, and error introduced by the instrument would probably be trivial next to the uncertainty introduced by how we're using them.
An update on this, after updating the main SCBI heights.csv
Where did you get the info on Thompson’s method? I’m pretty sure that Jenny McGarvey made those measurements with a laser, which she later used to make a few additional measurements for me (paper cited in that spreadsheet).
My recommendation is that you throw out of the analysis and/or remeasure any that look like unreasonable outliers to you. You might create an extra column in that spreadsheet indicating suspicious values.
I inferred Thompson's method from the supplementary information for your paper, which says that heights were obtained from an Impulse LR. I looked up Impulse, and the only one that calculates heights for you is the Impulse 200LR. I then checked the user manual, and noted the tangent method was used,
In that case, Jenny did use a laser but according to the user manual, the laser is only used to determine the distance between her and the tree on a horizontal plane. When getting the heights of the canopy and the base, you still point the rangefinder in that direction, but it doesn't actually use a laser for that; instead, it only measures the angle. If it did use the laser for the canopy and base, then it would have used the sine method.
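The difference between the two methods described here can be sketched in a few lines. The formulas are the standard trigonometric ones; the geometry (a 30 m tree, eye level 1.5 m above the base, 20 m from the trunk) is hypothetical and only meant to show how the two calculations respond when the sighted top is not directly above the base.

```python
import math

def height_tangent(horizontal_dist_m, angle_top_deg, angle_base_deg):
    """Tangent method: one horizontal distance plus two inclination angles."""
    top = math.tan(math.radians(angle_top_deg))
    base = math.tan(math.radians(angle_base_deg))
    return horizontal_dist_m * (top - base)

def height_sine(dist_top_m, angle_top_deg, dist_base_m, angle_base_deg):
    """Sine method: a laser distance is taken along each sight line."""
    top = dist_top_m * math.sin(math.radians(angle_top_deg))
    base = dist_base_m * math.sin(math.radians(angle_base_deg))
    return top - base

# Hypothetical geometry (assumed values, not field data).
eye_m, trunk_dist_m, true_height_m = 1.5, 20.0, 30.0
rise_m = true_height_m - eye_m
angle_base = math.degrees(math.atan2(-eye_m, trunk_dist_m))
dist_base = math.hypot(trunk_dist_m, eye_m)

# Case 1: the sighted top is directly above the base.
angle_top = math.degrees(math.atan2(rise_m, trunk_dist_m))
dist_top = math.hypot(trunk_dist_m, rise_m)
tangent_vertical = height_tangent(trunk_dist_m, angle_top, angle_base)
sine_vertical = height_sine(dist_top, angle_top, dist_base, angle_base)

# Case 2: the sighted top is at the same height but 5 m closer to the observer.
top_dist_m = 15.0
angle_top_lean = math.degrees(math.atan2(rise_m, top_dist_m))
dist_top_lean = math.hypot(top_dist_m, rise_m)
tangent_leaning = height_tangent(trunk_dist_m, angle_top_lean, angle_base)
sine_leaning = height_sine(dist_top_lean, angle_top_lean, dist_base, angle_base)
```

In this assumed geometry both methods agree when the top is vertical, while the tangent calculation is inflated in the second case because it still uses the distance to the trunk. This only illustrates how the two calculations can diverge; it says nothing about which of the actual measurements in the plot is closer to the truth.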
I think my problem is I don't know which values look suspicious. So, for Atticus' very tall measurements, for example, in 2015 he measured them as 47.7 and 44.3. If you assume a couple meters growth from 2015 to 2019, then they should be 50m and 46m (in theory). From those, I got measurements of 37.2 and 38m, respectively, which are 13m and 8m difference, also respectively. 38 and 37 are in range with other litu, but I have no idea at what point Atticus' other measurements are reliable enough from 4 years ago that they would still represent heights today. Same thing with Thompson's.
I say this because in my mind, my regression equations are based on DBH's relationship with height, so if we're randomly overestimating some trees' heights, but those numbers are within the general range, then the equations will be affected but I won't catch it. At this point I'd say anything above 40m is suspicious, but I simply can't say anything about everything below that.
Does that make sense?
I can go get some heights corresponding to Jonathan's measurements to see if they make sense (in his case from 7 years' difference).
It’s time to wrap this up. Let’s do this:
These measurements do have error, but I don’t see any reason to fundamentally distrust them as systematically biased or less reliable than is typical of height measurements. Note that errors can go both ways, and height of tall trees may also be underestimated, so systematically checking just the tallest trees may create a downward bias. If you want to check more, these should be selected based on dbh, not height.
Hi @teixeirak
I went out to the field this morning before the all-staff to check some of Jonathan's trees (and Sarah's). The results are below. In general I noticed I was getting lower values than him despite the 7 year difference, so either I'm completely doing this wrong or I don't know what's happening.
Either way, though, I like your idea and I'll add those TLS measurements in. I'll also change the source of the 2012 measurements to be Jenny/Chris instead of Jonathan - is that ok? I'd still keep Jonathan's name maybe in the notes
> I'll also change the source of the 2012 measurements to be Jenny/Chris instead of Jonathan - is that ok? I'd still keep Jonathan's name maybe in the notes
Please do. Use their full names and put in parentheses Jonathan Thompson lab.
Thanks for getting these measurements!
So, the bottom line is that your measurements using the sine method are almost always lower than previous estimates.
I just looked at Helene's paper and noted this:
Unfortunately, this further reduced the clarity on this issue, and raises problems for combining the two methods in the same allometry, particularly with your focus on re-measuring the larger trees.
Let me think about what to do here, and I'd like to hear your ideas as well.
How many trees do we have with both TLS and manual measurements? It would be good to do some comparisons, treating TLS as the master.
Also, could you please make a plot of your measurements vs previous measurements and calculate average % difference?
Oof I didn't remember that part from Helene's paper. I'll think about this more on Tuesday.
Yep I can do that.
Two questions:
> Oof I didn't remember that part from Helene's paper. I'll think about this more on Monday.
It would be good if we can do those comparisons.
We should also consider what our conclusions would be if we assume 20% underestimation. Are you still getting systematically lower heights?
Just from today, yes, most of my heights were systematically lower, but I should note they were lower even than the 2012 measurements, which means they're lower still relative to what the heights would be now.
> How many trees do we have with both TLS and manual measurements? It would be good to do some comparisons, treating TLS as the master. Also, could you please make a plot of your measurements vs previous measurements and calculate average % difference?
For the measurements I have from Atticus, there are no individuals he got both manual and TLS measurements for.
Regarding my measurements, here is a basic graph showing the difference btwn different measurements for the same trees.
Here is a graph with just the measurements I've taken compared to others:
Here is a table showing the % difference for each stem I measured compared with other measurements (if 3 measurements, then the % diff is btwn mine and the most recent), and the difference in years. If height.dir = -1, that means my measurement was lower than the previous measurement, and vice versa.
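A minimal sketch of how such a table could be computed, assuming the % difference is taken relative to the previous measurement (the thread does not state the exact definition). The two pairs are the re-measured litu heights mentioned earlier in the thread (47.7 vs 37.2 and 44.3 vs 38); the function name is hypothetical.

```python
def compare_heights(mine_m, previous_m):
    """Return (% difference relative to the previous value, height.dir)."""
    pct_diff = abs(mine_m - previous_m) / previous_m * 100
    if mine_m < previous_m:
        direction = -1   # my measurement is lower than the previous one
    elif mine_m > previous_m:
        direction = 1    # my measurement is higher
    else:
        direction = 0
    return pct_diff, direction

# (my measurement, previous measurement) in meters, taken from the thread.
pairs = [(37.2, 47.7), (38.0, 44.3)]
results = [compare_heights(mine, prev) for mine, prev in pairs]
mean_pct_diff = sum(pct for pct, _ in results) / len(results)
```

Note that this ignores growth between measurement years, so for older measurements the true discrepancy would be larger than the computed value.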
Based on my graphs and the average % difference, I'm not sure how much I can take away from this considering most of the measurements I'm comparing against are 4, 6, or 7 years old. I'm not sure what the height growth per year is for these trees, but it's safe to say that my average % difference is definitely higher than 6.99 if we were using recent measurements.
I will do a regression analysis for all points combined (the whole heights.csv) and see what it looks like.
I've run the regression equations. They provide better estimates than before, and taking out the manual measurements from Atticus doesn't change things much. However, I'm still getting one height >50m, plus a couple outliers still in the mid-40s range that shouldn't be there. I'm not sure of the scientific integrity of doing something like what I did before (see below).
Alternatively, this is where I just don't include the outliers
> The original equation I had for all points was giving me 54m as a predicted height for a quve. This equation was based off the conglomerate regression using all the species for which we have enough data to make sp-specific equations.
> After including all points (not just the species that we have enough data for), I got a different equation for the conglomerate, which then yields a high predicted height of 47m. This is slightly better.
For species for which you do not have enough data to construct a species-specific allometry, you should definitely use an equation that includes all species.
@mcgregorian1, here's what I see as the best path forward here:
@teixeirak sounds good. I was thinking along the lines of # 3 earlier today. I can run the tests of # 4 soon.
I brought in some NEON data that I'd like to get your opinion on. These are height data from NEON plots at SCBI, from 11 surveys since 2015. I added the species, dbh, and height measurements to our data. As you can see, it actually makes the regressions worse in terms of R2 (there are many subcanopy heights), but overall it is more data than I had before.
The only downside is I can't seem to find exactly how they calculated height.
What do you think about including it given this?
At least in theory, it's great to include the NEON height data. However,
There should be documentation on the methods used to collect these. Have you looked here? If you can't find it online, please contact one of the NEON folks. Talking to someone may also be helpful for figuring out those outliers.
Ah, you're right. I hadn't caught all that before. And thank you for finding that documentation for getting heights!
I checked through the NEON data again to see if there were duplicates, and it turns out there are scattered duplicates: records with different DBH measurements on the same day but with the same height. There's no qualifier I can use to separate the two.
Based on this I'm wondering if it's worth it to keep the NEON data in at all (given the sporadicness of data errors). I took out the duplicates for SCBI measurements (using the most recent measurements).
With just our measurements (one per tree), the plots look like this:
With NEON's added (excluding dead), we get this. It's slightly better than before, but not as neat as our own.
I also went out this morning to verify heights of the trees with the tallest predicted heights (the ones for which the regression equations, not actual measurements, were predicting heights of 45m and 50.87m based on large DBH). Of the four trees I measured, I got heights of 23, 28.8, 29.6, and 32.8m. I think basically the regression equations don't capture that when a tree reaches a certain height in our forest, it starts to expand more laterally without necessarily matching that vertically.
Regarding the NEON data, it would be nice to use, but discarding major outliers.
Regarding the allometries, I agree that they tend to overestimate the largest trees. Having more data for the largest trees (which you've done) should help to constrain that.
Which method does NEON use?
That's true, but if we choose to not use the sine measurements then my constraints wouldn't be as useful. I'll check on recent papers for that.
NEON uses a rangefinder for heights, but I don't know which rangefinder they use. Ok, I can keep NEON data in. I checked with one NEON person about the duplicate data, but she said I needed to contact someone else, so I can do that today. neon protocol
Sounds good-- keep the NEON data in unless discussions with NEON folks reveal some reason why it can't be used.
Ok, I got word back from NEON. There were several duplicates because of multi-stemmed individuals (almost all shrubs like libe). I got rid of the duplicates, then kept only one measurement per tree (the most recent year, in keeping with what I did for our measurements), and removed a couple obvious outliers.
With all that, I get regression equations like below. I think this is the best I can do, so I'm going to call the height calculations done and use these equations for the model.
The only final fix I can think of is to not include fram or juni in this due to their sporadic measurements. Originally with just our data, we didn't include fram, juni, and quve because of lack of data. With the NEON data, now, we have enough for quve but fram and juni are questionable. What do you think?
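The "one measurement per tree, most recent year" step can be sketched as follows. The record fields (`tag`, `year`, `height_m`) and the values are hypothetical and are not the actual NEON or SCBI column names.

```python
def latest_per_tree(records):
    """Keep only the most recent record for each tree tag."""
    latest = {}
    for rec in records:
        tag = rec["tag"]
        # Replace the stored record only if this one is from a later year.
        if tag not in latest or rec["year"] > latest[tag]["year"]:
            latest[tag] = rec
    return list(latest.values())

# Hypothetical records: tree "A" was measured in two different years.
records = [
    {"tag": "A", "year": 2015, "height_m": 21.0},
    {"tag": "B", "year": 2017, "height_m": 12.5},
    {"tag": "A", "year": 2018, "height_m": 22.4},
]
deduplicated = latest_per_tree(records)
```

Multi-stemmed individuals would need an extra key (for example a stem identifier) before this step, otherwise stems of the same tag collapse into one record.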
My final question is whether NEON data should be added to our repo? I don't think so, but I think I want to add a section to the table for it.
NEON gets height data using tangent method for taller trees. For smaller shrubs and "low stature vegetation", heights are obtained from a collapsible measurement rod.
I'd include FRAM and JUNI, but use an all-species regression for any individuals that fall outside the observed size range (e.g., a small JUNI).
> My final question is whether NEON data should be added to our repo? I don't think so, but I think I want to add a section to the table for it.
Please do add it, including your flags for exclusion.
> > My final question is whether NEON data should be added to our repo? I don't think so, but I think I want to add a section to the table for it.
>
> Please do add it, including your flags for exclusion.
My follow up question then is do you only want to add the data I'm using (~20 MB), or the full data that I subset? NEON's full data is in a number of different csv files with necessary supplementary data in other csv files (including species information). If I add this to our repo, I'd have the NEON data all be in a separate folder.
> I'd include FRAM and JUNI, but use an all-species regression for any individuals that fall outside the observed size range (e.g., a small JUNI).
Sorry, I get the first part, but I'm confused what you mean by the second. The way I understand what I'm doing is I'm applying the juni equation to our cored juni, and the all-species regression to our species that don't have a specific equation.
Are you saying that when I apply the juni equation, if any of our cored juni have dbh and height outside of the range seen from the graphed ones, then they should be given the all-species equation?
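One possible reading of the quoted suggestion, sketched under assumptions: use the species equation only when the tree's DBH lies inside the DBH range of the trees used to fit that equation, and otherwise use the all-species equation. The coefficients, range limits, and names below are hypothetical placeholders, not the fitted SCBI values.

```python
import math

def predict_with_fallback(dbh_cm, species, species_eqs, all_species_eq):
    """Return (predicted height, used_fallback) for one tree."""
    eq = species_eqs.get(species)
    # Fall back when there is no species equation or the DBH is outside
    # the range observed when that equation was fitted.
    in_range = eq is not None and eq["dbh_min"] <= dbh_cm <= eq["dbh_max"]
    chosen = eq if in_range else all_species_eq
    height = math.exp(chosen["intercept"] + chosen["slope"] * math.log(dbh_cm))
    return height, not in_range

# Hypothetical log-log coefficients and observed DBH ranges (cm).
species_eqs = {"juni": {"intercept": 1.0, "slope": 0.5, "dbh_min": 10.0, "dbh_max": 40.0}}
all_species_eq = {"intercept": 0.9, "slope": 0.6}

small_juni = predict_with_fallback(5.0, "juni", species_eqs, all_species_eq)
typical_juni = predict_with_fallback(20.0, "juni", species_eqs, all_species_eq)
no_equation = predict_with_fallback(20.0, "havi", species_eqs, all_species_eq)
```

Here the range test uses DBH only, since height is the quantity being predicted for trees without a height measurement.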
Update: Using the new equations with the NEON data (and for juni, just using the juni equation without doing anything else) gives a graph like this, which looks much more like I'd expect.
@mcgregorian1, there's some bug in your height calculations. This graph shows max heights of ~2m! I suspect that the problem lies in units conversion (e.g., use of cm vs mm).