vertical profiles figure

teixeirak commented 5 years ago

@mcgregorian1, I'm envisioning a single plot with about 8 panels (4 cols x 2 rows) showing (a-d) NEON vertical profiles, (e) crown positions, (f-h) trait averages (depending on which traits come out in the top model).

add panels for:

[x] height by crown position in 2018 (order by height, not alphabetical)
[x] trait averages (can have more than one trait per plot)

general:

[x] across all plots, add a dashed horizontal line indicating the 95th percentile of height.
[x] make consistent y-axis
[x] condense by displaying only one copy of legend, remove repeated values on y-axis, etc.

for the NEON part:

[x] add standard deviations

mcgregorian1 commented 5 years ago

Also add mean traits as a function of TWI at the community level (conglomerated across all the trees we have data for).

Use all census data (since the species with leaf-trait data make up most of the plot's biomass), and get TWI for each tree in the census with DBH >10 cm.

Show the relationship of the best leaf traits with TWI, like the height profiles: compare species' mean TLP to see whether there's a tendency for less drought-tolerant species to be found in wetter areas.

mcgregorian1 commented 5 years ago

Hi @teixeirak

I tried adding in standard deviation as error bars but the result is not ideal. Did you have a specific image in mind for how to add these?

teixeirak commented 5 years ago

Ack!! To start, color code them. I also think the graph would look much cleaner if the error bars are much narrower.

mcgregorian1 commented 5 years ago

I can easily change the colors. But making the bars narrower would change how SD bars are usually presented.

For example, I have the min and max temperature for each month over 3 years. I take the SD of each of those (sd_max and sd_min) and, at the same time, get the average for each month. When I graph them, the error bar represents:

- minimum extent of bar <- temperature - sd_min
- maximum extent of bar <- temperature + sd_max

This is done for each point (max and min) for each month. I've seen a couple of ways of doing this, and they all calculate the error bars in this manner. Because my points are at the same height position, I also won't be able to avoid overlapping lines.
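A minimal Python sketch of the error-bar calculation described above, for illustration only (the thread's figures appear to be made in R). It assumes each month has one value per year, and that each series (max or min) gets a bar equal to its own mean plus or minus its own SD; the function name `monthly_error_bars` and the data layout are hypothetical.

```python
from statistics import mean, stdev


def monthly_error_bars(values_by_month):
    """Return {month: (mean, lower, upper)} with lower/upper = mean -/+ SD.

    values_by_month maps a month label to that month's values across years
    (hypothetical layout, e.g. one monthly max temperature per year).
    """
    bars = {}
    for month, values in values_by_month.items():
        m = mean(values)
        sd = stdev(values)  # sample SD (n - 1 denominator), as in R's sd()
        bars[month] = (m, m - sd, m + sd)
    return bars


# Hypothetical monthly max temperatures (deg C) over 3 years
t_max = {"Jun": [28.0, 30.0, 32.0], "Jul": [31.0, 31.0, 31.0]}
print(monthly_error_bars(t_max))
```

Running it once on the max series and once on the min series gives the two bars per month described above.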

teixeirak commented 5 years ago

"narrower" was a bad choice of word-- sorry for the confusion. I meant to make the ends of each bar (vertical part) less tall (narrower if you were to flip the graph 90 degrees)

mcgregorian1 commented 5 years ago

Ahh I see. That will yield you something like this:

teixeirak commented 5 years ago

much better! one thing that could make it a bit clearer could be to apply the R plotting function to slightly jitter the points (forget the name) so that the error bars aren't right on top of one another.

teixeirak commented 5 years ago

Also-- minor point-- the yellow is pretty hard to see. Could you please choose a different color? (The grey background is also pretty unconventional, but if the figure needs to be in color anyway it probably doesn't matter).

mcgregorian1 commented 5 years ago

I changed the color to dark orange, so hopefully that looks better for now.

As for jitter, I think because there's so much overlap, it makes the points look like they're completely random:

teixeirak commented 5 years ago

That jitter is definitely too much (which surprises me-- what I've seen before was very modest). I was also hoping to jitter the error bars more than the points themselves. Maybe just add 0.5 progressively to each month (for height) to give them a bit of offset?
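A sketch of the progressive offset suggested here, in Python for illustration. It assumes each month's profile is a list of heights in plotting order and that month k is shifted up by k * 0.5 m; the name `offset_heights` is hypothetical. The shift only changes where points are drawn, which is why the caption needs to mention it.

```python
def offset_heights(profiles, step=0.5):
    """Shift the k-th profile's heights up by k * step (k = 0, 1, 2, ...).

    profiles: list of height lists, one per month, in plotting order.
    Only the plotted positions change; the measured values do not.
    """
    return [[h + k * step for h in heights] for k, heights in enumerate(profiles)]


# Hypothetical: three months measured at the same sensor heights (m)
sensor_heights = [10.0, 20.0, 30.0]
for month, heights in zip(["Jun", "Jul", "Aug"], offset_heights([sensor_heights] * 3)):
    print(month, heights)
```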

mcgregorian1 commented 5 years ago

Ah I see, that makes sense. Making that correction gives me a graph like so:

teixeirak commented 5 years ago

That looks much better. Please be sure to note in the caption that heights are slightly offset for visualization purposes.

mcgregorian1 commented 5 years ago

@teixeirak did you want the 95th percentile height calculated from the field measurements we took or from the height data I'm using to make this graph (from regression equations)? This graph is for 2018 data.

teixeirak commented 5 years ago

> @teixeirak did you want the 95th percentile height calculated from the field measurements we took or from the height data I'm using to make this graph (from regression equations)? This graph is for 2018 data.

Let's go with the height data used to make this graph. The idea is just to give a sense of the overall canopy height of the forest.

mcgregorian1 commented 5 years ago

Ok sounds good.

A note:

- The graph above (height by crown position) uses only the cored trees (what I've been using in my full analysis), since we only got position data on those. I have the basic canopy-subcanopy split (35 cm DBH), but I don't have it for all four positions. Did you want me to calculate that? The reason for doing so would be to have that same graph but representing all trees in the census. The reason against would be that I'm using a small subset to figure that out.
- The 95th percentile height calculated from only the cored trees is 35.0022 m. When I do the same calculation on the entire dataset, I get 34.26517 m. I'm guessing the latter is more accurate?
  - Given how close these are, I'm glad to see the cored trees are a good representative subset in terms of height.
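For reference, a percentile like the ones above can be computed with Python's standard library; this is a sketch, not the code used in the thread. It assumes linear interpolation between sorted values (`method="inclusive"`), which is the same rule as R's default `quantile()` (type 7) and NumPy's default; other percentile definitions can give slightly different numbers. The function name `height_percentile` and the sample heights are hypothetical.

```python
from statistics import quantiles


def height_percentile(heights, pct=95):
    """pct-th percentile of heights (pct an integer from 1 to 99).

    method="inclusive" interpolates linearly between sorted values, which is
    the same rule as R's default quantile() (type 7) and NumPy's default.
    """
    cut_points = quantiles(heights, n=100, method="inclusive")  # 99 cut points
    return cut_points[pct - 1]


# Hypothetical stem heights (m), not the census data
heights = [12.5, 18.0, 22.3, 25.1, 27.8, 30.2, 31.9, 33.4, 35.0, 41.0]
print(height_percentile(heights, 95), height_percentile(heights, 99))
```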

The graphs below were made with all trees from the census data >10cm dbh.

Here is a sample graph comparing TWI to TLP for the species we have TLP for (sorry the boxplots are so narrow). Based on this, I'm not sure a full trend is evident.

Here is also a graph of TWI compared to height (from regression equations).

I'm struggling with how to interpret these graphs, because I have a feeling there's a distinct trend happening, but all I can say with certainty is that this trend depends on the outliers. We can easily see that almost all species share a core range of TWI values, and that, in general, higher TWI corresponds with shorter trees (which maybe doesn't make sense?). Thoughts?

teixeirak commented 5 years ago

> the graph above (height by crown position) is using only the cored trees (what I've been using in my full analysis), since we only got position data on those. I have the basic canopy-subcanopy split (35 cm DBH), but I don't have it for all four positions. Did you want me to calculate that? Reason for doing so would be to have that same graph but representing all trees in census. Reason against would be that I'm using a small subset to figure that out.

no, please don't estimate canopy positions for trees on which it was not measured (not reliable). Just calculate 95th percentile based on the same set in the graph.

teixeirak commented 5 years ago

> The 95th percentile height calculated from only the cored trees is 35.0022 m. When I do that same calculation from the entire dataset, I get 34.26517 m. I'm guessing the latter is more accurate?
> - given how close these are I'm glad to see the cored trees are a good representative subset in terms of height

I agree, let's just go with the cored trees. This is only to get a fairly rough sense (reference line on a plot).

teixeirak commented 5 years ago

For the plot of traits vs TWI (or height), what I had in mind was a cross-species average TLP (assigning each individual based on species value) as a function of TWI (or height).

mcgregorian1 commented 5 years ago

> I agree, let's just go with the cored trees. This is only to get a fairly rough sense (reference line on a plot).

Ok sounds good.

> For the plot of traits vs TWI (or height), what I had in mind was a cross-species average TLP (assigning each individual based on species value) as a function of TWI (or height).

I'm not sure I fully understand. TLP is already an average value per species as far as I know; does my plot above not show TLP as an output of TWI?

mcgregorian1 commented 5 years ago

Ah, I found this comment you made at the end of June in #33:

> - Use census data (whole community), not just the set of trees for which we have cores.
> - Bin the trees in 5 m (tentative; can adjust) increments of max height.
> - Assign TLP to each individual based on species mean.
> - Calculate mean and standard deviations of trait values across all trees in each height bin. Ignore NAs (species with no trait measurement), but record what percent of trees have no values. We should define some threshold (75% of trees with data??) below which we don't report/plot values.
> - Make plots similar to those you made for the climate variables, but including SD (and add SD on those plots as well).
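A minimal Python sketch of the binning steps above, for illustration only (not the thread's code). Assumptions: each census stem is a `(species, x)` pair where `x` is the binning variable (height in m here), species means sit in a dict, species absent from the dict count as NA, and bins are left-closed `[start, start + 5)`. The function name and the 75% default are hypothetical stand-ins for the threshold still being discussed. Because `x` is just a number, the same function would also fit the TWI-bin version of these steps that comes up later in the thread.

```python
import math
from statistics import mean, stdev


def binned_trait_profile(stems, species_trait, bin_width=5.0, min_coverage=0.75):
    """Summarize a species-level trait across individual stems in equal-width bins.

    stems: iterable of (species, x) pairs, one per census stem >= 10 cm DBH,
        where x is the binning variable (e.g. height in m).
    species_trait: dict of species -> species-mean trait (e.g. TLP); species
        missing from the dict count as NA.
    Bins are [start, start + bin_width). mean/sd are None when the share of
    stems with a trait value is below min_coverage.
    """
    bins = {}
    for species, x in stems:
        start = math.floor(x / bin_width) * bin_width
        bins.setdefault(start, []).append(species_trait.get(species))

    profile = []
    for start in sorted(bins):
        values = bins[start]
        known = [v for v in values if v is not None]
        coverage = len(known) / len(values)
        report = bool(known) and coverage >= min_coverage
        profile.append({
            "bin": (start, start + bin_width),
            "n": len(values),
            "pct_missing": 100.0 * (1 - coverage),
            "mean": mean(known) if report else None,
            "sd": stdev(known) if report and len(known) > 1 else None,
        })
    return profile


# Hypothetical species means and stems (not SCBI data); sp_c has no trait value
tlp = {"sp_a": -2.0, "sp_b": -2.4}
stems = [("sp_a", 3.0), ("sp_b", 4.0), ("sp_a", 7.0), ("sp_c", 8.0), ("sp_c", 9.0)]
for row in binned_trait_profile(stems, tlp):
    print(row)
```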

teixeirak commented 5 years ago

Good, that's still accurate.

mcgregorian1 commented 5 years ago

This is the result of those instructions, showing mean TLP with sd across height bins. The range of the y-axis is such that it covers the range of TLP values.

teixeirak commented 5 years ago

Good, except that the axes (labels) are flipped (TLP ranges from -1.8 to -2.8 and height from 5 to 50 m).

mcgregorian1 commented 5 years ago

Whoops, my bad.

teixeirak commented 5 years ago

Thanks! Error bars are SD, right?

Could you now flip both axes to parallel the other figures?

mcgregorian1 commented 5 years ago

Yep, those are the standard deviation.

This is the flipped version:

teixeirak commented 5 years ago

Looks good. Let's make the same figure with the other traits included in the analysis.

mcgregorian1 commented 5 years ago

Did you mean all traits or just the ones we're using to determine the best model for each scenario?

In other words, did you want me to include WD and LMA? In addition, because rp is categorical, it couldn't be graphed in the same way (it would be more of a bar-graph distribution).

teixeirak commented 5 years ago

No, leave those out.

mcgregorian1 commented 5 years ago

Hi @teixeirak, I had finished this on Wednesday but I forgot to post it here.

These are all the graphs together. I arranged them like this because, while I think it's best to have the NEON data together, the crown position plot, although it has the same y-axis, would look odd (in my opinion) in the same row. The average trait per height bin also has height on the y-axis, but I can't make it match the others because it uses ranges (bins) as opposed to set numbers.

Thoughts? I can put a title for the other two graphs as well, or at least put something like "a", "b", and "c".

teixeirak commented 5 years ago

Glad to see the progress! Here are some changes:

[x] condense crown position plot, add TLP and PLA next to it.
[x] remove TWI plot. We wouldn't expect much relationship there, so it's not really interesting. It would be interesting to test for any trends in TLP and PLA as a function of TWI (and perhaps you made this plot because of some miscommunication when I previously mentioned that?)
[ ] ensure that plot meets New Phyt formatting guidelines
[x] remove "height vs crown position" title
[x] order climate variables as windspeed, RH, T_air, T_biol

mcgregorian1 commented 5 years ago

> remove TWI plot. We wouldn't expect much relationship there, so it's not really interesting. It would be interesting to test for any trends in TLP and PLA as a function of TWI (and perhaps you made this plot because of some miscommunication when I previously mentioned that?)

I made a graph last week for this, but I wasn't sure this was what you were after:

teixeirak commented 5 years ago

I'd want to see one where the dependent variable is TLP averaged across all individuals >10cm in the census. This would be the same as the height plot, but with TWI in place of height as the independent variable.

mcgregorian1 commented 5 years ago

I'm not sure I quite understand. TLP values are already mean values per species, so averaging over all individuals >10cm gives a single value of -2.1916.

The most I could do with that is have this graph, then put a dashed line at -2.19 showing the mean. Is this what you were thinking?

teixeirak commented 5 years ago

- Use census data (whole community), not just the set of trees for which we have cores.
- Bin the trees in increments of TWI.
- Assign TLP to each individual based on species mean.
- Calculate mean and standard deviations of trait values across all trees in each TWI bin. Ignore NAs (species with no trait measurement), but record what percent of trees have no values. We should define some threshold (75% of trees with data??) below which we don't report/plot values.

mcgregorian1 commented 5 years ago

Ah, my bad. Since we have two of them, did you want them on their own row or with the height graphs (leaving the crown position one alone in the middle)?

The numbers represent the % of trees with no values. I'm assuming then we want to take out the top bin?

teixeirak commented 5 years ago

Are those stems ≥10cm? I find it surprising that we have such low proportions of stems.

mcgregorian1 commented 5 years ago

Yes, these are stems >=10cm. I thought we wanted lower percentages here, though, since that means less missing data in each bin.

teixeirak commented 5 years ago

Oh, okay-- I was confused. I'm still a bit surprised that we don't have a larger proportion of stems.

I think it would be better to bin by ln[TWI]. That's what you use in the analysis, right?

mcgregorian1 commented 5 years ago

Nope, I use only TWI by itself in the analysis. Also, in the height graphs I've been binning by untransformed height as well, not the log.

The percentage values show the % of TLP or PLA values that are NA within each bin. Over the whole dataset I'm using, the percentage is 22% NAs for both (since we have those trait values for the same species)

mcgregorian1 commented 5 years ago

In addition, the 99th percentile of height is 41.01m, whereas the 95th percentile is 35.02m and the average of all the dominant trees is 32.06m. Did you want the line at the 99th then?

teixeirak commented 5 years ago

I'll let you make the call on 95th vs 99th. Either is acceptable/ correct so long as we accurately state what it is. I think that the 95th percentile makes sense ecologically, but relative to our graphs the 99th may make more sense.

How many trees total are we talking about? The plot of traits by height has 2 categories above 35, so I'm a little concerned that we're talking about extremely small sample sizes. You could consider binning logarithmically (but still plot on linear scale).

Note that the top NEON measurement is at the 99th percentile (if it's actually that height and not pointing down a bit).

teixeirak commented 5 years ago

> Nope, I use only TWI by itself in the analysis.

what happens if you use ln[TWI]?

mcgregorian1 commented 5 years ago

> > Nope, I use only TWI by itself in the analysis.
>
> what happens if you use ln[TWI]?

The only thing that changes is that the coefficient for TWI is much stronger with the logged version. Otherwise, it appears in the same models, and the individual model components are the same. I'll keep log(TWI) in.

teixeirak commented 5 years ago

Yes, it's better to log-transform that variable because there are so few very high values. Also, for this plot it's probably better to bin logarithmically (plotting on a linear axis would work, but a log axis is probably better).
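Binning logarithmically means equal-width bins on the ln(TWI) scale, i.e. bin edges that grow geometrically on the original scale, so the few very high TWI values share wide bins. A Python sketch for illustration, assuming all TWI values are positive; `log_bin_edges` and `assign_bin` are hypothetical names, and the edges are back-transformed so the result can go on either a linear or a log axis.

```python
import bisect
import math


def log_bin_edges(values, n_bins):
    """n_bins + 1 edges evenly spaced in ln(value), back-transformed.

    Equal widths on the log scale mean edges that grow geometrically, so the
    sparse high values share wide bins. All values must be positive.
    """
    lo, hi = math.log(min(values)), math.log(max(values))
    step = (hi - lo) / n_bins
    return [math.exp(lo + i * step) for i in range(n_bins + 1)]


def assign_bin(value, edges):
    """Index of the bin holding value; values at or past the ends are clamped."""
    i = bisect.bisect_right(edges, value) - 1
    return min(max(i, 0), len(edges) - 2)


# Hypothetical TWI values, not the SCBI data
twi = [3.1, 4.0, 4.6, 5.2, 6.8, 9.5, 14.0, 21.7]
edges = log_bin_edges(twi, 4)
print([round(e, 2) for e in edges])
print([assign_bin(v, edges) for v in twi])
```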

mcgregorian1 commented 5 years ago

> I'll let you make the call on 95th vs 99th. Either is acceptable/ correct so long as we accurately state what it is. I think that the 95th percentile makes sense ecologically, but relative to our graphs the 99th may make more sense.
>
> How many trees total are we talking about? The plot of traits by height has 2 categories above 35, so I'm a little concerned that we're talking about extremely small sample sizes. You could consider binning logarithmically (but still plot on linear scale).
>
> Note that the top NEON measurement is at the 99th percentile (if it's actually that height and not pointing down a bit).

The total number of trees I'm using for the graphs (>10cm) is 7890, so you're right: I just checked, and those two height categories account for only 14 trees.

I've altered the code so that it produces bins with equal numbers of observations, not bins of equal range. I get this as a result (the numbers on the points are % no data for height). 2 questions:

- With this, I'd only be able to combine the two height graphs (heightxtlp and heightxpla) and the two TWI graphs, since each pair would only share a y-axis with each other. I think it would be better if these were by themselves, maybe in a square format?
- I've set the number of bins at 10, but I'm not sure if there's a particular reason for choosing something else, like 7?
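Equal-count binning sorts the stems by the binning variable and cuts the sorted list into chunks of (nearly) the same size, instead of cutting the variable's range into equal widths. A Python sketch for illustration; `equal_count_bins` is a hypothetical name, and how leftover stems and tied values are handled is an assumption that the real code may do differently.

```python
def equal_count_bins(items, n_bins=10, key=None):
    """Sort items (by key) and split them into n_bins nearly equal-sized bins.

    If len(items) is not divisible by n_bins, the first len(items) % n_bins
    bins get one extra item. Tied values can straddle a bin boundary.
    """
    ordered = sorted(items, key=key)
    size, extra = divmod(len(ordered), n_bins)
    bins, start = [], 0
    for i in range(n_bins):
        end = start + size + (1 if i < extra else 0)
        bins.append(ordered[start:end])
        start = end
    return bins


# Hypothetical (height, species) stems, binned by height
stems = [(31.0, "sp_a"), (8.5, "sp_b"), (22.0, "sp_a"), (14.2, "sp_c"), (40.1, "sp_b")]
for b in equal_count_bins(stems, n_bins=2, key=lambda s: s[0]):
    print(b)
```

With the 7890 stems mentioned above and 10 bins, each bin would hold 789 stems, so no bin ends up with only a handful of trees.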

Here are the no_data % numbers for TWI bins:

teixeirak commented 5 years ago

For height, if you use 20 bins, the top bin would represent the 95th percentile, matching your line.
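The arithmetic behind this: with 20 equal-count bins, each bin holds 1/20 = 5% of stems, so the top bin is the tallest 5% and its lower edge is the 95th percentile. A small Python check for illustration, assuming quantile cut points with linear interpolation (`method="inclusive"`); `top_bin_lower_edge` is a hypothetical name.

```python
from statistics import quantiles


def top_bin_lower_edge(heights, n_bins=20):
    """Lower edge of the top bin when heights are cut at n_bins quantiles.

    The top bin holds the tallest 1/n_bins of stems, so its lower edge is the
    (100 - 100 / n_bins)th percentile: 95th for 20 bins, 90th for 10.
    """
    return quantiles(heights, n=n_bins, method="inclusive")[-1]


# Hypothetical heights (m); compare with the 95th percentile computed directly
heights = [12.5, 18.0, 22.3, 25.1, 27.8, 30.2, 31.9, 33.4, 35.0, 41.0]
print(top_bin_lower_edge(heights), quantiles(heights, n=100, method="inclusive")[94])
```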

teixeirak commented 5 years ago

For all of these, let's plot on linear axes, putting the point at the median value for each bin. Let's have lines connecting the points, as you do with the other height profile figures.
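A Python sketch of turning bins into plot points as described here, for illustration only: each point sits at the median of its bin's binning variable (height or TWI), carries the mean trait of stems with data, and points are sorted so a connecting line runs in order. It assumes bins are lists of `(x, trait)` pairs with `None` for NA; `profile_points` is a hypothetical name, and the coverage threshold from earlier is left out for brevity.

```python
from statistics import mean, median


def profile_points(bins):
    """Turn bins of (x, trait) pairs into plot points ordered along x.

    Each point sits at the median x of its bin (height or TWI) and carries
    the mean trait of stems with a value (None = NA). Sorting by x keeps the
    connecting line running bottom-to-top. Bins without trait data are skipped.
    """
    points = []
    for b in bins:
        traits = [t for _, t in b if t is not None]
        if traits:
            points.append((median(x for x, _ in b), mean(traits)))
    return sorted(points)


# Hypothetical bins of (height m, TLP) pairs
bins = [[(20.0, -2.2), (30.0, -2.2)], [(5.0, -2.0), (7.0, None), (9.0, -2.4)]]
print(profile_points(bins))
```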

teixeirak commented 5 years ago

Also, for TWI, please flip the axes.

SCBI-ForestGEO / McGregor_climate-sensitivity-variation

vertical profiles figure #37