consider visualizing regression model results

teixeirak commented 4 years ago

@mcgregorian1, I'm not necessarily advocating for this, but wonder if it may be helpful to visualize model results; for example, using the package visreg.

mcgregorian1 commented 4 years ago

hmm, we definitely could; I don't think there'd be anything lost by incorporating something like this. I think the question is whether we think we need more figures in the overall manuscript

teixeirak commented 4 years ago

Let's not worry about it. The highest priority for you right now would be to check my questions related to Table 4 in issue #62.

teixeirak commented 4 years ago

Reopening this based on reviewer comments. They are unanimous in suggesting this type of figure.

R1: "I suggest to show some relationship, e.g. between Rt and traits in figures. The study only presents two figures and there is plenty of room to present results more clearly with extra figures."

R2: "... use figures to show more specific results related to your individual tree analyses."

R3: "I also feel that it would be nice to represent some of the key results (e.g., Rt and its relationship to height and leaf hydraulic traits) in a figure."

R4: "L287-299: it would be very useful to use scatterplots to show the effects of traits on Rt in a figure."

teixeirak commented 4 years ago

I think we should definitely try making a figure like this--probably just for the 3 years combined*.

*R1 comment on focusing on combined model: "Following the two previous comments, I suggest to leave in Table 5 just the best model (rather that a multimodel inference as is presented now) to discuss more clearly the covariates included. And I suggest discussing only the model with all years together. This would be more robust and avoid ramble differences between years that mostly look spurious and with a difficult physiological explanation. This would help to ensure a better use of the Resilience indices."

mcgregorian1 commented 4 years ago

I tried out the visreg function - here are a couple sample plots. What specifically were you thinking of showing?

These are only for the best Rt model for all years together:

teixeirak commented 4 years ago

That looks good. Could we also make plots for height, canopy position? Could all years be combined on a single plot (diferent colors by drought year)? This would make it feasible to show more independent variables.

teixeirak commented 4 years ago

Back to this... The top main model has H, TWI, PLA, and TLP as variables. So... let's plot H, TWI (?*), PLA, and TLP for the all-year model, showing drought years in different colors.

*Not a variable of interest for hypothesis testing, so less important, but won't hurt to show if it fits nicely.

teixeirak commented 4 years ago

Then, maybe we just drop Tables 4 and 5 from the main manuscript? The figure would cover the most important findings, and this would be consistent with the suggestion of one of the reviewers (R1, I think) that we simplify the focus. If we do this, I'd be slightly more inclined to show TWI in the plot above.

mcgregorian1 commented 4 years ago

How's this? I can make it be taller than it is wide to emphasize that each line is trending down?

I'm going on a short walk, but I will be back soon.

teixeirak commented 4 years ago

Yes, you can try that. Alternatively, you might just show the regression line with confidence intervals (see example below). The large variability in tree-ring data tends to overwhelm the trends in plots like this.

mcgregorian1 commented 4 years ago

Do you mean 3 separate lines per year then?

teixeirak commented 4 years ago

Actually, what if you plot the regression lines + CIs for the top models for all years combined and for each year? Because CP comes out in the top models for 1977 and 1999, this would mean replacing the TWI panel with CP. You'd only plot the line when its in the top model, so height would show up in all but 1977, PLA would show up in all years + 1966, TLP in all years + 1977, etc.

mcgregorian1 commented 4 years ago

Still working on this. It's taking longer than I thought

mcgregorian1 commented 4 years ago

I am so sorry for how long this has taken. My advisor here hates ggplot and I believe I've found section 1 for a perfect example of that. Long story short is that you can not create a legend on your own in ggplot, it must be built from within the data itself, but the data must be in a very specific format.

By the time I realized it was useless, I had already created everything to make the plot, so this final image is a result of saving the legend as a separate png and then combining them.

Again, sorry about this.

teixeirak commented 4 years ago

Looks great in general; thanks. The one thing is that CP should be a categorical variable. You could also drop TWI, at your discretion. And, why is there a positive H trend for 1999? I don’t see that in the table.

teixeirak commented 4 years ago

By the way, Valentine hates gg-plot too. Sorry it’s been such a pain.

mcgregorian1 commented 4 years ago

I'll check on H in a few minutes. For CP I converted each of the categories into a number in order to get a similar output for the ggplot, and then relabeled the z-axis.

On Sun, 12 Jul 2020, 20:08 Kristina Anderson-Teixeira, < notifications@github.com> wrote:

By the way, Valentine hates gg-plot too. Sorry it’s been such a pain.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/SCBI-ForestGEO/McGregor_climate-sensitivity-variation/issues/71#issuecomment-657296971, or unsubscribe https://github.com/notifications/unsubscribe-auth/AJNRBEMXPFOKHWY5QDWJ6F3R3JF7XANCNFSM4KN6SW7Q .

teixeirak commented 4 years ago

Unfortunately, the CP plot won't work: it's modeling a linear response to a continuous variable, whereas CP is a categorical variable, and the response is not linear (consistently highest for C). I suspect that that's that's what's making H come out as a positive relationship for 1999.

teixeirak commented 4 years ago

(We can still send around to coauthors with this plot if you can't get it solved by tomorrow. And I can maybe find someone who could help figure it out.)

teixeirak commented 4 years ago

Also, a minor formatting thing: The figure is currently a funny shape for fitting on a journal page. I'd move TWI up to the first row, and have the two traits and legend in the second row.

mcgregorian1 commented 4 years ago

In theory, we can't actually show confidence intervals for mixed effects models; see this commentary page 11 from the creators of visreg.

On a related note, visreg is great but it does not allow for me to stack two different things on top of one another in the same plot (think, crown position for 1977 and 1999). Thus I've had to go to ggplot.

Otherwise you're right about height. With visreg, I get the following for 1999, which makes sense according to our tables:

When I've been trying to do this in ggplot, I've tried to emulate the method of visreg by making sure I'm plotting the variables against the fitted values for Rt (using lm for a smooth line). However, even with doing this, I'm still getting Rt increasing for height as you can see below, which is not correct. I think with this I've hit a wall. It seems ggplot is just not equipped to handling mixed effects models data (I've read this on multiple posts toni ght).

teixeirak commented 4 years ago

Ack, sorry to hear this is giving so much trouble! I'm flagging @ValentineHerr to see if she knows how to solve it, or otherwise find some clever way to get around it. (Valentine, briefly, the goal is to visualize the results of the top models from Ian's paper, but gg-plot is giving trouble...) Ian's message above tells more.

If Valentine doesn't know, I can check with a couple people outside the author team. Or, I can see if its possible in Matlab. We'll figure something out.

mcgregorian1 commented 4 years ago

Before I went to bed last night I did email the author of the visreg package, so he might get back to me soon

On Mon, Jul 13, 2020 at 7:29 AM Kristina Anderson-Teixeira < notifications@github.com> wrote:

Ack, sorry to hear this is giving so much trouble! I'm flagging @ValentineHerr https://github.com/ValentineHerr to see if she knows how to solve it, or otherwise find some clever way to get around it. (Valentine, briefly, the goal is to visualize the results of the top models from Ian's paper, but gg-plot is giving trouble...) Ian's message above tells more.

If Valentine doesn't know, I can check with a couple people outside the author team. Or, I can see if its possible in Matlab. We'll figure something out.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/SCBI-ForestGEO/McGregor_climate-sensitivity-variation/issues/71#issuecomment-657503351, or unsubscribe https://github.com/notifications/unsubscribe-auth/AJNRBEOYJHBSK3JSOQVRPN3R3LV23ANCNFSM4KN6SW7Q .

--

Ian McGregor

Ph.D. Student | Center for Geospatial Analytics

He/Him/His

College of Natural Resources

Jordan Hall 4120 | Campus Box 7106

North Carolina State University

2800 Faucette Dr.

Raleigh, NC 27695 USA imcgreg@ncsu.edu | 714-864-1005 | geospatial.ncsu.edu

teixeirak commented 4 years ago

Ian, I've added Albert Kim (@rudeboybert), who is a (future) sabbatical visitor in the lab, to this repo. I described this problem to him and he thinks he may be able to help.

mcgregorian1 commented 4 years ago

I got feedback from the author of the visreg package. His suggestion was good, but I still ran into errors so I'm keeping up the conversation.

rudeboybert commented 4 years ago

Hello, I cloned this repo and took a look at graphs_for_manuscript.R, however the code is a bit large for me to digest and I can't seem to reproduce the outputs (Ex: line 162 generates an error saying traits is not found).

Could someone:

Add a scripts/graphs_for_manuscript_albert.R that is a pared down/minimally reproducible version of graphs_for_manuscript.R generating the output?
I think I have an idea of what your vision for the plot is based on this thread, but just to be sure, could someone draw out the plot by hand and attach a picture to this thread?

Based on my initial interpretation of the ask, I think the following combination should do the trick:

broom.mixed package for lme4 objects
ggplot2::geom_ribbon()

mcgregorian1 commented 4 years ago

Hi @rudeboybert

Sorry you looked at the code just now; I made more changes last night and didn't finish cleaning it all up! The part of the code I have questions on starts at line 471.

What I'm running into is this:

the visreg package accurately portrays a visualization of height's effect on Rt as part of a linear mixed model (see my comment above from 2 days ago - it is a negative slope).
However, visreg doesn't allow me to combine visuals like this from models with differing variables. Instead, I turned to ggplot, which gave me subplot a from my comment two days ago above. Notice here, though, how height for 1999 (blue) is a positive trend. I believe this is because I'm doing lm() for height to Rt (also when I do height to predict(Rt) based on the mixed effects model output.

Thus, the goal is to either have the visreg package working or force ggplot to work the way I want it to. I'm in contact with the visreg author, so maybe we'll see results soon. Otherwise, I looked at geom_ribbon and it looks like I define the bounds around the line, but I'm looking for more an accurate visual of the effect of height. I don't believe it's possible in ggplot.

Hopefully this makes sense. I will be getting back to looking at the code tonight.

teixeirak commented 4 years ago

@rudeboybert , @mcgregorian1 , we might actually not need to portray the categorical variable here, Canopy position, depending on what we decide about issue #106.

mcgregorian1 commented 4 years ago

I worked with visreg author here and he has fixed it! This now produces the correct plot for 1999. I will update this later:

rudeboybert commented 4 years ago

Great!

mcgregorian1 commented 4 years ago

This isn't perfect yet but I wanted to show the actual visual now. This is what ggplot is not able to do.

teixeirak commented 4 years ago

Thanks for solving this! This will be a great plot.

A few questions/comments (probably some things you're planning to do anyway):

there's no way to put confidence intervals on here, is there?
x-axis label on e (I think) needs to be changed to TLP
make just one copy of the legend, and have that after panel e
order CP as S-I-C-D (just rename them with 1-2-3-4 in front if needed)

(We could do some cosmetic cleanup manually if needed.)

mcgregorian1 commented 4 years ago

there's no way to put confidence intervals on here, is there?

Not that I know of. The visreg vignette specifically mentions this for mixed effects models: "visreg can be used with mixed models, for example from the nlme or lme4 packages, although it is worth noting that these packages are unable to incorporate uncertainty about random effects into predictions, and therefore do not offer confidence intervals, meaning that visreg plots will lack confidence bands." In other words, and as seen on this site, we would only have confidence bands for if we had final models that didn't include a random effect, but of course all of our models have random.

x-axis label on e (I think) needs to be changed to TLP

Oops haha I'll change that.

make just one copy of the legend, and have that after panel e

I tried doing that last night before posting but it wasn't working as none of the individual plots have all four lines. I might just need to do the separate png like I was doing before.

order CP as S-I-C-D (just rename them with 1-2-3-4 in front if needed)

Can do.

teixeirak commented 4 years ago

Not that I know of. The visreg vignette specifically mentions this for mixed effects models: "visreg can be used with mixed models, for example from the nlme or lme4 packages, although it is worth noting that these packages are unable to incorporate uncertainty about random effects into predictions, and therefore do not offer confidence intervals, meaning that visreg plots will lack confidence bands." In other words, and as seen on this site, we would only have confidence bands for if we had final models that didn't include a random effect, but of course all of our models have random.

Too bad, but it is what it is.

teixeirak commented 4 years ago

Note that we will also need to change the 1999 model, as it included both H and CP (now includes just CP).

mcgregorian1 commented 4 years ago

Taking out crown position and taking out the individual years, our figure looks like this:

mcgregorian1 commented 4 years ago

From @ValentineHerr 's find in #113, the plot looks like this (albeit the letter labels need to be moved). @teixeirak what do we think?

teixeirak commented 4 years ago

That could work... I'm a little concerned about the huge variability making this come across as a poor model to people who aren't aware how much inherent variability there is in tree-ring data.

Another thought-- we might return to showing the individual years, which would parallel current Figs 1 and 2.

teixeirak commented 4 years ago

Whatever you end up doing, one thing that would help is to highlight the Rt=1.0 line.

teixeirak commented 4 years ago

Another suggestion may be to color the points by drought year, just so the linkage to Figs. 1-2 is clear.

teixeirak commented 4 years ago

@mcgregorian1 , I do think a version of this could be nice.

I have a bit of trouble at this point saying whether I'd prefer it with or without the individual years.

mcgregorian1 commented 4 years ago

Does this work? This is plotting with visreg for only the full data, with the points colored by year

teixeirak commented 4 years ago

Thanks, that does look better. I'd have a few minor comments if we decide to go with this basic version, but I'm still up in the air if we should show the data points. I think that some work on the writing, which I'll do now, should help clarify that.

teixeirak commented 4 years ago

From Alan:

" I think I like the second option of Figure 4 better. The second option emphasizes the trends, and the first option emphasizes the variation around the trends. If CIs can be added to the second option, it could illustrate that there is a lot of variation around the trends without the distraction of plotting each of the data points. "

From Neil:

" That is really tough. The geoscientists always used to laugh at our correlations. I like the data points figure, but can see people saying the noise is an issue.

Why not the data points in supplemental and then the bottom figure with confidence intervals in the main paper? I like the impact of individual years more because that might be the grist of ecology (well, partly, but this is my bias). ..."

mcgregorian1 commented 4 years ago

I saw Alan's comment when he suggested it - and I agree, seeing both figures side by side, I'd be more inclined to see the individual years in a paper as it would make it more understandable. However, that does potentially mean the discussion has to be changed to accommodate that

On Fri, Jul 17, 2020 at 4:15 PM Kristina Anderson-Teixeira < notifications@github.com> wrote:

From Alan:

" I think I like the second option of Figure 4 better. The second option emphasizes the trends, and the first option emphasizes the variation around the trends. If CIs can be added to the second option, it could illustrate that there is a lot of variation around the trends without the distraction of plotting each of the data points. "

From Neil:

" That is really tough. The geoscientists always used to laugh at our correlations. I like the data points figure, but can see people saying the noise is an issue.

Why not the data points in supplemental and then the bottom figure with confidence intervals in the main paper? I like the impact of individual years more because that might be the grist of ecology (well, partly, but this is my bias). ..."

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/SCBI-ForestGEO/McGregor_climate-sensitivity-variation/issues/71#issuecomment-660317665, or unsubscribe https://github.com/notifications/unsubscribe-auth/AJNRBENXGZXAURXJ2MAJH7TR4CWNPANCNFSM4KN6SW7Q .

--

Ian McGregor

Ph.D. Student | Center for Geospatial Analytics

He/Him/His

College of Natural Resources

Jordan Hall 4120 | Campus Box 7106

North Carolina State University

2800 Faucette Dr.

Raleigh, NC 27695 USA imcgreg@ncsu.edu | 714-864-1005 | geospatial.ncsu.edu

teixeirak commented 4 years ago

Yes, so far 3 of 3 coauthors prefer the second option, and I'm leaning that way as well. It may be just a matter of inertia, but on my most recent pass of the results, I felt that taking out the year-by-year results would make it too thin. I do think we should adjust the emphasis, though.

mcgregorian1 commented 4 years ago

How's this?

teixeirak commented 4 years ago

Great! I’ll suggest a few small cosmetic changes later, but overall it’s what we want.

teixeirak commented 4 years ago

Cleanup of this figure:

[x] make just one legend
[ ] if easy, would be nice to have H and TWI on linear axes (but it is okay as is)
[x] use the accepted symbols for PLA and TLP
[x] increase axis font size a bit
[x] adjust position of letters

teixeirak commented 4 years ago

I have saved this as "tables_figures/publication/Figure4_model_vis.png". Please save the updated version there.

SCBI-ForestGEO / McGregor_climate-sensitivity-variation

consider visualizing regression model results #71