teixeirak closed this issue 4 years ago
First item was issue #103. Now fixed (unless we missed something), although with slightly different terminology.
* I still think that the univariate analyses are inappropriate. I think the best approach is to include the 1 to 2 trait variables (exactly like you said in line 234) in your models with the other variables and see whether that lowers the AICc or not. It would essentially mean adding a couple of columns to table S6 for those traits you did not consider at that stage. The reasoning behind this is: if you don't include variables you already know are important (like Height), you are asking WD, LMA, etc. to explain all the variability that exists by themselves. It is likely that they will fail, but there is a chance they would explain some residual variability after you accounted for Height. (Doing so would remove tables S4 and S5.)
@ValentineHerr , I think we did what you're suggesting. For the tests in tables S4-S5, the null model included height, TWI, and CP (of course, that's related to your last comment). I'll clarify that in the text.
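A minimal sketch of the comparison being described, using simulated stand-in data and plain `lm()` to stay dependency-free (the real analysis fits `lmer()` models on `trees_all_sub`; `WD` here is just an illustrative trait variable):

```r
set.seed(42)
# Simulated stand-in data; the actual models include Height, TWI, and CP
n      <- 300
height <- runif(n, 5, 40)
WD     <- rnorm(n)                      # hypothetical trait, e.g. wood density
Rt     <- 1 + 0.02 * height + 0.1 * WD + rnorm(n, sd = 0.3)

# Small-sample AICc = AIC + 2k(k+1)/(n - k - 1)
aicc <- function(m) {
  k <- attr(logLik(m), "df")
  AIC(m) + 2 * k * (k + 1) / (nobs(m) - k - 1)
}

base  <- lm(Rt ~ log(height))           # "null" model that already contains height
trait <- update(base, . ~ . + WD)       # null model plus the trait of interest
aicc(base) - aicc(trait)                # a positive difference favors keeping WD
```

The point is that the trait is asked to explain only the variability left over after the known predictors, rather than all of it on its own.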
- [ ] You switched to GLMM (from LMM), but you need to say what family and link function you used (I am curious which one you used, as it is not that straightforward to choose!).
@ValentineHerr can I get your opinion?
I've been running `glmer()` with `family = gaussian`, because the data is visually structured that way (see below). However, I tried the Shapiro-Wilk test, which gave me these results:
```
	Shapiro-Wilk normality test

data:  trees_all_sub$resist.value
W = 0.97448, p-value < 2.2e-16
```
indicating strongly that my data are not normally distributed. Based on the families available for `glmer()`, the only other one that makes sense to me is Gamma. However, while my response variable (`resist.value`) is all positive here, it can in fact be negative. Does that alone invalidate Gamma? And does that mean, if Gaussian is my only option left, that it's OK to use even though the Shapiro test says absolutely not?
Follow-up question: the qqplot above is on all the data, but I run models on subsets for only 1966, 1977, and 1999. Does that mean I have to make sure each subset is normal as well?
Weren't you log-transforming your response variable before, and that is why the reviewer told you to go with GLM? What was your motivation for log-transforming in the first place? If you didn't have strong motivation to do that (your response variable is continuous and can be negative or positive AND the residuals of your LMM look normally distributed), then you can just stick with LMM without transforming your response. GLMM with a Gaussian distribution is essentially the same as an LMM, I believe, or at least I don't know the main differences offhand.
The assumption of normality is always about the residuals, not the raw response variable.
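As a hedged illustration of that point, here is a dependency-free sketch with simulated data (the real check would use the residuals of the fitted `lmer()`/`glmer()` object, not a plain `lm()`):

```r
set.seed(1)
# Simulated stand-in data mimicking the thread's variable names
height       <- runif(200, 5, 40)
resist.value <- 0.5 + 0.03 * height + rnorm(200, sd = 0.2)

fit <- lm(resist.value ~ log(height))   # lm() keeps the sketch self-contained

# Test normality of the residuals, not of resist.value itself
r <- resid(fit)
qqnorm(r); qqline(r)
shapiro.test(r)   # note: shapiro.test() accepts at most 5000 observations
```

Running `shapiro.test()` directly on `resist.value` conflates the response's marginal distribution with the model's error distribution, which is what the test above avoids.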
Ohh ok. No, we were not log-transforming the response variable itself, only height and TWI (as independent variables).
This was the comment from R1 about GLMMs, and this is why we decided to implement.
Then the reviewer is wrong to think that using GLM would avoid transforming the independent variables. Go back to using lmer.
@teixeirak I will try running everything again with `lmer` and add the suffix `_lmer` to the end of the new tables. I'm not sure when I'll be able to do this today, but it will be done by tonight.
@ValentineHerr thank you!
remember to put back REML = TRUE and FALSE, when appropriate!
@ValentineHerr to make sure I'm doing this correctly, can you review this please?
In the analysis, we run models twice. The way I understand it:

- I do `REML = FALSE` for the single-variable tests.
- I do `REML = FALSE` for comparing different combinations of variables.
- I then take the best models from these tests, refit with `REML = TRUE`, and present these model fits and coefficients in the paper.

Is this correct?
Yes, I think so. In general: `REML = FALSE` when you are comparing models with different fixed effects; `REML = TRUE` when you want to report the coefficients in the paper (after you are done with model selection).
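A sketch of that two-step workflow using `lme4`'s built-in `sleepstudy` example data (the thread's actual formulas, data, and random effects differ):

```r
library(lme4)   # the thread's models use lmer(); sleepstudy ships with lme4

# Model selection: fit with REML = FALSE so likelihoods (and AIC/AICc) are comparable
m0 <- lmer(Reaction ~ 1    + (1 | Subject), sleepstudy, REML = FALSE)
m1 <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy, REML = FALSE)
anova(m0, m1)   # likelihood-ratio / AIC comparison of the fixed effects

# Reporting: refit the selected model with REML = TRUE before quoting coefficients
m_final <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy, REML = TRUE)
fixef(m_final)
```

REML likelihoods are not comparable across models with different fixed effects, which is why the selection step must use maximum likelihood (`REML = FALSE`).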
> one defined null model
I think this is what throws me off. Don't refer to the "null" model for this part of the analysis, as it sounds like you are comparing `Rt ~ 1` to `Rt ~ trait`, while, if I now understand it well, your "null" model is more something like `Rt ~ height` ... If that is the case, say so: make sure the null model (`Rt ~ height`, or whatever) is mentioned somewhere in tables S4 and S5, e.g. in the legend, so that there is no chance that a reviewer thinks you did `Rt ~ 1` vs `Rt ~ trait`.
Perfect, thanks @ValentineHerr !
* l. 299-l. 301: as you probably guess, I am not a fan of this strategy
@ValentineHerr , I think this okay (see here: https://github.com/SCBI-ForestGEO/McGregor_climate-sensitivity-variation/issues/105#issuecomment-658237532).
yes
* I know one reviewer suggested putting the results table in the SI, but you refer to S6 twelve times in the paper; I think it belongs in the main paper. Your call, though.
I prefer to leave it in the SI. It is critical to our hypothesis testing, but the info presented there will probably be fairly uninteresting to most readers.