teixeirak closed this issue 4 years ago
First item was issue #103. Now fixed (unless we missed something), although with slightly different terminology.
* I still think that the univariate analyses are inappropriate. I think the best approach is to include the 1 to 2 trait variables (exactly like you said in line 234) in your models with the other variables and see whether that lowers the AICc or not. It would essentially mean adding a couple of columns to table S6 for those traits you did not consider at that stage. The reasoning behind this is: if you don't include variables you already know are important (like Height), you are asking WD, LMA, etc. to explain all the variability that exists by themselves. It is likely that they will fail, but there is a chance they would explain some residual variability after you accounted for Height. (Doing so would remove tables S4 and S5.)
@ValentineHerr , I think we did what you're suggesting. For the tests in tables S4-S5, the null model included height, TWI, and CP (of course, that's related to your last comment). I'll clarify that in the text.
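A minimal sketch of the comparison being described, using simulated stand-in data and plain `lm()` to stay dependency-free (the real analysis fits `lmer()` models on `trees_all_sub`; `WD` here is just an illustrative trait variable):

```r
set.seed(42)
# Simulated stand-in data; the actual models include Height, TWI, and CP
n      <- 300
height <- runif(n, 5, 40)
WD     <- rnorm(n)                      # hypothetical trait, e.g. wood density
Rt     <- 1 + 0.02 * height + 0.1 * WD + rnorm(n, sd = 0.3)

# Small-sample AICc = AIC + 2k(k+1)/(n - k - 1)
aicc <- function(m) {
  k <- attr(logLik(m), "df")
  AIC(m) + 2 * k * (k + 1) / (nobs(m) - k - 1)
}

base  <- lm(Rt ~ log(height))           # "null" model that already contains height
trait <- update(base, . ~ . + WD)       # null model plus the trait of interest
aicc(base) - aicc(trait)                # a positive difference favors keeping WD
```

The point is that the trait is asked to explain only the variability left over after the known predictors, rather than all of it on its own.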
- [ ] You switched to GLMM (from LMM), but you need to say what family and link function you used (I am curious which one you used, as it is not that straightforward to choose!).
@ValentineHerr can I get your opinion?
I've been running `glmer()` with `family = gaussian`, because the data is visually structured that way (see below). However, I tried the Shapiro-Wilk test, which gave me these results:
```
	Shapiro-Wilk normality test

data:  trees_all_sub$resist.value
W = 0.97448, p-value < 2.2e-16
```
indicating strongly that my data are not normally distributed. Based on the families available for `glmer()`, the only other one that makes sense to me is Gamma. However, while my response variable (`resist.value`) is all positive here, it can in fact be negative. Does that alone invalidate Gamma? And does that mean, if Gaussian is my only option left, that it's OK to use even though the Shapiro test says absolutely not?
Follow-up question: the qqplot above is on all the data, but I run models on subsets for only 1966, 1977, and 1999. Does that mean I have to make sure each subset is normal as well?
Weren't you log-transforming your response variable before, and that is why the reviewer told you to go with GLM? What was your motivation for log-transforming in the first place? If you didn't have strong motivation to do that (your response variable is continuous and can be negative or positive AND the residuals of your LMM look normally distributed), then you can just stick with LMM without transforming your response. GLMM with a Gaussian distribution is essentially the same as an LMM, I believe, or at least I don't know the main differences offhand.
The assumption of normality is always about the residuals, not the raw response variable.
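As a hedged illustration of that point, here is a dependency-free sketch with simulated data (the real check would use the residuals of the fitted `lmer()`/`glmer()` object, not a plain `lm()`):

```r
set.seed(1)
# Simulated stand-in data mimicking the thread's variable names
height       <- runif(200, 5, 40)
resist.value <- 0.5 + 0.03 * height + rnorm(200, sd = 0.2)

fit <- lm(resist.value ~ log(height))   # lm() keeps the sketch self-contained

# Test normality of the residuals, not of resist.value itself
r <- resid(fit)
qqnorm(r); qqline(r)
shapiro.test(r)   # note: shapiro.test() accepts at most 5000 observations
```

Running `shapiro.test()` directly on `resist.value` conflates the response's marginal distribution with the model's error distribution, which is what the test above avoids.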
Ohh ok. No, we were not log-transforming the response variable itself, only height and TWI (as independent variables).
This was the comment from R1 about GLMMs, and this is why we decided to implement.
Then the reviewer is wrong to think that using GLM would avoid transforming the independent variables. Go back to using lmer.
@teixeirak I will try running everything again with `lmer` and add the suffix `_lmer` to the end of the new tables. I'm not sure when I'll be able to do this today, but it will be done by tonight.
@ValentineHerr thank you!
remember to put back REML = TRUE and FALSE, when appropriate!
@ValentineHerr to make sure I'm doing this correctly, can you review this please?
In the analysis, we run models twice. The way I understand it:

- I do `REML = FALSE` for the single-variable tests.
- I do `REML = FALSE` for comparing different combinations of variables.
- I then take the best models from these tests, refit with `REML = TRUE`, and present these model fits and coefficients in the paper.

Is this correct?
Yes, I think so. In general: `REML = FALSE` when you are comparing models with different fixed effects; `REML = TRUE` when you want to report the coefficients in the paper (after you are done with model selection).
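A sketch of that two-step workflow using `lme4`'s built-in `sleepstudy` example data (the thread's actual formulas, data, and random effects differ):

```r
library(lme4)   # the thread's models use lmer(); sleepstudy ships with lme4

# Model selection: fit with REML = FALSE so likelihoods (and AIC/AICc) are comparable
m0 <- lmer(Reaction ~ 1    + (1 | Subject), sleepstudy, REML = FALSE)
m1 <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy, REML = FALSE)
anova(m0, m1)   # likelihood-ratio / AIC comparison of the fixed effects

# Reporting: refit the selected model with REML = TRUE before quoting coefficients
m_final <- lmer(Reaction ~ Days + (1 | Subject), sleepstudy, REML = TRUE)
fixef(m_final)
```

REML likelihoods are not comparable across models with different fixed effects, which is why the selection step must use maximum likelihood (`REML = FALSE`).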
> one defined null model
I think this is what throws me off. Don't refer to the "null" model for this part of the analysis, as it sounds like you are comparing `Rt ~ 1` to `Rt ~ trait`, while, if I now understand it well, your "null" model is more something like `Rt ~ height` ... If that is the case, say so: make sure the null model (`Rt ~ height`, or whatever) is mentioned somewhere in tables S4 and S5, e.g. in the legend, so that there is no chance that a reviewer thinks you did `Rt ~ 1` vs `Rt ~ trait`.
Perfect, thanks @ValentineHerr !
* l. 299-l. 301: as you probably guess, I am not a fan of this strategy
@ValentineHerr , I think this okay (see here: https://github.com/SCBI-ForestGEO/McGregor_climate-sensitivity-variation/issues/105#issuecomment-658237532).
yes
* I know one reviewer suggested putting the results table in the SI, but you refer to S6 twelve times in the paper; I think it belongs in the main paper. Your call, though.
I prefer to leave it in the SI. It is critical to our hypothesis testing, but the info presented there will probably be fairly uninteresting to most readers.