katiewampler opened 2 months ago
Not sure if this is related or a separate issue, but regardless of the degrees of freedom, the chi-squared values I get from the SSN2 package are very different from those I get from the betareg package for beta-family models; the two packages seem to return very different logLik values from `optim()`. This results in very different p-values. If I test a model with one variable removed, the p-value from the likelihood ratio test in betareg matches the p-value for the removed coefficient fairly closely, but the likelihood ratio test from SSN2 gives a very different result.
Example below: lsn.ssn.zip
```r
library(SSN2)
library(sf)
library(betareg)
library(dplyr)

ssn <- ssn_import("~/1_Research/5_McKenzie_EEMs/data/parafac_ssn/lsn.ssn") # load SSN
df <- ssn_get_data(ssn)
df[, 6:11] <- sapply(df[, 6:11] %>% st_drop_geometry(), function(x) { x / 100 }) # convert to a fraction
ssn <- ssn_put_data(df, ssn) # put converted data back in the model

mod1a <- ssn_glm(Comp_1 ~ DNBR * ARIDITY, ssn, family = "beta",
                 tailup_type = "none", taildown_type = "none",
                 euclid_type = "none", estmethod = "ml")
mod2a <- ssn_glm(Comp_1 ~ DNBR + ARIDITY, ssn, family = "beta",
                 tailup_type = "none", taildown_type = "none",
                 euclid_type = "none", estmethod = "ml")
summary(mod1a)
SSN2:::anova.ssn_glm(mod1a, mod2a)

mod1b <- betareg(Comp_1 ~ DNBR * ARIDITY, df)
mod2b <- betareg(Comp_1 ~ DNBR + ARIDITY, df)
summary(mod1b)
lmtest::lrtest(mod1b, mod2b) # anova likelihood ratio test
```
Thanks @katiewampler; I'll answer both your questions below.
There are two things going on here regarding the anova likelihood ratio test (LRT) for nested models. First, it is important to clarify whether you are talking about a model fit using REML (`estmethod = "reml"`) or ML (`estmethod = "ml"`). If REML, the degrees of freedom is the difference in the number of autocorrelation parameters between the two models (the fixed effect structure for two REML models must be the same). If ML, the degrees of freedom is the difference in the number of autocorrelation parameters plus the difference in the number of fixed effect parameters (the fixed effect structure for two ML models does not have to be the same). So the code works properly for REML, but you have identified a bug with ML. More technical details for AIC/AICc/BIC/anova can be found at this link (note that `spmodel` and `SSN2` use the same fitting engine; we are working on a technical details document for `SSN2`, which will eventually be placed on the `SSN2` website).
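For concreteness, here is a sketch of how the ML-based LRT should behave once the bug is fixed: a hand-rolled test where the degrees of freedom come from the difference in total parameter counts. The function `lrt_ml` and its argument names are hypothetical illustrations, not part of `SSN2`:

```r
# Manual ML likelihood ratio test for nested models (sketch).
# Under ML, df should be the difference in TOTAL parameter counts:
# (fixed effect parameters + autocorrelation parameters).
lrt_ml <- function(ll_full, ll_reduced, npar_full, npar_reduced) {
  stat <- 2 * as.numeric(ll_full - ll_reduced)  # LRT statistic
  df   <- npar_full - npar_reduced              # difference in total parameters
  p    <- pchisq(stat, df = df, lower.tail = FALSE)
  data.frame(statistic = stat, df = df, p.value = p)
}

# With the models above (same "none" autocorrelation structures, so the df
# reduces to the fixed effect difference of 1):
# lrt_ml(logLik(mod1a), logLik(mod2a), 4 + k, 3 + k)  # k = shared autocorrelation params
```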
The `ssn_glm()` models use a slightly different likelihood than those from `betareg`. Also, as specified in your code, the `ssn_glm()` models allow for a nugget effect (see `nugget_type`), which `betareg` models do not. The nugget effect was estimated to be quite small, and as such, parameter estimates and p-values from `ssn_glm()` and `betareg` are very similar (though the likelihoods are different). Generally we advise using REML, though we recognize that a benefit of ML is using likelihood-based statistics to compare models with varying fixed effect structures. AIC/AICc/BIC/anova LRT comparisons for REML models require the same fixed effect structure, while `deviance()` and `loocv()` fit statistics can be used no matter the estimation method or fixed effect structure. As an aside, we are working on identifying some stability updates for the ML estimation likelihood when all autocorrelation variance parameters approach zero.
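As a quick illustration (a sketch reusing the `mod1a`/`mod2a` fits from the example above, so not standalone; `loocv()` is the leave-one-out cross-validation helper shared by `SSN2` and `spmodel`):

```r
# These comparisons are valid whether estmethod is "ml" or "reml", and
# whether or not the fixed effect structures match:
deviance(mod1a)
deviance(mod2a)
loocv(mod1a)  # leave-one-out cross-validation summaries
loocv(mod2a)
```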
If a goal is to study the impact of the spatial stream network structure on model fit, we recommend comparing models that are all fit using `ssn_glm()`, because they use the same likelihood. If proceeding with a single, final model is the goal, we recommend that the final model eventually be fit using REML.
I will edit this message and close the issue when the bug for Question 1 is fixed.
Not a stats expert, so perhaps I'm just not understanding. But in the anova function for both glm and lm, whenever I compare two models, no matter what the formulas are, it always reports the difference in degrees of freedom as 0, resulting in a p-value of 0.
It looks like the code is pulling the degrees of freedom from `object$npar` (which looks like it's related to the number of autocorrelation structures used and doesn't seem to change when the autocorrelation structure is constant), whereas `p` seems to be the number of parameters used in the formula, which to my understanding is where the change in degrees of freedom should come from.
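If that reading of the internals is right, a hypothetical fix would combine both counts when computing the ML degrees of freedom. The slot names `npar` and `p` below are taken from the description above and are assumptions about the SSN2 internals, not verified against the source:

```r
# Hypothetical corrected ML degrees of freedom for anova.ssn_glm():
# difference in (autocorrelation parameters + fixed effect parameters),
# rather than the difference in object$npar alone.
lrt_df_ml <- function(npar1, p1, npar2, p2) {
  abs((npar1 + p1) - (npar2 + p2))
}
```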