StaffanBetner opened this issue 5 years ago
It's unlikely; it wouldn't be impossible to implement, but there are lots of higher priorities. Can you comment on what functionality you want that is not covered by the `ordinal` package on CRAN?
The `ordinal` package is a bit of a mess, unfortunately. The old implementations (`clm2`/`clmm2`) only support one random intercept, and the new one doesn't have any predict method (which makes it incompatible with DHARMa) or scale/nominal effects. It also does not support REML estimation.
Fair enough. On the other hand:
Just to second the request for `glmmTMB` to add ordinal regression to its supported families. I'm finding the `ordinal` package can't work with large datasets in the way `glmmTMB` can.
I'd also like to second this request. Fitting ordinal regression models with scale effects and more than one random factor would be a big plus. I'm not aware of any R package that would be capable of this within a frequentist modelling framework.
Can I revive this old issue? Given how fast and easy it seemed to implement ordered beta regression, maybe the underlying TMB package has developed so well that it is now easier to implement ordinal or multinomial regression?
Ordinal regression structures in `glmmTMB` would be welcomed by ecologists and others.
Chiming back in: ordinal regression in `glmmTMB` would be useful, although:

- `clmm` does already support multiple REs
- `ordinal` does poorly on large data sets

Thanks for taking another look, @bbolker! I understand that this is quite demanding, but I just wanted to make sure to explain what I meant by "scale effects":
Those are basically effects on the residual variance (or rather on the SD), e.g. to allow the residual variance to differ between experimental conditions. `ordinal::clm` (fixed effects only) and `ordinal::clmm2` (mixed effects but only a single random factor) implement this, but the most advanced `ordinal::clmm` does not.
I had a particular model in mind with an ordinal response variable (confidence), two crossed random factors, and residual variance that systematically differs between two levels of a fixed factor. That is neither suitable for `ordinal::clmm`, because that function has no "scale effects", nor for `ordinal::clmm2` or `ordinal::clm`, because the model has two random factors.
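To make the "scale effects" idea concrete, here is a minimal numeric sketch (plain Python, purely illustrative; `cum_logit_probs` and the threshold values are my own inventions, not `ordinal` internals): the linear predictor is divided by a condition-specific latent SD before the inverse logit is applied.

```python
import math

def cum_logit_probs(eta, thresholds, sigma=1.0):
    """Category probabilities under a cumulative-logit model:
    P(Y <= j) = logistic((theta_j - eta) / sigma).
    sigma plays the role of the 'scale effect' (latent residual SD)."""
    logistic = lambda x: 1.0 / (1.0 + math.exp(-x))
    cum = [logistic((t - eta) / sigma) for t in thresholds] + [1.0]
    return [hi - lo for lo, hi in zip([0.0] + cum, cum)]

# hypothetical thresholds for a 4-category response
theta = [-1.0, 0.0, 1.5]
p_ref   = cum_logit_probs(0.5, theta, sigma=1.0)  # reference condition
p_noisy = cum_logit_probs(0.5, theta, sigma=2.0)  # condition with doubled latent SD
```

With larger `sigma`, the cumulative probabilities shrink toward 0.5, so probability mass shifts toward the extreme categories; that is the kind of condition-specific residual-SD effect described above.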
I would like to second this. We are specifically looking for a frequentist implementation in R of the following model:
This is currently not possible with `ordinal::clmm2`, which allows the category-specific effects but not the nested random effect design, nor with `ordinal::clmm`, which allows the nested random effect design but not the category-specific effects. I have coded the model in Stan, and there is also support for it in brms. However, there is no frequentist implementation in R that I can find. I have the same problem as @mmrabe, where `clmm2` has support for esoteric versions of ordinal multinomial models (scale effects and nominal effects) but `clmm` has support for more complex random effect structures.
I would be interested in trying to learn how to implement my own family in glmmTMB to do this, but I do not have the experience in C++ coding to do that. Any input would be greatly appreciated.
I haven't thought about this in a little while, but I think that this can be nearly entirely implemented on the R side, by constructing specialized $\mathbf X$ and $\mathbf Z$ matrices (and maybe creating pseudo-observations) ... in which case very little C++ coding (maybe none?) would be needed. Internally, I think the model would look like a binomial where the logit link would take care of the baseline proportional odds assumption. If I'm right, the hard part would be dealing with all of the implicit general assumptions we're making that would be broken by this new family ...
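As a rough illustration of the pseudo-observation idea, here is a sketch under my own assumptions (`expand_pseudo_obs` is a hypothetical helper, not glmmTMB code): each ordinal observation is expanded into binary indicators, with one dummy column per threshold and the negated covariate. Note the indicators from one observation are dependent, so a stacked binomial-logit fit only approximates the exact cumulative-link likelihood.

```python
def expand_pseudo_obs(y, x, J):
    """Expand y in {1..J} into J-1 binary rows I(y <= j), with one
    dummy column per threshold and the negated covariate, so that a
    plain binomial-logit fit on the stacked design matrix mimics
    logit P(Y <= j) = theta_j - beta * x."""
    rows, z = [], []
    for yi, xi in zip(y, x):
        for j in range(1, J):
            dummies = [1.0 if k == j else 0.0 for k in range(1, J)]
            rows.append(dummies + [-xi])   # [threshold dummies, -x]
            z.append(1 if yi <= j else 0)  # binary pseudo-response
    return rows, z

rows, z = expand_pseudo_obs(y=[1, 3, 2], x=[0.2, -0.5, 1.0], J=3)
```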
What does a minimal implementation look like in Stan?
Thanks for thinking about it! Here is an implementation of the model discussed in Chapter 14 of Stroup et al.'s GLMM book in Stan. I have provided Stan code for the proportional odds model and the model where proportional odds assumption is relaxed (i.e. there are category-specific, AKA nominal, effects). Both models have nested random effects.
https://github.com/qdread/stan_multinomial_poinsettia
The repo has the two Stan scripts, example dataset, and R script to fit the Stan models. The example dataset is supposed to represent poinsettias of 3 different varieties grown by 12 different growers that are rated on an ascending scale C, B, A. Variety is the treatment and grower is the block.
Embarrassingly, I don't have a copy of Stroup's book (I need to get one). Your Stan code is a good start but for glmmTMB we need to figure out how to implement everything so that it (1) generalizes nicely to any number of levels; (2) works mostly by linear algebra/matrix computation rather than indexing parameter vectors ... I may be able to start from your code and get to something standalone based on matrix computation, then think about how to incorporate it into glmmTMB without breaking anything ...
Thanks for taking a look! I will try to work on the Stan implementation to turn the loops + indexing into a proper matrix computation, and update you if/when I have something working.
Hi again, I have updated the Stan code in the repo. I've turned the fixed and random effects into design matrices. Now the linear predictors are made by summing up the products of matrix multiplications. So that part is now based on matrix computation and is generalizable to however many levels. I also rewrote the inverse link part so that it is generalizable to >3 levels, though there may be a better way to do that part. Let me know if there is any other way I can help.
FWIW, from the ordinal package vignette:
> The restriction of the threshold parameters $\{\theta_j\}$ being non-decreasing is dealt with by defining $\ell(\theta, \beta; y) = \infty$ when $\{\theta_j\}$ are not in a non-decreasing sequence. If the algorithm attempts evaluation at such illegal values, step-halving effectively brings the algorithm back on track. Other implementations of CLMs re-parameterize $\{\theta_j\}$ such that the non-decreasing nature of $\{\theta_j\}$ is enforced by the parameterization; for example, `MASS::polr` (package version 7.3-49) optimize[s] the likelihood using
>
> $$\tilde\theta_1 = \theta_1,\quad \tilde\theta_2 = \exp(\theta_2 - \theta_1),\quad \ldots,\quad \tilde\theta_{J-1} = \exp(\theta_{J-1} - \theta_{J-2})$$
>
> This is deliberately not used in `ordinal` because the log-likelihood function is generally closer to quadratic in the original parameterization in our experience, which facilitates faster convergence.
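The `MASS::polr` reparameterization quoted above amounts to optimizing over the first threshold plus log-gaps. A minimal round-trip sketch (plain Python, illustrative only; the function names are mine):

```python
import math

def to_thresholds(raw):
    """Map an unconstrained vector [theta_1, z_2, ..., z_{J-1}] to
    strictly increasing thresholds: theta_j = theta_{j-1} + exp(z_j)."""
    theta = [raw[0]]
    for z in raw[1:]:
        theta.append(theta[-1] + math.exp(z))
    return theta

def to_raw(theta):
    """Inverse transform: first threshold plus log of each successive gap."""
    return [theta[0]] + [math.log(hi - lo) for lo, hi in zip(theta, theta[1:])]
```

Any real-valued `raw` vector yields an ordered threshold set, so an unconstrained optimizer never visits illegal values; the trade-off, per the vignette, is a less quadratic log-likelihood surface.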
Some thoughts about implementation. How to handle the non-decreasing threshold constraint:

- Do what `ordinal` does (above) and simply return `NA`/`NaN` when the constraint is violated?
- Do what `MASS::polr()` does and take the cumulative sum of `exp(xi)` (where `xi` is another, unconstrained parameter vector)?
- Use box constraints (`lower`) in `nlminb` to constrain the parameters to be positive?

```r
psicum <- cumsum(psi)  ## or c(cumsum(psi), Inf) ?
prob[i] <- {
  if (response[i] == maxresponse) {
    1 - linkinv(psicum[maxresponse - 1])
  } else if (response[i] == 1) {
    linkinv(psicum[1])
  } else {
    diff(linkinv(psicum[response[i] - c(1, 0)]))
  }
}
## each observation contributes log(prob[i]) to the log-likelihood
log(prob[i])
```
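Taking `log()` of differences of `linkinv()` values, as in the sketch above, can underflow when the linear predictor is extreme. For the logit link, the log probability can be computed directly on the link scale; a hedged Python sketch (the function names are mine, not glmmTMB's):

```python
import math

def softplus(x):
    """log(1 + exp(x)) without overflow."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def log_diff_logistic(a, b):
    """log(logistic(a) - logistic(b)) for a > b, using the identity
    logistic(a) - logistic(b)
      = exp(b) * expm1(a - b) / ((1 + exp(a)) * (1 + exp(b))),
    which stays finite even when both CDF values round to 1.0
    (or 0.0) in double precision."""
    return b + math.log(math.expm1(a - b)) - softplus(a) - softplus(b)
```

For moderate arguments this matches the naive computation to machine precision, while for, say, thresholds around 40 and 60 the naive difference underflows to exactly zero and `log()` fails.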
- Compare results with `ordinal::clm` or `MASS::polr`.
- Replace the `diff()` operator with something that directly computes the log probability from the differences on the link scale ... (`diff(response)*dmu_deta` for extreme values?)

I emailed the author of `ordinal` months ago but received no response. Its GitHub does not seem to address issues. I have reprogrammed 23 hidden functions beneath `ordinal::clm()` locally for my special application (which does not include random effects). Let me know if anyone needs help understanding the undocumented functions. Comments on @bbolker's thoughts:
- Scale effects are fitted via `ordinal::clm(scale = ~)` and supposedly correspond to `glmmTMB(dispformula = ~)`.
- Nominal effects: another way to say that coefficients vary by response level, useful for partial and non-proportional odds; for $j$ response levels, there will be $j - 1$ estimates for each predictor with nominal effects. This is somewhat like a multi-equation model situation. The terms "scale effects" and "nominal effects" are used in `ordinal` but may be rare to see elsewhere.
- In `ordinal`, θ (thresholds and nominal effects, referred to as `clm()$alpha`), β (location coefficients, referred to as `clm()$beta`), and scale coefficients (referred to as `clm()$zeta`) are kept sequentially in one vector, as shown by `rho$par` for all parameter estimates in the hidden functions `clm.newRho()` and `get_clmRho.default()`. In `rho`, `n.psi` is the number of thresholds, nominal coefficients, and location coefficients, and `k` is the number of scale coefficients, so `length(rho$par) == n.psi + k` is the total degrees of freedom. Note that one predictor (one column in `data`) can correspond to one or more parameter estimates (e.g. one scale coefficient and 4 nominal coefficients for 5-point satisfaction responses).
- The matrices `B1, B2` come with special offsets `o1, o2` that give roughly `-Inf` and `+Inf` (`+/-1e5` actually) for `y == 1` and `y == maxy`, respectively, in `clm.newRho()`. The linear predictor is `(drop(B1 %*% par[1:n.psi]) + o1)/sigma` in `clm.nll()`, where `sigma` contains the scale effects from `clm(scale = ~)` (which should correspond to `dispformula = ~` in glmmTMB).
- If a predictor is in the `nominal = ~`
subformula, its location coefficient will be aliased (if the predictor is also included in the main formula, `summary()$coefficients` shows `NA`) because both cannot be identified together. `ordinal` also runs a test function to assert that thresholds + nominal effects are increasing with response levels across all observations, and otherwise gives a warning. Does this situation favor storing the threshold estimates in an "extra family-specific parameters" vector?
- `ordinal` starts with default threshold estimates that make all response categories equally probable, via `qlogis((1:ntheta)/(ntheta + 1))` in `startthreshold()`. All other parameters (nominal coefficients, location coefficients, scale coefficients) start from 0. The optimizer objective, e.g. `clm.nll()`, has no constraints except `if (all(is.finite(rho$fitted)) && all(rho$fitted > 0)) -sum(rho$wts * log(rho$fitted)) else Inf` for the log-likelihood, where `rho$fitted > 0` means that the fitted probabilities of the observed categories must be positive, equivalently dropping cases where thresholds do not increase monotonically with the response level.
- The model intercept must be dropped for identifiability, and `ordinal` does so. For example, in `clm.newRho()`, the line `B1 <- cbind(B1, X[keep, -1, drop = FALSE])` drops the intercept (the first column) of the location design matrix (`X`). This "0 intercept but free thresholds" setup seems the most common and best-understood choice to ensure identifiability. The developer, however, could add routines so that, when all thresholds are known and supplied by the user (e.g. income brackets), the returned estimates are the scale parameter and the intercept, which can be inferred back from the estimated thresholds. Only two parameters are then needed for a null model, with no latent coefficients once all location coefficients are properly rescaled by the inferred scale parameter. However, estimated thresholds --> implied scale parameter --> rescaled location coefficients is a nonlinear transformation. This step can be done either (1) prior to model fitting by reparameterization, so the returned coefficients are directly the scale parameter (like sigma in `lm()`), the intercept, and the rescaled location coefficients, or (2) post fit, perhaps via the delta method.
- Beyond `ordinal`'s features, I think it would be beneficial to include other variants for ordinal responses, which differ only in how the linear predictor is translated into the likelihood function or fitted probability mass but are otherwise very similar to program: (1) continuation-ratio regression, (2) stopping-ratio regression, (3) adjacent-category regression, (4) multinomial models, (5) stereotype models (which restrict the coefficients of multinomial models with proportionality), plus (6) cumulative link models. The benefit is that, except for multinomial models (as in the packages `mlogit` and `gmnl`), I have found no packages fitting these models (e.g. `VGAM`) that include random effects. Also, `VGAM` is very slow. Fullerton (2009) expressed the connection and difference among these variants; Yee (2010) gave a simple summary of which quantity the linear predictor connects to. I guess fitting these variant models requires only minor variation in calculating the fitted probability mass (at least in the logit case), but I am not sure whether the programming can be easily generalized to other link functions. I know that multinomial logit and probit models are very different things, the latter requiring numerical integration and random draws.
- If one focuses on `dispformula = ~` and considers the two-level case, both binary and ordinal situations can be addressed all together.
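To illustrate the point that these variants mostly differ in how the linear predictor is translated into the probability mass, a small sketch with a logit link (plain Python; the function names and threshold values are illustrative, not any package's API):

```python
import math

logistic = lambda x: 1.0 / (1.0 + math.exp(-x))

def cumulative_probs(eta, theta):
    """Cumulative link: P(Y <= j) = F(theta_j - eta)."""
    cum = [logistic(t - eta) for t in theta] + [1.0]
    return [hi - lo for lo, hi in zip([0.0] + cum, cum)]

def continuation_ratio_probs(eta, theta):
    """Continuation ratio: P(Y = j | Y >= j) = F(theta_j - eta)."""
    probs, remaining = [], 1.0
    for t in theta:
        h = logistic(t - eta)        # conditional 'stop here' probability
        probs.append(remaining * h)
        remaining *= 1.0 - h
    probs.append(remaining)          # last category takes what is left
    return probs

theta = [-1.0, 0.0, 1.5]
pc = cumulative_probs(0.3, theta)
pr = continuation_ratio_probs(0.3, theta)
```

Both routines share the same linear predictor and link; only the final mapping to category probabilities changes, which is why these variants could plausibly share most of an implementation.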
It would be very useful if glmmTMB supported different types of ordinal regression, e.g. the proportional odds model (cumulative link logit model).