Open iago-pssjd opened 2 years ago
Any example for multivariate models?
Hi @strengejacke.
I assume you are asking about multivariate response models. I quote a paragraph from the same page of the book I mentioned above:
Some care is needed to distinguish between multiple responses and multivariate responses. Multiple responses are treated independently of each other, and are to be inputted side-by-side on the LHS of the formula using cbind(). Each response may be a vector, of dimension Q1. Multivariate responses correspond to Q1 > 1, and are handled by some full-likelihood model that takes into account their joint distribution.
The example given by the author is the family `binormal`, so:
set.seed(123); nn <- 1000
bdata <- data.frame(x2 = runif(nn), x3 = runif(nn))
bdata <- transform(bdata, y1 = rnorm(nn, 1 + 2 * x2),
y2 = rnorm(nn, 3 + 4 * x2))
fit1 <- vglm(cbind(y1, y2) ~ x2,
binormal(), data = bdata, trace = FALSE)
coef(fit1, matrix = TRUE)
mean1 mean2 loglink(sd1) loglink(sd2) rhobitlink(rho)
(Intercept) 1.021178 2.929393 0.009053151 -0.02282574 0.05231844
x2 2.042807 4.101541 0.000000000 0.00000000 0.00000000
In this case `sigma` would be the diagonal matrix whose entries are the `exp` of 0.009053151 and -0.02282574. (Further, we would have an intercept vector (1.021178, 2.929393) and an `x2`-slope vector (2.042807, 4.101541).) Other examples would use the families `trinormal` or `bistudentt` (as shown in the help pages of these functions). More implemented bivariate distributions are given in Table 13.1 (page 382) of the book.
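To make the mapping from the coefficient table to a covariance matrix concrete, here is a minimal base-R sketch. It only uses the intercepts printed above and assumes that the rhobit link is log((1 + rho) / (1 - rho)), so that its inverse is tanh(eta / 2). The variable names are illustrative and not part of `VGAM` or `insight`.

```r
# Intercepts on the link scale, copied from coef(fit1, matrix = TRUE) above
eta_sd1 <- 0.009053151   # loglink(sd1)
eta_sd2 <- -0.02282574   # loglink(sd2)
eta_rho <- 0.05231844    # rhobitlink(rho)

# Back-transform: exp() inverts the log link; tanh(eta / 2) inverts the
# rhobit link (ASSUMED here to be log((1 + rho) / (1 - rho)))
sds <- exp(c(sd1 = eta_sd1, sd2 = eta_sd2))
rho <- tanh(eta_rho / 2)

# Correlation matrix, then the full 2 x 2 covariance matrix of (y1, y2)
corr <- matrix(c(1, rho, rho, 1), nrow = 2,
               dimnames = list(names(sds), names(sds)))
Sigma <- diag(sds) %*% corr %*% diag(sds)
dimnames(Sigma) <- dimnames(corr)
Sigma
```

If only the standard deviations are of interest, `sds` alone gives the diagonal described above; the off-diagonal term is included here only to show where `rho` would enter.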
Then, first to get the parameters and other coefficients of `VGAM` models, and second, to distinguish between multiple responses and multivariate responses, one should look at the `vglm-object@family`. Therefore, the way to get them, and in particular `sigma`, is not as immediate as in:
https://github.com/easystats/insight/blob/a81c92c59dac4b2c6a212b225a8295ebda8af354/R/get_sigma.R#L96-L101
Let's look at some other examples.
From `?bistudentt`:
nn <- 1000
mydof <- logloglink(1, inverse = TRUE)
ymat <- cbind(rt(nn, df = mydof), rt(nn, df = mydof))
bdata <- data.frame(y1 = ymat[, 1], y2 = ymat[, 2],
y3 = ymat[, 1], y4 = ymat[, 2], x2 = runif(nn))
## Not run: plot(ymat, col = "blue")
fit1 <- vglm(cbind(y1, y2, y3, y4) ~ 1, # 2 responses, e.g., (y1,y2) is the 1st
fam = bistudentt, # crit = "coef", # Sometimes a good idea
data = bdata, trace = FALSE)
coef(fit1, matrix = TRUE)
logloglink(df1) rhobitlink(rho1) logloglink(df2) rhobitlink(rho2)
(Intercept) 1.072883 -0.01766264 1.072882 -0.01765367
In this case, as we had two responses, we would have two `sigma` matrices, both having 1 in all the elements of the main diagonal, and having as off-diagonal elements the values `rho1` (maybe = `rhobitlink(-0.01766264, inverse = TRUE)`?) and `rho2` (maybe = `rhobitlink(-0.01765367, inverse = TRUE)`?) respectively.
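The guess about `rho1` and `rho2` can be checked numerically without refitting the model. The sketch below is base R only; `rhobit_inverse` is a hypothetical helper written for this comment, and it assumes the rhobit link is log((1 + rho) / (1 - rho)).

```r
# Hypothetical helper (not from VGAM): inverse of the rhobit link,
# ASSUMING rhobitlink(rho) = log((1 + rho) / (1 - rho))
rhobit_inverse <- function(eta) (exp(eta) - 1) / (exp(eta) + 1)

# Intercepts copied from coef(fit1, matrix = TRUE) above
eta <- c(rho1 = -0.01766264, rho2 = -0.01765367)
rho <- rhobit_inverse(eta)

# One 2 x 2 matrix per response pair: ones on the diagonal, rho off it
sigma_list <- lapply(rho, function(r) matrix(c(1, r, r, 1), nrow = 2))
sigma_list
```

With `VGAM` attached, `rho` can be compared against `rhobitlink(eta, inverse = TRUE)`, the call suggested above, to confirm or reject the assumption about the link.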
Multiple responses (taken from `?tobit`):
tdata <- data.frame(x2 = seq(-1, 1, length = (nn <- 100)))
set.seed(1)
Lower <- 1; Upper <- 4 # For the nonstandard Tobit model
tdata <- transform(tdata,
Lower.vec = rnorm(nn, Lower, 0.5),
Upper.vec = rnorm(nn, Upper, 0.5))
meanfun1 <- function(x) 0 + 2*x
meanfun2 <- function(x) 2 + 2*x
meanfun3 <- function(x) 2 + 2*x
meanfun4 <- function(x) 3 + 2*x
tdata <- transform(tdata,
y1 = rtobit(nn, mean = meanfun1(x2)), # Standard Tobit model
y2 = rtobit(nn, mean = meanfun2(x2), Lower = Lower, Upper = Upper),
y3 = rtobit(nn, mean = meanfun3(x2), Lower = Lower.vec, Upper = Upper.vec),
y4 = rtobit(nn, mean = meanfun3(x2), Lower = Lower.vec, Upper = Upper.vec))
fit4 <- vglm(cbind(y3, y4) ~ x2,
tobit(Lower = rep(with(tdata, Lower.vec), each = 2),
Upper = rep(with(tdata, Upper.vec), each = 2),
byrow.arg = TRUE),
data = tdata, crit = "coeff", trace = FALSE)
coef(fit4, matrix = TRUE)
mu1 loglink(sd1) mu2 loglink(sd2)
(Intercept) 1.950757 0.04030946 1.944842 0.008811054
x2 1.899930 0.00000000 2.122347 0.000000000
In this case, we would have two sigma values, one for each response, given on the log scale by `coef(fit4, matrix = TRUE)["(Intercept)","loglink(sd1)"]` (or also `coef(fit4)["(Intercept):2"]`) and `coef(fit4, matrix = TRUE)["(Intercept)","loglink(sd2)"]` (or also `coef(fit4)["(Intercept):4"]`); sigma itself is the `exp` of each.
A more trivial example (from `?vglm`), where the result is the same as for `glm`:
d.AD <- data.frame(treatment = gl(3, 3),
outcome = gl(3, 1, 9),
counts = c(18,17,15,20,10,20,25,13,12))
glm.D93 <- glm(counts ~ outcome + treatment, family = poisson,
data = d.AD)
vglm.D93 <- vglm(counts ~ outcome + treatment, family = poissonff,
data = d.AD, trace = FALSE)
sigma(glm.D93)
[1] 1.13238
sigma(vglm.D93)
[1] 1.13238
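For the `glm` fit, the value 1.13238 can be reproduced by hand. This is a small sketch assuming that `sigma()` dispatches to the default method in `stats`, which takes the square root of the deviance divided by the residual degrees of freedom; it says nothing about how the `vglm` value is obtained.

```r
# Same data and glm fit as in the example above
d.AD <- data.frame(treatment = gl(3, 3),
                   outcome = gl(3, 1, 9),
                   counts = c(18, 17, 15, 20, 10, 20, 25, 13, 12))
glm.D93 <- glm(counts ~ outcome + treatment, family = poisson, data = d.AD)

# Manual version of the deviance-based fallback: sqrt(deviance / residual df)
sigma_by_hand <- sqrt(deviance(glm.D93) / df.residual(glm.D93))
sigma_by_hand

# Compare with what sigma() reports for the same object
all.equal(sigma_by_hand, sigma(glm.D93))
```

This is the kind of deviance-based value that makes sense as a dispersion summary for a Poisson GLM, but it is not an estimated standard deviation parameter like `sd1` or `sd2` in the models above.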
Finally, other arguments given to `family` may be relevant, like `zero`, `nointercept` and others (Table 18.6, page 518 in the book), although this may appear summarized in some way in the `vglm-object@family` object.
Another thought related to this issue is whether `sigma` should be reported among the "performance" values of a model when it is being estimated by the model. Probably, reporting it as until now, among the "parameters", is better, but it would look even better if we could detect it by reading `loglink(sd)` or `sigma` (or some other alternative) instead of `(Intercept):2`.
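As a rough illustration of name-based detection, the sketch below looks for `loglink(sd...)` columns in a coefficient matrix shaped like the output of `coef(fit, matrix = TRUE)`. The matrix is typed in from the Tobit output above so that the snippet runs without `VGAM`; `sigma_from_coef_matrix` is a hypothetical helper, not an existing `insight` or `VGAM` function, and it assumes the standard deviations are intercept-only and use a log link.

```r
# Coefficient matrix typed in from the coef(fit4, matrix = TRUE) output above
# (matrix() fills column by column)
cm <- matrix(c(1.950757, 1.899930,
               0.04030946, 0,
               1.944842, 2.122347,
               0.008811054, 0),
             nrow = 2,
             dimnames = list(c("(Intercept)", "x2"),
                             c("mu1", "loglink(sd1)", "mu2", "loglink(sd2)")))

# Hypothetical helper: pick the columns named loglink(sd...) and
# back-transform their intercepts with exp()
sigma_from_coef_matrix <- function(cm) {
  sd_cols <- grep("^loglink\\(sd", colnames(cm), value = TRUE)
  if (length(sd_cols) == 0) return(NULL)   # no modelled sd found
  sig <- exp(cm["(Intercept)", sd_cols])
  # Strip the link wrapper so the names read "sd1", "sd2", ...
  names(sig) <- sub("^loglink\\((.*)\\)$", "\\1", sd_cols)
  sig
}

sigma_from_coef_matrix(cm)
```

Matching on column names avoids relying on positions such as `(Intercept):2`, but it would need extending for other links or for models where the standard deviation depends on covariates.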
I think it will take some time until I (or someone else) can look into this; it does not seem to be very straightforward. If you are able to, we of course also appreciate PRs on this issue :-)
First, it is a bug, since, as stated in `?get_sigma`

Therefore, when the model has no `sigma()` method, `get_sigma` tries a calculation that is not always suitable, at least for many `vglm` models. For example (from `help("tobit", package = "VGAM")`):

while the right `sigma` value is

(see https://stats.oarc.ucla.edu/r/dae/tobit-models/ and the book "Vector Generalized Linear and Additive Models" by Thomas Yee). In fact, in this package there are 2 points to consider:

1. `sigma` is not treated as a scale parameter, like in GLMs, but it is generally (not always) modelled (as also happens with the `gamlss` package), therefore it is found as an intercept;
2. `sigma` may be a list, with each element being a matrix corresponding to the multivariate responses (using the intercepts `(Intercept):2`, `(Intercept):4`, etc.)? (I refer to the distinction between multiple and multivariate responses made in section 3.5.1 of the book, in the second half of page 115).

This second point would be related to this issue I recently added to `performance`: https://github.com/easystats/performance/issues/430

Thanks!