Labo-Lacourse / StepMixR

R interface to python package StepMix
1 stars 0 forks source link

Binary Structural #4

Closed mohrezazali closed 7 months ago

mohrezazali commented 7 months ago

Hello I fitted a binary structural model. Then I need to find 1. intercept and 2. coefficients of the linear model. Is there a way to catch this info? I didn't find a workaround in the the docs

sohrabjaferian commented 7 months ago

I had exactly the same issue. I guess it's not available.

giguerch commented 7 months ago

The parameter are given as the conditional probability in each group. You could rewrite it as an intercept and coefficients. For example, if we fit a 2 group model and the structural part is binary:

X = rmvnorm(100, sigma = matrix(c(1,0.5,0.5,1), 2, 2)) + 
  matrix(rep(rep(c(2,3), c(50, 50)),2), ncol = 2)
y = rbinom(100, 1, rep(c(0.5,0.8), c(50, 50)))
mod1 <- stepmix(n_components = 2, measurement = "gaussian", structural = "binary",
                random_state = 1235)
fit1 <- fit(mod1, X, y)
fit1$get_parameters()

The output is p(y=1|g=0) = 0.7089605 and p(y=1|g=1) = 0.5983974. We could rewrite it as an intercept in group 0 $\beta_0 = log(p(y=1|g=0)/(1 - p(y=1|g=0))$ and the coefficient in group 1 $\beta_1 = log(p(y=1|g=1)/(1 - p(y=1|g=1)) - \beta_0$

Here's the R code for that:

pis <- fit1$get_parameters()$structural$pis[,1]

int = log(pis[1]/(1-pis[1]))
beta = log(pis[2]/(1-pis[2]))-int

c("Intercept" = int, "Beta(group 1)" = beta)
giguerch commented 7 months ago

Thank you for your comments. It will help our team improve the package. The code behind stepmixr was written in python by programmers in the field of machine learning. They provided strong and efficient coding but in the spirit of making good predictions and not necessarily for statistical inference . The R package is an interface to that package. It's possible to do inferences using bootstrap but you have to do a little bit of reparameterization. We are working on more extensive examples to make it easier for users of stepmixr.

mohrezazali commented 7 months ago

The parameter are given as the conditional probability in each group. You could rewrite it as an intercept and coefficients. For example, if we fit a 2 group model and the structural part is binary:

X = rmvnorm(100, sigma = matrix(c(1,0.5,0.5,1), 2, 2)) + 
  matrix(rep(rep(c(2,3), c(50, 50)),2), ncol = 2)
y = rbinom(100, 1, rep(c(0.5,0.8), c(50, 50)))
mod1 <- stepmix(n_components = 2, measurement = "gaussian", structural = "binary",
                random_state = 1235)
fit1 <- fit(mod1, X, y)
fit1$get_parameters()

The output is p(y=1|g=0) = 0.7089605 and p(y=1|g=1) = 0.5983974. We could rewrite it as an intercept in group 0 β0=log(p(y=1|g=0)/(1−p(y=1|g=0)) and the coefficient in group 1 β1=log(p(y=1|g=1)/(1−p(y=1|g=1))−β0

Here's the R code for that:

pis <- fit1$get_parameters()$structural$pis[,1]

int = log(pis[1]/(1-pis[1]))
beta = log(pis[2]/(1-pis[2]))-int

c("Intercept" = int, "Beta(group 1)" = beta)

Thanks for your explainations. So if we have 3 features X1,...X3 and I want to build regression lines of each group in a 2-component case, each time we need to use one feature to derive interception and use the other two to derive the coefficients? Could you explain this too?

giguerch commented 7 months ago

Yes, you understand correctly. You can use, the group that fits you. Here find a function that will compute this for you. Maybe we will add it to the package in future versions.

### Example of probability.
p <- c(0.71, 0.60, 0.80)

### Function to reparameterized probability into logit.
p2logit <- function(p, ref=1){
    if(ref > length(p))
        stop("reference group is higher than the number of group")
    ## 1. Change p to odds.
    odds <- p/(1-p)
    ## 2. Set the reference group.
    odds_ref <- odds[ref]
    ## 3. set the coefficients and intercept.
    intercept <- log(odds_ref)
    coef <- log(odds/odds_ref)[-ref]
    res <- c(intercept, coef)
    names(res) <- c(sprintf("Int(grp=%d)", ref),
                    sprintf("Beta(grp=%d)", (1:length(p))[-ref]))
    res
}

### Change to logit using default 1st group as reference.
p2logit(p)

### Choosing last group instead as a reference.
p2logit(p, 3)

which will give.

> p
Group1 Group2 Group3 
  0.71   0.60   0.80 
> 
> ### Change to logit using default 1st group as reference.
> p2logit(p)
 Int(grp=1) Beta(grp=2) Beta(grp=3) 
  0.8953840  -0.4899189   0.4909103 
> 
> ### Choosing last group instead as a reference.
> p2logit(p, 3)
 Int(grp=3) Beta(grp=1) Beta(grp=2) 
  1.3862944  -0.4909103  -0.9808293 
sachaMorin commented 7 months ago

Thanks for the comments everyone! @giguerch if we think this could be useful in the core Python package, we should consider opening an issue in the Python repo describing the requested feature with a link to this thread.