Accessing parameter estimates + including binary and categorical variables as covariates

huentelb commented 2 months ago

Hi all,

Thanks for this amazing initiative to make 2- and 3-step LCA available on open-source platforms. I was able to fit some basic 1-, 2-, and 3-step LCA models, but I have two remaining questions:

1. Stored parameter estimates. (How) is it possible to access parameter estimates after fitting the model? For instance, I fit a 3-step LCA with ML correction including covariates and would like to build a nice output table from my structural model including beta coefficients, SEs, p-values and CIs (basically mirroring the output of a classic multinomial logistic regression in step 3). Are these estimates stored anywhere? How can I access them? This seems to be fairly easy in Python, but I have not found a solution in R yet.
2. Different scales of covariates. I would like to estimate a 3-step LCA with ML correction and covariates on different scales, that is, binary, continuous, and categorical. Including all covariates via

```
lc4_3step_Zp.df <- USA_lc %>%
  dplyr::select(anc_age,                        # continuous
                female, mig,                    # binary
                kin_cat, race, lfs, anc_eduall) # categorical

covariate_params = list(method = 'newton-raphson',
                        max_iter = as.integer(1),
                        intercept = TRUE)

m3 <- stepmix(n_components = 4, n_steps = 3, correction = "ML",
              measurement = "binary", structural = "covariate",
              structural_params = covariate_params,
              max_iter = m, n_init = 10, verbose = 1, random_state = seed)
```

resulted in all covariates being treated as continuous (as seen in the structural model parameters). Adapting what I read in the [Tutorial](https://colab.research.google.com/drive/1MzGHRO5kfs9OT3cRICJ1Ey94PHHnxFdO#scrollTo=QxugiqGk_h-O), I tried applying the `mixed_descriptor` function to the covariates like so:

```
md = mixed_descriptor(data = lc4_3step_Zp.df, continuous = 1, binary = 2:3, categorical = 4:7)
```

and adjusted the above model through `structural = md$descriptor` like so:

```
m3 <- stepmix(n_components = 4, n_steps = 3, correction = "ML",
              measurement = "binary", structural = md$descriptor,
              structural_params = covariate_params,
              max_iter = m, n_init = 10, verbose = 1, random_state = seed)
```

However, the variables passed through the structural part via `md$descriptor` do not seem to be treated as covariates: the structural model reports 'pis' rather than betas.

I hope I clearly spelled out my questions and am happy to provide more detail, if needed. Any feedback or help is highly appreciated!!

Thanks again for all this work and any response I may get!

Best
Bettina

FelixLaliberte commented 2 months ago

Hello Bettina,

Thank you for your interest in our package.

Regarding your first question, we have prepared examples in Python and are preparing examples in R. We will get back to you with detailed examples in StepMixR as soon as possible. In the meantime, please note that to obtain the parameters of a multinomial regression, you first need to bootstrap the parameters using the `bootstrap_stats()` function. For example:

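```
library(stepmixr)  # stepmix(), fit(), bootstrap_stats()
library(dplyr)     # provides the %>% pipe

# Two-step model with covariates in the structural part
model = stepmix(n_components = 3,
                measurement = 'binary',
                structural = 'covariate',
                n_steps = 2,
                random_state = 42)

# df_MM: measurement data, df_SM: structural (covariate) data
fit1 = fit(model, df_MM, df_SM)

# Bootstrap the fitted parameters
bs_params = bootstrap_stats(fit1, df_MM, df_SM, n_repetitions = 1000)

# List the distinct parameter labels found in the bootstrap samples
level_header = c('model', 'model_name', 'param', 'class_no', 'variable')
bs_params[['samples']][, level_header] %>% unique
```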

Here, each bootstrapped parameter is located in bs_params$samples. The tutorial we’ll be posting on GitHub will provide a step-by-step guide on how to determine the reference class, normalize the betas (i.e., obtain regression coefficients), calculate standard errors, and derive p-values.
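
The post-processing steps listed above could look roughly like the following in base R. This is only a sketch on simulated numbers, not the maintainers' tutorial: the long-format layout and the column names `rep_id`, `class_no`, `variable`, and `value` are assumptions made for illustration, as are the choice of class 0 as the reference and the Wald-type p-value computed from the bootstrap standard error.

```r
# Simulated long-format table: one row per replicate x class x covariate.
# Column names are ASSUMED for this sketch, not a documented StepMixR layout.
set.seed(1)
samples <- expand.grid(rep_id = 1:200, class_no = 0:2,
                       variable = c("intercept", "anc_age"),
                       stringsAsFactors = FALSE)
samples$value <- rnorm(nrow(samples), mean = samples$class_no * 0.5, sd = 0.1)

# 1. Normalize against a reference class (here: class 0) by subtracting
#    its coefficient within each replicate and covariate.
ref <- samples[samples$class_no == 0, c("rep_id", "variable", "value")]
names(ref)[3] <- "ref_value"
m <- merge(samples, ref, by = c("rep_id", "variable"))
m$beta <- m$value - m$ref_value
m <- m[m$class_no != 0, ]

# 2. Summarize each class/covariate pair across replicates.
groups <- split(m$beta, list(m$class_no, m$variable), drop = TRUE)
summ <- do.call(rbind, lapply(names(groups), function(g) {
  b <- groups[[g]]
  data.frame(term     = g,
             estimate = mean(b),                      # bootstrap mean
             se       = sd(b),                        # bootstrap standard error
             ci_low   = unname(quantile(b, 0.025)),   # percentile 95% CI
             ci_high  = unname(quantile(b, 0.975)))
}))

# 3. Wald-type z statistic and two-sided p-value.
summ$z       <- summ$estimate / summ$se
summ$p_value <- 2 * pnorm(-abs(summ$z))
print(summ)
```

With real output, `samples` would be replaced by the bootstrap samples, and the point estimate would normally come from the fitted model rather than from the bootstrap mean used here for simplicity.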

Regarding your question about covariate scales, please note that you should use `structural = 'covariate'` for all types of covariates. Additionally, covariates must be passed as numeric variables in R, which means you need to create dummy (0/1) variables for binary or categorical covariates.
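
As a minimal sketch of that dummy coding, base R's `model.matrix()` can expand factors into 0/1 indicator columns. The toy data below is invented; only the column names are borrowed from the question, and dropping the intercept column assumes the model adds its own intercept (as with `intercept = TRUE` in `covariate_params`).

```r
# Toy covariate frame (values are invented for illustration).
covs <- data.frame(
  anc_age = c(34, 51, 47, 29),              # continuous
  female  = c(0, 1, 1, 0),                  # binary, already 0/1
  race    = factor(c("a", "b", "c", "a"))   # categorical with 3 levels
)

# model.matrix() turns each factor into indicator columns and drops the
# first level ("a") as the reference category.
X <- model.matrix(~ anc_age + female + race, data = covs)

# Remove the "(Intercept)" column and keep an all-numeric data frame.
Z <- as.data.frame(X[, -1, drop = FALSE])
print(Z)
```

Note that `model.matrix()` drops rows with missing values by default, so the result may need to be re-aligned with the measurement data before fitting.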

Best regards, Félix

huentelb commented 1 month ago

Dear Félix,

Thank you so much for getting back to me!

I was able to bootstrap the parameters and store them following your code, but the last line of code raises the following error: `Error in py_get_item(x, name) : IndexError: index 7000 is out of bounds for axis 0 with size 7000`. I have 7000 observations and 7 variables in `bs_params$samples`.

Do you know what the issue is?

Thank you again so much and best regards Bettina

giguerch commented 1 month ago

Hi Bettina,

In Python, indices run from 0 to 6999, so index 7000 is out of bounds. Make sure you did not use index 7000.

Charles-Edouard
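
To make the off-by-one concrete, here is a small base R illustration of how R's 1-based positions map to Python's 0-based ones. The variable names are purely illustrative.

```r
n <- 7000L

# R counts rows from 1 to n.
r_idx <- seq_len(n)

# Python counts the same rows from 0 to n - 1, so subtract one.
py_idx <- r_idx - 1L

range(r_idx)   # first and last valid position in R
range(py_idx)  # first and last valid position in Python
```

Since the code that raised the error never indexes by hand, the 1-based positions are presumably passed to Python during the `[` subsetting itself. An untested assumption is that converting the object to a native R data frame first, for example with `reticulate::py_to_r()`, would sidestep this.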

huentelb commented 1 month ago

Dear Charles-Edouard,

Thanks for your quick response! I did not specify any index myself; I followed the code proposed by Félix directly. How would I use indices 0 to 6999?

Thank you and best Bettina
