cjvanlissa / tidySEM

tidySEM and categorical/clustering/weights #32

Open cbratt opened 2 years ago

cbratt commented 2 years ago

I am struggling to make tidySEM work beyond classical SEM - by declaring a dependent variable to be categorical (ordered), or by adding information on clusters in the data, or by adding sampling weights.

None of these attempts has been successful, and the manual and vignettes for tidySEM do not seem to mention models beyond classical SEM. Is tidySEM limited to continuous variables and models without clustering and sampling weights?

cjvanlissa commented 2 years ago

No specific provisions have been made for these options, but in principle they can be passed on to the estimating package. Can you share some syntax of what you're trying to accomplish?

The syntax-generating capabilities of tidySEM are least developed btw, being just a shorthand for me to estimate the same model with different packages. But they all pass arguments on, so could be flexibly used.

cbratt commented 2 years ago

THERE WAS AN ERROR IN THE CODE SUBMITTED HERE.

I WILL FIRST RESOLVE THAT ERROR.

cjvanlissa commented 2 years ago

@cbratt do note that 'scale' in tidy_sem$dictionary refers to items that load on a latent variable. It is not about level of measurement (class in R). Also, I'm not familiar with the use of the pipe, so I can't check if it's used correctly.
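
To illustrate the point about 'scale', here is a minimal sketch using made-up column names (vis_1, vis_2 and age are assumptions for illustration, not variables from this thread). It assumes that tidy_sem() by default splits variable names on an underscore into a scale and an item part, and that the dictionary() accessor returns the dictionary element.

```r
library(tidySEM)

# Hypothetical toy data: two items named <scale>_<item> and one standalone variable
df <- data.frame(
  vis_1 = rnorm(20),
  vis_2 = rnorm(20),
  age   = rnorm(20)
)

model <- tidy_sem(df)

# The 'scale' column groups items by their name prefix; it does not record
# whether a variable is numeric, ordered, or a factor
dict <- dictionary(model)
print(dict)
```

If that assumption holds, the 'scale' entry reflects the name prefix only, so declaring a variable as ordered in the data would not change it.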

cbratt commented 2 years ago

I am aware of the need to include a reproducible example (and usually include code). But since developing one requires some work, I wanted to make sure that there actually was a point in including syntax.

Despite searching, I did not find a data set included in R that would make sense here, so I have turned the mtcars data set into a toy example.

library(tidyverse)
library(lavaan)
library(tidySEM)

# Two data frames based on mtcars
data1 <- mtcars
data2 <- mtcars

# The second data frame gets vs declared as ordered
data2$vs <- ordered(data2$vs) 

I define the same model for each data set:

model1 <- tidy_sem(data1             |> 
                    select(mpg, vs)) |>
          add_paths(vs ~ mpg)

model2 <- tidy_sem(data2             |> 
                    select(mpg, vs)) |>
          add_paths(vs ~ mpg)

All is well with the model using the original data:

> estimate_lavaan(model1)
lavaan 0.6-9 ended normally after 12 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                         3

  Number of observations                            32

Model Test User Model:

  Test statistic                                 0.000
  Degrees of freedom                                 0

But not if I use the data where vs is defined as ordinal:

> estimate_lavaan(model2)
lavaan 0.6-9 did not run (perhaps do.fit = FALSE)?
** WARNING ** Estimates below are simply the starting values

  Estimator                                       DWLS
  Optimization method                           NLMINB
  Number of model parameters                         3

  Number of observations                            32

Model Test User Model:

  Test statistic                                    NA
  Degrees of freedom                                NA
There were 14 warnings (use warnings() to see them)

I get the same problem if I use the original data but declare the dependent variable as ordinal as part of estimate_lavaan():

> estimate_lavaan(model1, ordered = "vs")
lavaan 0.6-9 did not run (perhaps do.fit = FALSE)?
** WARNING ** Estimates below are simply the starting values

  Estimator                                       DWLS
  Optimization method                           NLMINB
  Number of model parameters                         3

  Number of observations                            32

Model Test User Model:

  Test statistic                                    NA
  Degrees of freedom                                NA
There were 14 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In lav_partable_check(lavpartable, categorical = lavoptions$categorical,  ... :
  lavaan WARNING: parameter table does not contain thresholds 
2: In WLS.obs - WLS.est :
  longer object length is not a multiple of shorter object length
3: In WLS.obs - WLS.est :
  longer object length is not a multiple of shorter object length
4: In lavsamplestats@WLS.obs[[g]] - WLS.est[[g]] :
  longer object length is not a multiple of shorter object length
5: In WLS.obs - WLS.est :
  longer object length is not a multiple of shorter object length
6: In WLS.obs - WLS.est :
  longer object length is not a multiple of shorter object length
7: In lavsamplestats@WLS.obs[[g]] - WLS.est[[g]] :
  longer object length is not a multiple of shorter object length
8: In WLS.obs - WLS.est :
  longer object length is not a multiple of shorter object length
9: In WLS.obs - WLS.est :
  longer object length is not a multiple of shorter object length
10: In lavsamplestats@WLS.obs[[g]] - WLS.est[[g]] :
  longer object length is not a multiple of shorter object length
11: In WLS.obs - WLS.est :
  longer object length is not a multiple of shorter object length
12: In WLS.obs - WLS.est :
  longer object length is not a multiple of shorter object length
13: In lavsamplestats@WLS.obs[[g]] - WLS.est[[g]] :
  longer object length is not a multiple of shorter object length
14: In lavaan::lavaan(model = structure(list(lhs = c("vs",  ... : lavaan WARNING:
    Model estimation FAILED! Returning starting values.
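
The first warning suggests that the parameter table handed to lavaan has no threshold rows for vs. As a point of comparison, here is a minimal sketch that fits the same regression by calling lavaan directly, without tidySEM. It relies on sem() adding thresholds automatically for ordered endogenous variables; it is not a fix for estimate_lavaan() itself.

```r
library(lavaan)

# Direct lavaan call, bypassing tidySEM; sem() adds thresholds for
# ordered endogenous variables automatically (auto.th = TRUE)
fit <- sem("vs ~ mpg", data = mtcars, ordered = "vs")

# Rows with op == "|" are the thresholds that the first warning reports as missing
pt <- parameterTable(fit)
print(pt[pt$op == "|", c("lhs", "op", "rhs")])
```

Comparing this parameter table with the one tidySEM generates may help pinpoint which rows are missing.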

cbratt commented 2 years ago

A related wish would be the possibility to add arguments to a tidy_sem object.

For instance something similar to

  # Original model
  model <- tidy_sem(data             |> 
                    select(mpg, vs)) |>
           add_paths(vs ~ mpg)

  # Revised model, now including sample weights and preparing for a sandwich estimator of clustered data
  model  <- as_mplus(model, "weights = sample_weights; cluster = group;")

But I fully understand that this cannot be on a priority list, given that tidySEM will not focus on Mplus in the future. This would go well beyond the intent of having functions that ease comparisons of results from Mplus and lavaan. If, however, the development of support for OpenMx requires OpenMx results to be compared with results from Mplus, then such an addition to tidySEM would make sense. (Personally, I only want to use Mplus for analyses more advanced than those currently available in lavaan.)

cjvanlissa commented 2 years ago

@cbratt I think you can use mplusObject for this!
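
A minimal sketch of what that could look like with MplusAutomation::mplusObject(). The weight and cluster columns (wt, grp) are hypothetical and added only for illustration, and the Mplus statements are an untested guess at the intended model rather than syntax produced by tidySEM.

```r
library(MplusAutomation)

# Hypothetical weight and cluster columns, added only for illustration;
# Mplus limits variable names to 8 characters, hence the short names
dat <- mtcars[, c("mpg", "vs")]
dat$wt  <- 1
dat$grp <- rep(1:8, each = 4)

obj <- mplusObject(
  TITLE    = "Probit regression with weights and clustering;",
  VARIABLE = "CATEGORICAL = vs; WEIGHT = wt; CLUSTER = grp;",
  ANALYSIS = "TYPE = COMPLEX;",
  MODEL    = "vs ON mpg;",
  usevariables = c("mpg", "vs", "wt", "grp"),
  rdata = dat
)

# Running the model requires a local Mplus installation:
# fit <- mplusModeler(obj, modelout = "vs_mpg.inp", run = 1L)
```

Building the object does not need Mplus; only the commented-out mplusModeler() call does.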

cbratt commented 2 years ago

I am closing this issue, as @cjvanlissa indicates that the requested features are not meant to be part of tidySEM.

cjvanlissa commented 2 years ago

It's OK; just leave it open. If I see an easy way, I might get to it.