cjvanlissa / tidySEM

55 stars 7 forks source link

Bifactor analysis #73

Open Gootjes opened 1 year ago

Gootjes commented 1 year ago

This PR is a Proof of Concept. I would like to receive feedback on how to better integrate this with the tidySEM framework.

EDIT: finished

cjvanlissa commented 1 year ago

What a nice idea! I would make one top-down suggestion at this point, before getting into the details of the code, and that is to try to follow the tidySEM design philosophy as closely as possible. So the bifactor() function should work similarly to the measurement() function, and there really shouldn't be a run_bifactor() function, especially not one that only works with lavaan.

Smaller notes:

  1. Add an argument that sets the default factor name "G"
  2. Currently does not support cross loadings. Why? Should be possible with add_paths() no?
  3. "G is not an allowed scale name": See point 1. Instead, give an error that there is ALREADY a variable named G. Or change the name to "G.1" and continue as usual.
  4. Make the compute_omega function generic, so it works with arbitrary sem models and not just with lavaan. The way to do this is to have a function compute_omega.default() that accepts only exported coefficients, and to have functions like compute_omega.lavaan() grab those coefficients from lavaan output and then call the default. Also: rename to omega()?
Gootjes commented 1 year ago

I have removed run_bifactor() And created a omega.lavaan and omega.default. The latter requires two arguments that are completely different classes (parameter table of class data.frame and covariance matrix of class matrix) than those of omega.lavaan (lavaan fit object). Is this the way to go? Or should the .default implementation call the data.frame implementation?

  1. Added the default factor name argument. A custom factor name needs to be passed to both bifactor and omega().
  2. I implemented cross loadings and made sure they align with the implementation of psych.

The syntax to add a cross-loading is a mistake-prone for an end-user because the defaults of add_path are not what is needed, so it needs custom defaults. The current situation looks like the code below. As a solution, should I give the result of bifactor an custom class tidy_sem_bifactor such that I can implement a custom add_paths ?

bf %>% add_paths("a =~ c_2", "", std.lv = T,
                 auto.fix.single = T,
                 auto.fix.first = F,
                 orthogonal = T,
                 int.ov.free = F,
                 int.lv.free = F,
                 meanstructure = F) %>% run_lavaan() -> bf_fit2
  1. Fixed because of 1, but the error message wording can be debated.
  2. haven't worked on implementations outside of lavaan yet.
cjvanlissa commented 1 year ago

The syntax to add a cross-loading is a mistake-prone for an end-user because the defaults of add_path are not what is needed, so it needs custom defaults. The current situation looks like the code below. As a solution, should I give the result of bifactor an custom class tidy_sem_bifactor such that I can implement a custom add_paths ?

bf %>% add_paths("a =~ c_2", "", std.lv = T, auto.fix.single = T, auto.fix.first = F, orthogonal = T, int.ov.free = F, int.lv.free = F, meanstructure = F) %>% run_lavaan() -> bf_fit2

Not sure what the problem is... are you trying to prevent model non-identification by fixing variances of indicators that load on too many factors?

Gootjes commented 1 year ago

https://github.com/cjvanlissa/tidySEM/blob/9ada1d7be214b36a00b384a157bcf8f1c1d68d48/R/syntax-bifactor.R#L63-L75

The issue is that without any parameters to add_paths, it makes the cross-loading fixed at value 1, and estimates an intercept. I don't want either: latent variables have fixed variance 1, and no intercepts should be estimated (fixed to 0). But what happens is that the implementation of add_paths adds them again to the model for the variable mentioned in the arguments.

tidy_sem(c("a_1", "a_2", "a_3", "b_1", "b_2", "b_3", "c_1", "c_2", "c_3", "c_4")) %>% 
   bifactor() %>%
  # This adds a cross-loading
   add_paths("a =~ c_4") %>% 
   as_lavaan() %>% 
   filter(lhs == "c_4" | rhs == "c_4")
#   lhs op rhs block group free label ustart plabel
# 1   G =~ c_4     1     1    1           NA  .p10.
# 2   c =~ c_4     1     1    1           NA  .p20.
# 3 c_4 ~~ c_4     1     1    1           NA  .p30.
# 4 c_4 ~1         1     1    0            0  .p50.
# 5   a =~ c_4     1     1    0            1  .p55.   

The options to add_paths that solve this are int.ov.free = F, auto.fix.first = F. But I find it kind of a hassle that respecifying this is necessary. Maybe add_paths should reuse the options that were specified earlier in the call to measurement? An alternative solution is to make bifactor add a class to the tidy_sem object, e.g. tidy_sem_bifactor, and then:

add_paths.tidy_sem_bifactor <- function() {
  cl <- match.call()

  cl["int.ov.free"] <- FALSE
  cl["auto.fix.first"] <- FALSE

  # Copy pasted from add_paths.tidy_sem
  cl["model"] <- list(model$syntax)
  # I have to comment this line though otherwise R complains 
  # about a missing implementation for group_var for the tidy_sem_bifactor class.
  # I don't need it anyways.
  # if(!is.null(group_var(model))) cl[["ngroups"]] <- group_var(model, "ngroups")
  cl[[1L]] <- quote(add_paths)
  #Args <- c(list(model = model$syntax), as.list(match.call()[-c(1:2)]))
  model$syntax <- eval.parent(cl)
  return(model)
}
Gootjes commented 1 year ago

Maybe I am using measurement not what it was intended for, but I expect this issue to crop up again when someone uses add_paths.

Should I maybe change bifactor to include arguments for cross-loadings? Then calling add_paths of a bifactor tidy_sem is not necessary anymore: everything is set from the get go. But then I need to copy quite some code from what is already there for CFA.

Semi-related point: I can tweak tidy_sem$syntax to include an equality constraint by specifying .p1. == .p2., but this does not translate when using as_ram or as_mplus. Is there already something in place to do equality constraints in cfa models?

cjvanlissa commented 1 year ago

As per standard lavaan syntax, to specify equality constraints you could use:

F=~a*x1
F=~a*x2

I need to plan in some time to look into the issue you're experiencing with add_paths, can't speak to that now. The risk of implicit defaults (as in your add_paths.tidy_sem_bifactor solution) is that, as soon as someone has a different use case than you had in mind, they break down (ironically, this might be happening to you; I have to investigate to see). I agree that manually specifying these arguments to add_paths is awkward, but at least it's explicit!

cjvanlissa commented 1 year ago

PS The .p1. labels are, to my understanding, intended as lavaan internals.

Gootjes commented 1 year ago

As per standard lavaan syntax, to specify equality constraints you could use:

F=~a*x1
F=~a*x2

I need to plan in some time to look into the issue you're experiencing with add_paths, can't speak to that now. The risk of implicit defaults (as in your add_paths.tidy_sem_bifactor solution) is that, as soon as someone has a different use case than you had in mind, they break down (ironically, this might be happening to you; I have to investigate to see). I agree that manually specifying these arguments to add_paths is awkward, but at least it's explicit!

Haha yes it is happening to me. Yea so reimplementing some of the logic in measurement to make it apply to the bifactor situation might be the best approach then. Accidental misuse is far less likely, and bifactor models are so well defined that customisation options for the end user are not a priority.

Regarding equality constraints. Thanks for the quick reply, I was trying to do it all manually by adding rows to the syntax data.frame. I understand now that I can do:

add_paths(tidy_sem(x), "F=~a*x_1; F=~a*x_2; F=~a*x_3")

And that runs with mx and lavaan. Nice! That helps. I was trying to do add_paths on a tidy_sem with existing syntax (from measurement), but that doesn't produce valid syntax. So another reason to rewrite bifactor to do it all my way from the get go.

There is no need right now for you to invest time in reading this all. I will redesign the bifactor a little based on code you wrote for measurement and then rely heavily on add_paths to do the actual magic.

Tests already work, I can reproduce results from the psych::omegaFromSem function based on psych datasets. (Turns out that function has a slightly odd way of doing things regarding cross loadings and also mixes unstandardized loadings with correlation matrices to compute variances...).

So after this is done and I factor out tidyverse usage, I will press the request review button :)

cjvanlissa commented 1 year ago

That sounds good, thank you for being proactive and understanding! Don't forget to add yourself as "ctb" to the authors@R field in the DESCRIPTION file too.

cjvanlissa commented 8 months ago

I see you're making a commit here as well.. would you like me to incorporate this into 0.2.7 as well? If so: it still needs tests for the new functions, and I see that it's failing (for some reason) the integration tests.

Would you please make sure that these functions align as closely as possible to the interface of functions like add_paths() and measurement()?

Gootjes commented 8 months ago

The implementation is not entirely complete because I ran into this unsolved issue #74 Let's not merge this yet.

The mentioned issue is preventing me from imposing equality constraints in the case when a latent variable has only two indicators.

cjvanlissa commented 8 months ago

I can't get into this until after end March due to time constraints!

Gootjes commented 8 months ago

No problem!