cjvanlissa / tidySEM

growth models - bad node labels #19

Closed mattansb closed 3 years ago

mattansb commented 3 years ago

library(lavaan)
library(tidySEM)

model.syntax <- '
  # intercept and slope with fixed coefficients
    i =~ 1*t1 + 1*t2 + 1*t3 + 1*t4
    s =~ 0*t1 + 1*t2 + 2*t3 + 3*t4

  # regressions
    i ~ x1 + x2
    s ~ x1 + x2
'

fit <- growth(model.syntax, data = Demo.growth)

lay <- get_layout(
  "t1", "t2", "t3", "t4",
  NA, "i", "s", NA,
  rows = 2
)

graph_sem(fit, layout = lay)

image

nodes(prepare_graph(fit))
#>   name shape   label  x y node_xmin node_xmax node_ymin node_ymax lhs op rhs   est   se pval        confint est_sig est_std se_std pval_std
#> 1    i  oval 0.59***  8 4       7.5       8.5       3.5       4.5   i ~1      0.59 0.06 0.00   [0.46, 0.71] 0.59***    0.42   0.05     0.00
#> 2    s  oval 0.96***  2 2       1.5       2.5       1.5       2.5   s ~1      0.96 0.03 0.00   [0.90, 1.02] 0.96***    1.26   0.06     0.00
#> 3   t1  rect    0.00  4 2       3.4       4.6       1.6       2.4  t1 ~1      0.00 0.00 <NA>   [0.00, 0.00]    0.00    0.00   0.00     <NA>
#> 4   t2  rect    0.00  6 2       5.4       6.6       1.6       2.4  t2 ~1      0.00 0.00 <NA>   [0.00, 0.00]    0.00    0.00   0.00     <NA>
#> 5   t3  rect    0.00  8 2       7.4       8.6       1.6       2.4  t3 ~1      0.00 0.00 <NA>   [0.00, 0.00]    0.00    0.00   0.00     <NA>
#> 6   t4  rect    0.00 10 2       9.4      10.6       1.6       2.4  t4 ~1      0.00 0.00 <NA>   [0.00, 0.00]    0.00    0.00   0.00     <NA>
#> 7   x1  rect   -0.09 12 2      11.4      12.6       1.6       2.4  x1 ~1     -0.09 0.00 <NA> [-0.09, -0.09]   -0.09   -0.09   0.00     <NA>
#> 8   x2  rect    0.14 14 2      13.4      14.6       1.6       2.4  x2 ~1      0.14 0.00 <NA>   [0.14, 0.14]    0.14    0.14   0.00     <NA>
#>      confint_std est_sig_std show
#> 1   [0.33, 0.52]     0.42*** TRUE
#> 2   [1.13, 1.38]     1.26*** TRUE
#> 3   [0.00, 0.00]        0.00 TRUE
#> 4   [0.00, 0.00]        0.00 TRUE
#> 5   [0.00, 0.00]        0.00 TRUE
#> 6   [0.00, 0.00]        0.00 TRUE
#> 7 [-0.09, -0.09]       -0.09 TRUE
#> 8   [0.14, 0.14]        0.14 TRUE

mattansb commented 3 years ago

Something very weird is going on here...:

model.syntax <- '
  # intercept and slope with fixed coefficients
    i =~ 1*t1 + 1*t2 + 1*t3 + 1*t4
    s =~ 0*t1 + 1*t2 + 2*t3 + 3*t4

    # time-varying covariates
    t1 ~ c1
    t2 ~ c2
    t3 ~ c3
    t4 ~ c4

  # regressions
    i ~ x1 + x2
    s ~ x1 + x2
'

fit <- growth(model.syntax, data = Demo.growth)

lay <- get_layout(
  "c1", "c2", "c3", "c4",
  "t1", "t2", "t3", "t4",
  NA, "i", "s", NA,
  "x1", NA, NA, "x2",
  rows = 4
)

graph_sem(fit, layout = lay) # TOO MANY ARROWS!

image

# does not match:
> summary(fit)
lavaan 0.6-7 ended normally after 31 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of free parameters                         17

  Number of observations                           400

Model Test User Model:

  Test statistic                                26.059
  Degrees of freedom                                21
  P-value (Chi-square)                           0.204

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  i =~                                                
    t1                1.000                           
    t2                1.000                           
    t3                1.000                           
    t4                1.000                           
  s =~                                                
    t1                0.000                           
    t2                1.000                           
    t3                2.000                           
    t4                3.000                           

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  t1 ~                                                
    c1                0.143    0.050    2.883    0.004
  t2 ~                                                
    c2                0.289    0.046    6.295    0.000
  t3 ~                                                
    c3                0.328    0.044    7.361    0.000
  t4 ~                                                
    c4                0.330    0.058    5.655    0.000
  i ~                                                 
    x1                0.608    0.060   10.134    0.000
    x2                0.604    0.064    9.412    0.000
  s ~                                                 
    x1                0.262    0.029    9.198    0.000
    x2                0.522    0.031   17.083    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
 .i ~~                                                
   .s                 0.075    0.040    1.855    0.064

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .t1                0.000                           
   .t2                0.000                           
   .t3                0.000                           
   .t4                0.000                           
   .i                 0.580    0.062    9.368    0.000
   .s                 0.958    0.029   32.552    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .t1                0.580    0.080    7.230    0.000
   .t2                0.596    0.054   10.969    0.000
   .t3                0.481    0.055    8.745    0.000
   .t4                0.535    0.098    5.466    0.000
   .i                 1.079    0.112    9.609    0.000
   .s                 0.224    0.027    8.429    0.000
cjvanlissa commented 3 years ago

I got a handle on this. It's all related to the label column now ending up in both the edges and nodes elements. Those "extra" paths are actually there in the model; they are observed, not estimated, and the lavaan print function suppresses them. I'm now suppressing them by default as well, using the new show column.
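
A minimal sketch of how the hidden paths could be inspected, reusing the model from the comment above. It assumes a tidySEM version in which the edges() data frame of a prepare_graph() object carries the logical show column described here; the nodes() output earlier in this thread shows that column, but its exact behaviour may differ between versions.

```r
library(lavaan)
library(tidySEM)

model.syntax <- '
  i =~ 1*t1 + 1*t2 + 1*t3 + 1*t4
  s =~ 0*t1 + 1*t2 + 2*t3 + 3*t4
  t1 ~ c1
  t2 ~ c2
  t3 ~ c3
  t4 ~ c4
  i ~ x1 + x2
  s ~ x1 + x2
'
fit <- growth(model.syntax, data = Demo.growth)

# Build the graph object without plotting it
p <- prepare_graph(fit)
e <- edges(p)

# Assumption: `show` is a logical column; rows with show == FALSE are
# in the model but are not drawn by default
e[!e$show, ]
```

Comparing these rows with summary(fit) should make it easier to see which paths lavaan leaves out of its printed output.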

mattansb commented 3 years ago

Is this also what's going on in #20?

mattansb commented 3 years ago

Labels are still wonky :/

(feel free to mark this comment as a duplicate if you think it is one)

image

cjvanlissa commented 3 years ago

What's wrong with the labels here?

mattansb commented 3 years ago

The node labels have edge data in them :(

cjvanlissa commented 3 years ago

I don't think so? The default label argument for get_nodes() is paste(name, est_sig, sep = "\n"). That looks like what is depicted here.

mattansb commented 3 years ago

For the nodes? I've never seen edge values appearing inside/as part of the node label. Is that common?

Also:

  1. Some nodes have multiple arrows going into them - how does it know which one to choose?
  2. Why don't edge values appear in node labels in other cases? e.g.:
library(tidySEM)
library(lavaan)

fit <- sem("mpg ~ am
           hp ~ am + mpg", data = mtcars)

graph_sem(fit)

image

mattansb commented 3 years ago

Oh... I see - it's because these are the means!

cjvanlissa commented 3 years ago

As described in the help for get_nodes:

"The function get_nodes identifies all dependent and independent variables in the model as nodes. If a mean structure / intercepts are included in the model, the output of table_results for those means / intercepts is used to label the nodes."

Most of your examples do not have a mean structure, so the package uses the default label, which is just the node name. After implementing the new interface you requested, where people can specify an expression for the node labels, I've moved the hard-coded label to the default argument.
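
To illustrate the difference, here is a small sketch that fits the mtcars model from earlier in the thread twice, once without and once with a mean structure, and compares the node labels. meanstructure = TRUE is a standard lavaan argument; the object names fit_plain and fit_means are made up for this example, and the exact label text depends on the tidySEM version installed.

```r
library(lavaan)
library(tidySEM)

model <- "mpg ~ am
          hp ~ am + mpg"

# Same model, without and with intercepts in the model
fit_plain <- sem(model, data = mtcars)
fit_means <- sem(model, data = mtcars, meanstructure = TRUE)

# Without a mean structure there are no intercepts to put in the label
get_nodes(fit_plain)$label

# With a mean structure, the estimated intercepts are available for labelling
get_nodes(fit_means)$label
```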

mattansb commented 3 years ago

Got it!

However, I seem to be unable to change this (I am running the dev version):

library(lavaan)
library(tidySEM)

HS.model <- '  visual =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '
fit <- cfa(HS.model, 
           data = HolzingerSwineford1939, 
           group = "school")

graph_sem(fit, nodes = get_nodes(fit, label = "name"))

image

get_nodes(fit, label = "name")$label
#>  [1] "speed\n0.00"   "textual\n0.00" "visual\n0.00"  "x1\n4.94***"   "x2\n5.98***"   "x3\n2.49***"   "x4\n2.82***"   "x5\n4.00***"  
#>  [9] "x6\n1.92***"   "x7\n4.43***"   "x8\n5.56***"   "x9\n5.42***"   "speed\n0.00"   "textual\n0.00" "visual\n0.00"  "x1\n4.93***"  
#> [17] "x2\n6.20***"   "x3\n2.00***"   "x4\n3.32***"   "x5\n4.71***"   "x6\n2.47***"   "x7\n3.92***"   "x8\n5.49***"   "x9\n5.33***"  

mattansb commented 3 years ago

(See, this just shows how little I actually use SEM outside of teaching [I hope my students don't find this thread 😅])

cjvanlissa commented 3 years ago

If they do find this thread, they would get a healthy reality check that researchers are not infallible, and they would see how much dedication goes into doing research and making software!

cjvanlissa commented 3 years ago

OK, this issue is resolved.

mattansb commented 3 years ago

Awesome, thanks!