I just tried using semtree with a data set that I had imported into R using haven. The resulting data.frame is a tibble data.frame. When running semtree based on a lavaan model, I was surprised that there were no splits for my data set. Only when casting the tibble to a base R data.frame, the model was partitioned correctly. When using OpenMx models, I found that tibbles result in errors. Here is an example:
library(semtree)
library(tibble)
library(OpenMx)
library(lavaan)
# For lavaan, semtree does not throw any warnings / errors when using tibble,
# but also does not run correctly:
HS.model <- ' visual =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed =~ x7 + x8 + x9 '
HolzingerSwineford1939 <- HolzingerSwineford1939[complete.cases(HolzingerSwineford1939), ]
fit <- cfa(HS.model, data = tibble::as_tibble(HolzingerSwineford1939))
# Using tibble does not work:
tree <- semtree(model = fit,
data = tibble::as_tibble(HolzingerSwineford1939))
print(tree)
# With data.frames, the tree is built correctly
tree <- semtree(model = fit,
data = HolzingerSwineford1939,
control = semtree.control(method = "score"))
print(tree)
## For OpenMx, I get an error, but one that reports issues with P1 not being
## a factor:
set.seed(23)
N <- 1000
M <- 5
icept <- rnorm(N, 10, sd = 4)
slope <- rnorm(N, 3, sd = 1.2)
p1 <- sample(c(0, 1), size = N, replace = TRUE)
loadings <- 0:4
x <- (slope + p1 * 5) %*% t(loadings) +
matrix(rep(icept, each = M), byrow = TRUE, ncol = M) +
rnorm(N * M, sd = .08)
colnames(x) <- paste0("X", 1:M)
# Use tibble
growth.data <- tibble::as_tibble(x)
growth.data$P1 <- as.factor(p1)
manifests <- names(growth.data)[1:5]
growthCurveModel <- mxModel("Linear Growth Curve Model Path Specification",
type="RAM",
manifestVars=manifests,
latentVars=c("intercept","slope"),
mxData(growth.data, type="raw"),
# residual variances
mxPath(
from=manifests,
arrows=2,
free=TRUE,
values = c(.1, .1, .1, .1, .1),
labels=c("residual","residual","residual","residual","residual")
),
# latent variances and covariance
mxPath(
from=c("intercept","slope"),
arrows=2,
connect="unique.pairs",
free=TRUE,
values=c(2, 0, 1),
labels=c("vari", "cov", "vars")
),
# intercept loadings
mxPath(
from="intercept",
to=manifests,
arrows=1,
free=FALSE,
values=c(1, 1, 1, 1, 1)
),
# slope loadings
mxPath(
from="slope",
to=manifests,
arrows=1,
free=FALSE,
values=c(0, 1, 2, 3, 4)
),
# manifest means
mxPath(
from="one",
to=manifests,
arrows=1,
free=FALSE,
values=c(0, 0, 0, 0, 0)
),
# latent means
mxPath(
from="one",
to=c("intercept", "slope"),
arrows=1,
free=TRUE,
values=c(1, 1),
labels=c("meani", "means")
)
) # close model
# fit the model to the entire dataset
growthCurveModel <- mxRun(growthCurveModel)
# Using tibble does not work:
tree <- semtree(model = growthCurveModel,
data = growth.data)
# Using data.frame does work:
tree <- semtree(model = growthCurveModel,
data = as.data.frame(growth.data))
Thanks. For now, I will add a type check whether data is really a data.frame. I am not sure whether supporting tibbles is possible at all but I will keep it on my list.
Hi Andreas,
I just tried using semtree with a data set that I had imported into R using haven. The resulting data.frame is a tibble data.frame. When running semtree based on a lavaan model, I was surprised that there were no splits for my data set. Only when casting the tibble to a base R data.frame, the model was partitioned correctly. When using OpenMx models, I found that tibbles result in errors. Here is an example:
Best, Jannik