crsh / papaja

papaja (Preparing APA Journal Articles) is an R package that provides document formats to produce complete APA manuscripts from RMarkdown-files (PDF and Word documents) and helper functions that facilitate reporting statistics, tables, and plots.
https://frederikaust.com/papaja_man/
Other
651 stars 132 forks source link

Functionality for lavaan output #181

Open doomlab opened 6 years ago

doomlab commented 6 years ago

I am willing to help work on creating functionality to transform lavaan type outputs into printable outputs for articles using apa_print. I wanted to get input on what people would like to be able to print out from lavaan.

If I think about what I'd want, maybe something of the following:

Thoughts are welcome!

steve-itb commented 6 years ago

Hi,

I just wanted to thank you for providing the videos on youtube for CFA. It really helped with my understanding of the topic. I have been playing around with CFA for the last couple of weeks and below might be useful for the loadings. Its been cobbled together from blogs over the internet. I have a few other functions to visualize residuals and create fit statistics for MGFA; (milage may vary).

The final table is only viewable in Rmarkdown in the code snippet below

library(lavaan)
library(tidyverse)
library(glue)
library(kableExtra)

generate_cfa_loadings <- function(myfit, mygroup){

  estimates <- parameterestimates(myfit, ci = TRUE, level = 0.95, standardized = TRUE) %>%   
    filter(op == "=~") %>% 
    mutate(stars = ifelse(pvalue < .001, "***", 
                          ifelse(pvalue < .01, "**", 
                                 ifelse(pvalue < .05, "*", "")))) %>%
    as_tibble() %>%
    mutate_if(is.numeric, round, digits=2) %>% 
    mutate_if(is.numeric, format, nsmall = 2)%>% 
    mutate( est = glue("{est} CI({ci.lower}-{ci.upper})"),
            Beta = glue("{std.all}{stars}"))%>% 
    dplyr::select(question=rhs,
                  est = est,
                  Beta)

  names(estimates)[c(2,3)] <- paste(mygroup, names(estimates)[c(2,3)], sep = '-')

  estimates

}

cfa <- ' 
visual  =~ x1 + x2 + x3
txtual =~ x4 + x5 + x6
speed   =~ x7 + x8 + x9 '

# Generate the Lavaan model
fit <- HolzingerSwineford1939 %>% cfa(cfa, data = ., meanstructure = TRUE, std.lv = TRUE, estimator = 'MLR')

# Pull the loadings into a dataframe
cfa_loadings <- generate_cfa_estimates(fit, "Demo")

# Set the names of the dataframe
col_names <- c("Question", "Est(95\\% CI)","$\\beta$")

knitr::kable(cfa_loadings, col.names = col_names, escape=FALSE, digits = 3, format = "latex", booktabs = TRUE,
             caption = "Loadings for each item on each construct per sample") %>%
  kable_styling(latex_options = c("repeat_header", "hold_position", "scale_down"), position = 'center') %>% 
  group_rows('visual',1,3) %>% 
  group_rows('txtual',4,6) %>% 
  group_rows('speed',7,9) %>% 
  add_footnote("Note: *** indicates that estimates are significant at 0.001", notation = "alphabet")

Thanks Steve

crsh commented 6 years ago

Hi Erin, thanks for getting the conversation on this started. And thanks for your code snippet Steve.

From my experience it useful to first think about what needs to be reported. What information do we need and what should the table(s) look like. Ideally, we won't have to make this up ourselves but can follow/adapt some best practice guidelines or APA examples?

Next we should discuss how to structure the output. The structure should be consistent with other apa_print()-methods, such as apa_print.lm() or apa_print.list().

The general structure of output is currently a list with four elements:

papaja:::apa_print_container()
$estimate
NULL

$statistic
NULL

$full_result
NULL

$table
NULL

Do you think we can fit every piece of information into one of those four categories? I think we should include loadings and CI as separate information for in-text reporting and tables; this may be redundant but really is only a small extra step, grants more flexibility, and would be consistent with other apa_print()-methods. For formatting of CI see papaja:::print_confint().

In the lm-method, I have placed fit indices into the estimate list:

tl <- c(4.17, 5.58, 5.18, 6.11, 4.50, 4.61, 5.17, 4.53, 5.33, 5.14)
trt <- c(4.81, 4.17, 4.41, 3.59, 5.87, 3.83, 6.03, 4.89, 4.32, 4.69)
group <- gl(2, 10, 20, labels = c("Ctl", "Trt"))
weight <- c(ctl, trt)
lm_fit <- lm(weight ~ group)
apa_print(lm_fit)$estimate$modelfit
$r2
[1] "$R^2 = .07$, 90\\% CI $[0.00$, $0.33]$"

$r2_adj
[1] "$R^2_{adj} = .02$"

$aic
[1] "$\\mathrm{AIC} = 46.18$"

$bic
[1] "$\\mathrm{BIC} = 49.16$"

It seems to me the same thing would work for lavaan objects. Formatting of the numbers should be straightforward with printnum().

I have also built a method for model comparisons for lm objects that may serve as inspiration:

mod1 <- lm(Sepal.Length ~ Sepal.Width, data = iris)
mod2 <- update(mod1, formula = . ~ . + Petal.Length)
mod3 <- update(mod2, formula = . ~ . + Petal.Width)
apa_print(list(Baseline = mod1, Length = mod2, Both = mod3), boot_samples = 0)
$estimate
$estimate$Length
[1] "$\\Delta R^2 = .83$"

$estimate$Both
[1] "$\\Delta R^2 = .02$"

$statistic
$statistic$Length
[1] "$F(1, 148) = 853.31$, $p < .001$"

$statistic$Both
[1] "$F(1, 148) = 19.04$, $p < .001$"

$full_result
$full_result$Length
[1] "$\\Delta R^2 = .83$, $F(1, 148) = 853.31$, $p < .001$"

$full_result$Both
[1] "$\\Delta R^2 = .02$, $F(1, 148) = 19.04$, $p < .001$"

$table
                                         Baseline                  Length                       Both
Intercept                 $6.53$ $[5.58$, $7.47]$ $2.25$ $[1.76$, $2.74]$    $1.86$ $[1.36$, $2.35]$
Sepal Width             $-0.22$ $[-0.53$, $0.08]$ $0.60$ $[0.46$, $0.73]$    $0.65$ $[0.52$, $0.78]$
Petal Length                                      $0.47$ $[0.44$, $0.51]$    $0.71$ $[0.60$, $0.82]$
Petal Width                                                               $-0.56$ $[-0.81$, $-0.30]$
$R^2$ [90\\% CI]           $.01$ $[0.00$, $0.06]$  $.84$ $[0.79$, $0.87]$     $.86$ $[0.82$, $0.89]$
$F$                                          2.07                  386.39                     295.54
$df_1$                                          2                       3                          4
$df_2$                                        148                     147                        146
$p$                                          .152                  < .001                     < .001
$\\mathrm{AIC}$                            371.99                  101.03                      84.64
$\\mathrm{BIC}$                            381.02                  113.07                      99.70
$\\Delta R^2$                                                       $.83$                      $.02$
$F$                                                                853.31                      19.04
$df_1$                                                                  1                          1
$df_2$                                                                148                        148
$p$                                                                < .001                     < .001
$\\Delta \\mathrm{AIC}$                                           -270.97                     -16.38
$\\Delta \\mathrm{BIC}$                                           -267.96                     -13.37

As you can see, the $table element currently contains one big table including regression coefficients, fit and incremental fit indices and tests. Depending on the complexity of the lavaan output this information could be split into separate (but maybe joinable) tables. It shouldn't be a problem if that functionality comes form another package. This may just require additional S3 methods depending on the object class returned by the functions used to perform the analyses.

doomlab commented 6 years ago

Fantastic guys! I need to review the S3 methods to really think about how to convert this output in the same way that @crsh is doing. Also wanted to comment and save this link for later. https://github.com/df1234/RFunctions_SEM

jorgesinval commented 6 years ago

I have produced some tables already for this kind of objects, mainly for measurement invariance, betas, and lambdas values.

seonghobae commented 5 years ago

Does APA have a standard reporting table form around structural equation model?

crsh commented 5 years ago

Great question, I'd also be interested in best practices and recommendations for reporting SEM and CFA as template for the lavaan-related methods for apa_print().

jorgesinval commented 5 years ago

I think that your answer is here:

Appelbaum, M., Cooper, H., Kline, R. B., Mayo-Wilson, E., Nezu, A. M., & Rao, S. M. (2018). Journal article reporting standards for quantitative research in psychology: The APA publications and Communications Board task force report. American Psychologist, 73(1), 3–25. https://doi.org/10.1037/amp0000191

bjorn-persson commented 3 years ago

Asked @crsh via email how to add parentheses to a table and he was quick to respond. I showed him my results and he asked me to add on to this topic, so here we are.

I won't comment much on what reporting standards are reasonable for SEM. There are plenty of them out there but it's tricky to follow a standard because reviewers always what their own stuff. Some people seem to have their favorite fit indices. In any case, I've produced a little table of fit indices that I tend to use and also combined it with a recent development for judging fit indices called dynamic fit (shoutout to @melissagwolf). That R package is very new and a work in progress so perhaps it's not a good idea to implement that part yet. I don't know. Won't say too much about how dynamic fit works, but you can find more info about it here: https://dynamicfit.app/connect/ and here: https://psyarxiv.com/v8yru/

My R skills are fairly basic, so there are some stuff that needs to be generalized if implemented properly, but that goes beyond my mere abilities. The code below is a mwe and should produce the table shown at the end.

library("lavaan")
library("dynamic") #devtools::install_github("melissagwolf/dynamic")

m1 <- 'visual  =~ x1 + x2 + x3 + x4'
m2 <- 'textual =~ x4 + x5 + x6 + x1'
m3 <- 'speed   =~ x7 + x8 + x9 + x5'

#fit models with lavaan
model.list <- list()
model.list$fit.m1 <- cfa(m1, data=HolzingerSwineford1939, estimator="ML")
model.list$fit.m2 <- cfa(m2, data=HolzingerSwineford1939, estimator="ML")
model.list$fit.m3 <- cfa(m3, data=HolzingerSwineford1939, estimator="ML")

#run dynamic fit
dynamic.list <- list()
dynamic.list$dfit.m1 <- dynamic::cfaOne(model = model.list$fit.m1, plot = FALSE) #note that the command differs depending on the number of factors
dynamic.list$dfit.m2 <- dynamic::cfaOne(model = model.list$fit.m2, plot = FALSE)
dynamic.list$dfit.m3 <- dynamic::cfaOne(model = model.list$fit.m3, plot = FALSE)

fitindices <- c("chisq", "df", "pvalue", "tli", "cfi", "rmsea", "srmr") 

In the last line above, note that if you used a robust estimator in lavaan (e.g., "MLM", "WLSMV") the names of some of the indices need to be changed (e.g., "cfi.robust").

fit.table <- data.frame(do.call(rbind,lapply(model.list,function(data){fitMeasures(data,fitindices)})), row.names = c("Visual", "Textual", "Speed"))
dynamic.table <- data.frame(do.call(rbind,lapply(dynamic.list, function(x) x$output$Cutoffs[1,])))

This needs adjustment as the output gets more complicated with multiple factors. Even in this case it prints a "NONE" which is arguably undesirable.

fit.table$pvalue <- printp(fit.table$pvalue)
my_table <- printnum(fit.table, digits=c(2,0,3,3,3,3,3)) # round to two digits etc.

my_table$cfi <- paste0(my_table$cfi, " (", dynamic.table$CFI, ")") #add dynamic fit into parentheses 
my_table$srmr <- paste0(my_table$srmr, " (", dynamic.table$SRMR, ")")
my_table$rmsea <- paste0(my_table$rmsea, " (", dynamic.table$RMSEA, ")")

#Create final table:
apa_table(
  my_table
  , col.names=c("Model", "$\\chi^2$", "$df$", "$p$", "$TLI$", "$CFI$", "$RMSEA$","$SRMR$")
  , align=c("l", rep("c", 7))
  , caption="Model Fit for Misspecified Models of HolzingerSwineford1939 Data"
  , note="CFI = Comparative fit index, TLI = Tucker-Lewis index, RMSEA = Root Mean Square Error of Approximation, SRMR = Standardized Root Mean Squared Residual, WRMR = Weighted Root Mean Square Residual. Values within parentheses are dynamic fit cutoffs: At this cutoff value, 95% of misspecified models will be correctly rejected, while only 5% of correctly specified models will be incorrectly rejected.")

image

Hope this helps someone :)

crsh commented 3 years ago

Thanks so much for sharing, Björn!