wald_test failing to match valid parameter names in hypotheses

DanBennettDev commented 1 year ago

I am using BCH to relate latent profiles to distal outcomes, and wish to test differences in those distal outcomes between profiles.

I began by running mx_profiles and selecting a model

res <- mx_profiles(data=df_indices,
                  classes = 1:6,
                  variances = c("equal","equal","equal", "varying", "varying"),
                  covariances = c("zero", "equal", "varying", "zero", "varying"),
                   expand_grid = FALSE)

res_bic <- res[["free var 4"]]

Selected a model

res_bic <- res[["free var 4"]]

I then bring in outcome indicators and run BCH. I want to test effect of class membership on aux means.


df_aux <- UMIDT[,c("NS_AUT","NS_COM","NS_REL", "NS_TOT", "LTR", "VIT", "SWLS", "UMUX")]

bch_model = "
NS_AUT ~ 1
NS_REL~ 1
NS_COM~ 1
NS_TOT~ 1
LTR~ 1
VIT~ 1
SWLS~ 1
UMUX~ 1
"

aux_out <- BCH(res_bic, model = bch_model, data = df_aux)

I check parameter names

table_results(aux_pt, columns = c("label", "name", "est_sig"))

And find that e.g. class1Means.NS_AUT is class1.M[1,1], and class4 Means.UMUX is class4.M[1,7]

l_test runs fine lr_test(aux_pt, compare="M")

But the wald test throws errors

I can run a wald test comparing class1 and 2 for the first 5 (of 7) outcome indicators without error

hyp = "
class1.M[1,1] = class2.M[1,1];
class1.M[1,2] = class2.M[1,2];
class1.M[1,3] = class2.M[1,3];
class1.M[1,4] = class2.M[1,4];
class1.M[1,5] = class2.M[1,5];
"

wald_test(aux_pt, hyp)

But if I run tests for the 6th or 7th outcome indicators I get an error

hyp = "
class1.M[1,6] = class2.M[1,6];
"

Error in parse_hypothesis(varnames = varnames, hyp = hypothesis) :
Some of the parameters referred to in the 'hypothesis' matched multiple parameter names of object 'x'.
The following parameter names in the 'hypothesis' matched multiple parameters in 'x': xxxNAxxx
The parameters in object 'x' are named: xxxaxxx, xxxbxxx, xxxcxxx, xxxdxxx, xxxexxx, xxxfxxx, xxxgxxx, xxxhxxx, xxxixxx, xxxjxxx, xxxkxxx, xxxlxxx, xxxmxxx, xxxnxxx, xxxoxxx, xxxpxxx, xxxqxxx, xxxrxxx, xxxsxxx, xxxtxxx, xxxuxxx, xxxvxxx, xxxwxxx, xxxxxxx, xxxyxxx, xxxzxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx

exactly the same error if I try to test hypotheses for classes higher than 2

hyp = "
class1.M[1,1] = class3.M[1,1];
"
wald_test(aux_pt, hyp)

Error in parse_hypothesis(varnames = varnames, hyp = hypothesis) :
Some of the parameters referred to in the 'hypothesis' matched multiple parameter names of object 'x'.
The following parameter names in the 'hypothesis' matched multiple parameters in 'x': xxxNAxxx
The parameters in object 'x' are named: xxxaxxx, xxxbxxx, xxxcxxx, xxxdxxx, xxxexxx, xxxfxxx, xxxgxxx, xxxhxxx, xxxixxx, xxxjxxx, xxxkxxx, xxxlxxx, xxxmxxx, xxxnxxx, xxxoxxx, xxxpxxx, xxxqxxx, xxxrxxx, xxxsxxx, xxxtxxx, xxxuxxx, xxxvxxx, xxxwxxx, xxxxxxx, xxxyxxx, xxxzxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx, xxxNAxxx

DanBennettDev commented 1 year ago

This is due to the use of letters[i] to relabel variables and hypotheses. It brings the limitation of 26 unique variables

It can be fixed by defining a longer array of unique strings, and substituting this array

e.g. for up to 702 variables:

LETTERS702 <- c(LETTERS, sapply(LETTERS, function(x) paste0(x, LETTERS)))

parse_hypothesis <- getFromNamespace("parse_hypothesis", "bain")

db_wald_test <- function(x, hypothesis, ...){

  varnames_orig <- varnames <- names(coef(x))

  hyp_orig <- hypothesis
  if(any(grepl("\\[\\d+,\\d+\\]", varnames))){
    uniqnams <- unique(varnames)
    for(i in seq_along(uniqnams)){
      varnames[which(varnames == uniqnams[i])] <- paste0("xxx", LETTERS702[i], "xxx")
      hypothesis <- gsub(uniqnams[i], paste0("xxx", LETTERS702[i], "xxx"), hypothesis, fixed = TRUE)
    }
  }
  hyps <- parse_hypothesis(varnames = varnames, hyp = hypothesis)
  test_res <- do.call(rbind, lapply(hyps$hyp_mat, function(h){
    as.data.frame(lapply(car::linearHypothesis(x, hypothesis.matrix = h[, -ncol(h), drop = FALSE], rhs = h[, ncol(h), drop = TRUE]), `[[`, 2))
  }))
  names(test_res) <- c("df", "chisq", "p")
  out <- data.frame(Hypothesis = hyps$original_hypothesis, test_res)
  if(any(grepl("xxx.xxx", out$Hypothesis))){
    for(i in seq_along(uniqnams)){
      repthis <- paste0("xxx", LETTERS702[i], "xxx")
      if(!any(grepl(repthis, out$Hypothesis, fixed = TRUE))) next
      out$Hypothesis <- gsub(repthis, uniqnams[i], out$Hypothesis)
    }
  }
  class(out) <- c("wald_test", class(out))
  out
}

#' @method print wald_test
#' @export
print.wald_test <- function(x, ...){
  cat("Wald chi-square tests for linear hypotheses:\n")
  print.data.frame(x, ..., row.names = FALSE)
}

#' @method get_estimates MxModel
#' @export
#' @importFrom bain get_estimates
get_estimates.MxModel <- function(x, ...){
  coef(x)
}

cjvanlissa commented 1 year ago

Good catch. Although I think we need to move away from letters completely and just use numbering, e.g. x1, x2, x3 etc. Otherwise there remains an artificial cap

cjvanlissa / tidySEM

wald_test failing to match valid parameter names in hypotheses #70