data-edu / tidyLPA

Easily carry out Latent Profile Analysis (LPA) using open-source or commercial software
https://data-edu.github.io/tidyLPA/

Configure missForest and imputeData via ... in `single_imputation()` #166

Open Teebusch opened 3 years ago

Teebusch commented 3 years ago

I'd like to add a ... argument to single_imputation() so that the imputation methods can be configured (e.g., number of iterations for missForest).

Current (R/missing_data.R)

single_imputation <- function(x, method = "imputeData") {
  if (all(complete.cases(x))) {
    return(x)
  }
  if (FALSE) missForest(x)                   # SIDENOTE: This is safe to remove....
  imputed <- invisible(switch(method,        # ...and so is the `invisible()`, I believe
    "imputeData" = do.call(imputeData, list(data = x, verbose = FALSE)),
    "missForest" = do.call(missForest::missForest, list(xmis = as.matrix(x)))$ximp,
    NULL
  ))
  if (is.null(imputed)) {
    stop("No method is currently defined for single imputation of data using '", method, "'.")
  }
  data.frame(imputed)
}

Suggested

single_imputation <- function(x, method = "imputeData", ...) {
  if (all(complete.cases(x))) return(x)

  imputed <- switch(
    method,
    "imputeData" = mclust::imputeData(x, verbose = FALSE, ...),
    "missForest" = missForest::missForest(as.matrix(x), ...)$ximp,
    NULL
  )

  if (is.null(imputed)) {
    stop(sprintf("Unknown imputation method '%s'.", method))
  }

  data.frame(imputed)
}

Quick Test

single_imputation(psych::bfi, "missForest", maxiter = 1, ntree = 10)

I can't do a proper PR right now, but would have time in the next few days, I think.

cjvanlissa commented 3 years ago

I agree, and the function should be rewritten so that the method argument can be a function, which is evaluated with the ... arguments. The current interface, where a text string is passed, should only be retained for the sake of backward compatibility.

Here is an example from another function in which I use this interface; you can pilfer code from there:

https://github.com/cjvanlissa/worcs/blob/63f3f4656524d0679b3220f2617126da38c1956d/R/synthetic.R#L153
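
One possible reading of this suggestion, as a minimal base-R sketch: legacy strings are resolved to functions through a lookup table, and everything after that goes through a single call path that forwards `...`. The names `impute_mean`, `legacy_methods`, `resolve_method`, and `impute_with` are hypothetical and not part of tidyLPA; `impute_mean` is only a stand-in so the example runs without mclust or missForest.

```r
# Hypothetical stand-in imputer: replace NAs in numeric columns by the column mean.
impute_mean <- function(x, ...) {
  x[] <- lapply(x, function(col) {
    if (is.numeric(col)) col[is.na(col)] <- mean(col, na.rm = TRUE)
    col
  })
  x
}

# Assumed lookup table mapping legacy string names to functions.
legacy_methods <- list(mean = impute_mean)

# Turn `method` into a function, whether it arrives as a string or a function.
resolve_method <- function(method) {
  if (is.function(method)) return(method)
  if (is.character(method) && length(method) == 1 &&
      method %in% names(legacy_methods)) {
    return(legacy_methods[[method]])
  }
  stop("`method` must be a function or a known method name.")
}

# Single call path: the resolved function is evaluated with the `...` arguments.
impute_with <- function(x, method = "mean", ...) {
  if (all(complete.cases(x))) return(x)
  do.call(resolve_method(method), list(x, ...))
}

df <- data.frame(a = c(1, NA, 3), b = c(4, 5, NA))
by_name <- impute_with(df, "mean")            # legacy string interface
by_function <- impute_with(df, impute_mean)   # function interface
```

With this layout, supporting a new string only means adding an entry to the lookup table, while users who pass a function bypass the table entirely.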

@jrosen48 please check, this contribution by @Teebusch would probably warrant being included as ctb to the package author list.

Teebusch commented 3 years ago

Thank you for the quick reply!

I like the idea of allowing a function to be passed as the method argument. What's nice about the current version with character arguments, though, is that for missForest() it takes care of converting between data frame and matrix, and plucks the imputed data from the returned list. For me, this is the main reason to use it instead of calling missForest() directly. So both options should be available, I think.

Maybe something like this (not properly tested):

#' @param method Imputation method to use. Can be a character string or a
#' function.
single_imputation <- function(x, method = "imputeData", ...) {
  if (all(complete.cases(x))) {
    return(x)
  }

  if (is.character(method)) {
    imputed <- switch(
      method,
      "imputeData" = mclust::imputeData(x, verbose = FALSE, ...),
      "missForest" = missForest::missForest(as.matrix(x), ...)$ximp,
      NULL
    )
    if (is.null(imputed)) {
      stop(sprintf("Unknown imputation method '%s'.", method))
    }
    imputed <- data.frame(imputed)

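  } else if (is.function(method)) {
    imputed <- do.call(method, list(x, ...))

    # all.equal() returns a character vector on mismatch, so compare with identical()
    if (!inherits(imputed, "data.frame") || !identical(dim(x), dim(imputed))) {
      warning(
        paste(
          "The imputation function did not return a data frame,",
          "or returned a data frame with different dimensions than the data.",
          "Please check output carefully!"
        )
      )
    }
  } else {
    stop("`method` must be a character string or a function.")
  }

  # Return the imputed data explicitly; otherwise the function branch returns NULL
  imputed
}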
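
A note on the dimension check in the function branch, as a small base-R sketch: `all.equal()` returns a character description rather than `FALSE` when its arguments differ, so negating its result directly raises an error, whereas `identical()` (or `isTRUE(all.equal(...))`) gives a plain logical. The helper name `same_dims` is hypothetical and only used for illustration.

```r
x <- data.frame(a = c(1, NA, 3), b = c(4, 5, NA))
dropped <- x[complete.cases(x), , drop = FALSE]  # one row left: dimensions differ from x

# all.equal() yields a character vector on mismatch, so `!` fails on it.
negation_fails <- inherits(
  try(!all.equal(dim(x), dim(dropped)), silent = TRUE),
  "try-error"
)

# Hypothetical helper: a check that always returns a single TRUE or FALSE.
same_dims <- function(original, imputed) {
  inherits(imputed, "data.frame") && identical(dim(original), dim(imputed))
}

same_dims(x, x)             # same shape, data frame
same_dims(x, dropped)       # data frame, but rows were dropped
same_dims(x, as.matrix(x))  # same shape, but not a data frame
```

A check of this form can be used to decide whether to warn about the output of a user-supplied imputation function.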