dcousin3 / superb

Summary plots with adjusted error bars
https://dcousin3.github.io/superb

Feature Request: Coerce tibbles to data frames #2

Closed mccarthy-m-g closed 3 years ago

mccarthy-m-g commented 3 years ago

The superb package does not seem tidyverse-friendly at the moment. When superbPlot() or superbData() is given a tibble as input, it throws an error, but if that tibble is first coerced to a data frame, the function works as expected. Here is a reprex showing this behaviour:

library(tibble)
library(superb)

# Motivation data for 15 participants over three weeks in wide format:
tib <- matrix( c(
  45, 50,  59,
  47, 58,  64,
  53, 63,  72,
  57, 64,  81,
  58, 67,  86,
  61, 70,  98,
  61, 75, 104,
  63, 79, 100,
  63, 79,  84,
  71, 81,  96,
  72, 83,  82,
  74, 84,  82,
  76, 86,  93,
  84, 90,  85,
  90, 96,  89
), ncol = 3, byrow = TRUE)

# put column names then convert to tibble:
colnames(tib) <- c("Week 1", "Week 2", "Week 3")
tib           <- as_tibble(tib)

# Superb throws an error when the data frame is a tibble
superbPlot(tib, 
           WSFactors = "Moment(3)",
           variables = c("Week 1", "Week 2", "Week 3"),
           adjustments = list(purpose = "difference"),
           plotStyle="line"
)
#> Error: Must subset rows with a valid subscript vector.
#> ℹ Logical subscripts must match the size of the indexed input.
#> x Input has size 15 but subscript `!duplicated(x, fromLast = fromLast, ...)` has size 0.

# But if the tibble is coerced to a data frame the function works
superbPlot(as.data.frame(tib), 
           WSFactors = "Moment(3)",
           variables = c("Week 1", "Week 2", "Week 3"),
           adjustments = list(purpose = "difference"),
           plotStyle="line"
)

Created on 2021-09-04 by the reprex package (v2.0.0)
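
The error message above does not say which internal step fails. One well-known difference that often breaks code written for base data frames, and which may or may not be the cause here, is that single-column subsetting with `[` returns a vector for a data frame but a one-column tibble for a tibble. A small sketch of that difference, using toy data that is not part of the reprex:

```r
library(tibble)

# Toy data for illustration only
df  <- data.frame(a = 1:3, b = 4:6)
tib <- as_tibble(df)

# A base data frame drops a single selected column to a plain vector...
df_col  <- df[, "a"]
# ...whereas a tibble keeps it as a one-column tibble.
tib_col <- tib[, "a"]

is.data.frame(df_col)   # FALSE
is.data.frame(tib_col)  # TRUE
```

Code that assumes the vector form can therefore behave differently when handed a tibble.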

It would be nice if this package supported tibbles, since they are a data frame format commonly used in R. This would make it easier to obtain the statistics calculated by this package. An easy fix would likely be to do that coercion inside the superbPlot() and superbData() functions. I have not tested it, but something like this should work:

superbData <- function(data, 
    BSFactors     = NULL,            # vector of the between-subject factor columns
    WSFactors     = NULL,            # vector of the names of the within-subject factors
    WSDesign      = "fullfactorial", # or ws levels of each variable if not a full factorial ws design
    factorOrder   = NULL,            # order of the factors for plots
    variables,                       # dependent variable name(s)
    statistic     = "mean",          # descriptive statistics
    errorbar      = "CI",            # content of the error bars
    gamma         = 0.95,            # coverage if confidence intervals
    adjustments   = list(
        purpose        = "single",   # is "single" or "difference"
        popSize        = Inf,        # is Inf or a specific positive integer
        decorrelation  = "none",     # is "CM", "LM", "CA" or "none"
        samplingDesign = "SRS"       # is "SRS" or "CRS" (in which case use clusterColumn)
    ),
    preprocessfct = NULL,            # run preprocessing on the matrix
    postprocessfct= NULL,            # run post-processing on the matrix
    clusterColumn = ""               # used if samplingDesign = "CRS"
) {

    ##############################################################################
    # All DONE: just send this to the main function superbPlot with showPlot=FALSE
    ##############################################################################

    results <- superbPlot(data    = as.data.frame(data), 
        BSFactors      = BSFactors,
        WSFactors      = WSFactors,
        WSDesign       = WSDesign,
        variables      = variables,  
        statistic      = statistic,  
        errorbar       = errorbar, 
        gamma          = gamma, 
        factorOrder    = factorOrder,
        adjustments    = adjustments,
        clusterColumn  = clusterColumn,
        preprocessfct  = preprocessfct,
        postprocessfct = postprocessfct,
        showPlot       = FALSE
    )    
    summaryStatistics = results[[1]]
    rawData = results[[2]]

#    if(missing(factorOrder))  {factorOrder <- c(WSFactors, BSFactors)}
#    widthfct <- paste(errorbar, statistic, sep = ".")

    # do some renaming of the columns for clearer results
#    verbosecol <- c(
#        statistic,
#        if (errorbar == "SE") c("- 1 * SE", "+ 1 * SE") 
#        else if (errorbar == "CI") c(paste("-", gamma* 100, "% CI width"), paste("+", gamma* 100, "% CI width") ) 
#        else if (errorbar == "PI") c(paste("-", gamma* 100, "% PI width"), paste("+", gamma* 100, "% PI width") ) 
#        else c(paste("-", widthfct), paste("+", widthfct) )
#    )
#    colnames(summaryStatistics)[(length(factorOrder)+1):(length(factorOrder)+3)] <- verbosecol

    return(list(summaryStatistics = summaryStatistics, rawData = rawData) )

}

dcousin3 commented 3 years ago

Excellent suggestion! The GitHub version has been updated to handle tibbles and any other structure that can be coerced to a data.frame.
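
How the updated package performs this coercion is not shown in this thread. As a minimal sketch of one way such a guard could look, the hypothetical helper `coerce_to_data_frame()` below (not a superb function) wraps `as.data.frame()` and stops with a readable message when the input cannot be coerced. The tibble is mimicked by its class vector so that the sketch runs in base R alone:

```r
# Hypothetical helper, not part of superb: return a plain data.frame or stop.
coerce_to_data_frame <- function(data) {
    tryCatch(
        as.data.frame(data),
        error = function(e) {
            stop("data cannot be coerced to a data.frame: ",
                 conditionMessage(e), call. = FALSE)
        }
    )
}

# Mimic a tibble's class vector so the sketch does not need the tibble package.
tib_like <- structure(
    data.frame("Week 1" = c(45, 47), "Week 2" = c(50, 58), check.names = FALSE),
    class = c("tbl_df", "tbl", "data.frame")
)

plain <- coerce_to_data_frame(tib_like)
class(plain)   # "data.frame"
names(plain)   # "Week 1" "Week 2"
```

Calling such a helper once at the top of a function means everything after it can rely on base data frame semantics.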

In R, run devtools::install_github("dcousin3/superb") to install this new version.

Let me know if anything comes up.

mccarthy-m-g commented 3 years ago

Awesome. Thanks for the quick fix!