Gilead-BioStats / gsm

Good Statistical Monitoring R Package
https://gilead-biostats.github.io/gsm/
Apache License 2.0

Bugfix: `Input_Rate` errors out if `dfNumerator` has 0 rows. #1894

Closed: samussiah closed this issue 1 month ago

samussiah commented 1 month ago

Expected Behavior

Input_Rate should gracefully handle both a 0-row numerator and denominator dataset. Given a 0-row numerator dataset, set numerator to 0 for all rows in the denominator dataset. Given a 0-row denominator dataset, the function should exit.

Current Behavior

Input_Rate errors out here:

  # Calculate Numerator
  dfNumerator <- dfNumerator %>%
    rename("SubjectID" = !!strSubjectCol)

  if (strNumeratorMethod == "Count") {
    dfNumerator$Numerator <- 1 # throws error if dfNumerator has 0 rows
  } else {
    dfNumerator$Numerator <- dfNumerator[[strNumeratorCol]]
  }

Possible Solution

Process dfDenominator first, then right join dfNumerator onto dfDenominator (keeping SubjectID only) before processing dfNumerator.
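
A minimal sketch of one way to read this suggestion, not the actual gsm implementation. The toy data and the intermediate names `dfDenominatorSubj`, `dfNumeratorSubj`, and `dfResult` are hypothetical. This variant aggregates each input per subject and then right joins, so subjects with no numerator rows get a count of 0:

```r
library(dplyr)

# Hypothetical toy inputs standing in for the real datasets.
dfDenominator <- data.frame(
  subjid = c("001", "002", "003"),
  timeontreatment = c(100, 200, 300)
)
dfNumerator <- data.frame(subjid = character(0)) # 0-row plain data.frame

strSubjectCol <- "subjid"
strDenominatorCol <- "timeontreatment"

# 1. Process the denominator first: one row per subject ("Sum" method).
dfDenominatorSubj <- dfDenominator %>%
  rename(SubjectID = all_of(strSubjectCol)) %>%
  group_by(SubjectID) %>%
  summarize(Denominator = sum(.data[[strDenominatorCol]]), .groups = "drop")

# 2. Count numerator rows per subject ("Count" method).
#    count() returns a 0-row result for 0-row input instead of erroring.
dfNumeratorSubj <- dfNumerator %>%
  rename(SubjectID = all_of(strSubjectCol)) %>%
  count(SubjectID, name = "Numerator")

# 3. Right join onto the denominator so every subject is kept,
#    then fill the missing numerators with 0.
dfResult <- dfNumeratorSubj %>%
  right_join(dfDenominatorSubj, by = "SubjectID") %>%
  mutate(
    Numerator = coalesce(Numerator, 0L),
    Metric = Numerator / Denominator
  )
```

Because no step assigns a scalar to a column of the 0-row input, the same code path handles plain data.frames and tibbles. The 0-row denominator case (where the function should exit) is not covered here.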

Steps to Reproduce

  1. Run Input_Rate with a 0-row data frame passed as dfNumerator.

zdz2101 commented 1 month ago

devtools::install_github("Gilead-Biostats/gsm", ref = "dev")
#> Skipping install of 'gsm' from a github remote, the SHA1 (05365a31) has not changed since last install.
#>   Use `force = TRUE` to force installation
library(gsm)
dfInput <- Input_Rate(
  dfSubjects = clindata::rawplus_dm,
  dfNumerator = data.frame(subjid = character(0)),
  dfDenominator = clindata::rawplus_dm,
  strSubjectCol = "subjid",
  strGroupCol = "siteid",
  strGroupLevel = "Site",
  strNumeratorMethod = "Count",
  strDenominatorMethod = "Sum",
  strDenominatorCol = "timeontreatment"
)
#> Error in `$<-.data.frame`(`*tmp*`, "Numerator", value = 1): replacement has 1 row, data has 0

library(tibble)
dfInput <- Input_Rate(
  dfSubjects = clindata::rawplus_dm,
  dfNumerator = tibble(subjid = character(0)),
  dfDenominator = clindata::rawplus_dm,
  strSubjectCol = "subjid",
  strGroupCol = "siteid",
  strGroupLevel = "Site",
  strNumeratorMethod = "Count",
  strDenominatorMethod = "Sum",
  strDenominatorCol = "timeontreatment"
)
head(dfInput)
#> # A tibble: 6 × 6
#>   SubjectID GroupID GroupLevel Numerator Denominator Metric
#>   <chr>     <chr>   <chr>          <dbl>       <dbl>  <dbl>
#> 1 0496      5       Site               0         675      0
#> 2 1350      78      Site               0         673      0
#> 3 0539      139     Site               0         673      0
#> 4 0329      162     Site               0         673      0
#> 5 0429      29      Site               0         664      0
#> 6 1218      143     Site               0         760      0

Created on 2024-10-17 with reprex v2.1.0

What's strange is that it works for empty (0-row) tibbles. I was having a hard time recreating the error until I realized the input needs to be a plain old data.frame. The "intuition" we're going for is the bottom (tibble) result, right?

samussiah commented 1 month ago

Ohhh, nice catch! That's an interesting divergence in data frame behavior.
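
The divergence can be isolated outside of Input_Rate. The sketch below uses hypothetical object names (`dfBase`, `tblEmpty`, `strBaseResult`) and assumes the tibble package is installed. It shows that assigning a scalar with `$<-` fails on a 0-row data.frame but is recycled to zero rows on a tibble, and that a length-matched assignment is one possible way to avoid the error for both:

```r
library(tibble)

dfBase <- data.frame(subjid = character(0)) # plain data.frame, 0 rows
tblEmpty <- tibble(subjid = character(0))   # tibble, 0 rows

# Base data.frame: a length-1 value cannot be recycled to 0 rows.
strBaseResult <- tryCatch(
  {
    dfBase$Numerator <- 1
    "ok"
  },
  error = function(e) conditionMessage(e)
)

# tibble: the length-1 value is recycled to the tibble's size (0 rows).
tblEmpty$Numerator <- 1

# A length-matched value works for both classes.
dfBase$Numerator <- rep(1, nrow(dfBase))
```

Here `strBaseResult` holds the same "replacement has 1 row, data has 0" message seen in the reprex above, while both objects end up with an empty `Numerator` column.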