benjaminrich / table1

Grouped table shows incorrect numbers #128

Open murrayneil opened 3 weeks ago

murrayneil commented 3 weeks ago

Hi Benjamin,

First off, I wanted to give credit for this great package. However, I just stumbled upon a weird little bug when rendering a large grouped table. I was able to reproduce it with a synthetic dataset:

set.seed(42) 

n <- 100

test_dat <- data.frame(
  year = sample(2021:2023, n, replace = TRUE),
  country = sample(c("GER", "BE", "FI", "GB"), n, replace = TRUE),
  wave = sample(1:12, n, replace = TRUE)
)

This yields a dataset with the following structure:

image

Note that the years span only 2021 to 2023. Now I want to create a descriptive table grouped by country using:

table1::table1(~ year + wave | country, data = test_dat)

This yields the following outcome:

image

As you can see, the "year" variable shows a mean/median of "2020" within every group, even though no value of "year" is below 2021. Funnily enough, if one subtracts 2000 from the year variable and then creates the table again, the numbers are correct:

test_dat$year_sub <- test_dat$year - 2000

table1::table1(~ year_sub + wave | country, data = test_dat)

image
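
As a cross-check, here is a minimal base R sketch that computes the per-country means of `year` directly, without table1. The second half rests on an assumption I have not verified against the package source: that the table rounds continuous statistics to three significant digits. Under that assumption, base R's `signif()` reproduces the pattern above, since any value between 2021 and 2023 collapses to 2020, while the shifted values between 21 and 23 keep one decimal place.

```r
# Rebuild the synthetic data from the report.
set.seed(42)
n <- 100
test_dat <- data.frame(
  year = sample(2021:2023, n, replace = TRUE),
  country = sample(c("GER", "BE", "FI", "GB"), n, replace = TRUE),
  wave = sample(1:12, n, replace = TRUE)
)

# Per-country means computed directly, without table1.
year_means <- tapply(test_dat$year, test_dat$country, mean)
print(year_means)

# Assumption (not confirmed): the table rounds to three significant digits.
# With three significant digits, every mean in [2021, 2023] becomes 2020.
year_signif <- signif(year_means, 3)
print(year_signif)

# The shifted means lie in [21, 23], so three significant digits
# still leave one decimal place and the values look correct.
year_sub_signif <- signif(year_means - 2000, 3)
print(year_sub_signif)
```

If this is indeed the cause, the numbers would be a display rounding issue rather than a miscalculation, and the rounding options documented for `stats.apply.rounding` might be the place to look.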

All the best,

Neil