benjaminrich / table1


Wrong means, medians #96

Open oleksii-nikolaienko opened 1 year ago

oleksii-nikolaienko commented 1 year ago

Hi, I get a very strange output for means, medians and max values in this example:

table1::table1(~ x | y, data.frame(x=c(2000:2009), y="Y"))


While the following works correctly:

table1::table1(~ x | y, data.frame(x=c(0:9), y="Y"))


It seems to have something to do with rounding, judging by what I get when I try other ranges of x: c(10, 19), c(100, 109), c(1000, 1009). The output is badly off when the values are years (as in the first example).

benjaminrich commented 1 year ago

Please have a look at #24.

There is nothing wrong with the calculations, just increase the number of significant digits to 5 (default is 3).

table1::table1(~ x | y, data.frame(x=c(2000:2009), y="Y"), digits=5)

(Aside: While it seems strange to me that you would want to take the mean of calendar years, I guess you know what you are doing.)
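The effect of the digits setting can be reproduced with base R alone. The following is a minimal sketch that uses `signif()` as a stand-in for the rounding step; it is an assumption that this matches table1's own formatting closely enough for illustration (table1 formats through `signif_pad()`, which may differ in details such as padding).

```r
# Data from the original report
x <- 2000:2009

# Unrounded summary statistics
raw <- c(mean = mean(x), min = min(x), max = max(x))

# Rounded to 3 significant digits (the default mentioned in this thread)
rounded3 <- signif(raw, 3)

# Rounded to 5 significant digits (the suggested setting)
rounded5 <- signif(raw, 5)

print(raw)       # mean 2004.5, min 2000, max 2009
print(rounded3)  # mean 2000,   min 2000, max 2010
print(rounded5)  # mean 2004.5, min 2000, max 2009
```

With only 3 significant digits, a four-digit year can be resolved no finer than the nearest 10, which is why the maximum of 2009 shows up as 2010.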

oleksii-nikolaienko commented 1 year ago

Yes, sorry for that, I didn't know the default value is digits=3. I can see it specified in signif_pad(), but not in table1(). On my issue: in this case I wanted to know the median and min/max, and instead of 2005 [2001, 2009] I got 2010 [2000, 2010]. I pasted it right into my manuscript without checking against summary(). So while I understand that there is no bug here, it is still a bit misleading, especially since the default number of digits is not visible in table1() itself. It made me wonder how many times I could have made the same mistake before...

oleksii-nikolaienko commented 1 year ago

I get your point on the default values, but some people get confused. Would it be possible to specifically explain this better in the table1() help page?

benjaminrich commented 1 year ago

Thanks. I will take it under consideration for the next release.

ffmed commented 2 weeks ago

Hey, in medical trials you do occasionally encounter the median year for treatment A vs. B. A simple fix I use is to customize the number of digits depending on the value:

 render_cont.ff <- function(x, name, data2, ...) {
   # Median and quartiles, ignoring missing values
   med <- median(x, na.rm = TRUE)
   q1  <- quantile(x, 0.25, na.rm = TRUE)
   q3  <- quantile(x, 0.75, na.rm = TRUE)

   # 4 significant digits for large values (e.g. calendar years), 3 otherwise
   digits <- if (med > 1000) 4 else 3

   # Formatted as "median (Q1, Q3)"
   sprintf("%s (%s, %s)",
           table1::signif_pad(med, digits, big.mark = ","),
           table1::signif_pad(q1,  digits, big.mark = ","),
           table1::signif_pad(q3,  digits, big.mark = ","))
 }
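The fixed threshold above could also be replaced by a rule that derives the number of digits from the magnitude of the data. The sketch below uses base R only; `digits_for()` and its `min_digits` argument are hypothetical names for illustration, not part of table1, and base `signif()` again stands in for table1's formatting.

```r
# Hypothetical helper (not part of table1): return enough significant digits
# to keep the whole integer part of the largest value, never fewer than
# `min_digits`.
digits_for <- function(x, min_digits = 3) {
  largest <- suppressWarnings(max(abs(x), na.rm = TRUE))
  # Empty or all-NA input, or values below 1: fall back to the minimum
  if (!is.finite(largest) || largest < 1) return(min_digits)
  max(min_digits, floor(log10(largest)) + 1)
}

years <- c(1998, 2003, 2007, 2011, 2019)
small <- c(0.5, 1.2, 3.4)

d_years <- digits_for(years)   # 4: a four-digit year needs 4 digits
d_small <- digits_for(small)   # 3: small values keep the minimum

med_years <- signif(median(years), d_years)  # 2007, not rounded to 2010
print(c(d_years, d_small, med_years))
```

A custom renderer such as `render_cont.ff` is normally handed to `table1()` through its `render.continuous` argument, for example `table1::table1(~ x | y, data = dat, render.continuous = render_cont.ff)`, where `dat` is a placeholder for your own data frame.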