benjaminrich / table1


Missing column & proportion #75

Closed: PierreH94 closed this issue 2 years ago

PierreH94 commented 2 years ago

Hello Benjamin, and thank you very much for this amazing package. I've been using it for several weeks now for my MD work, and thanks to all your answers I've been able to do almost everything I want.

I have two questions left; maybe you could help me?

The first one is about missing values. I'm working on a database with a lot of missing values, and this is important information to display. But my tables are quite long, and adding it as a new row each time makes them very dense.
I'm currently using this rndr function:

rndr <- function(x, name, ...) {
    cont <- ifelse(name %in% vars.normal, "Mean (SD)", "Median (Q1 - Q3)")
    y <- render.default(x, name, render.continuous=cont, ...)
    if (is.logical(x)) {
        y[2]
    } else {
        y
    }
}

For numeric variables, I've been able to add it like this:
cont <- ifelse(name %in% vars.normal, "Mean (SD) / NMISS", "Median (Q1 - Q3) / NMISS")

But for non-numeric variables, I must admit I don't know where to start, or whether it's even possible to add it in the function. Also, NMISS is displayed with 1 or 2 decimals. Do you know if it's possible to show 0 decimals?

The second one is about percentages. Is there any way to calculate a statistic only on a certain subset of patients for some variables, and on all patients for the others?

Using the melanoma example, I'd like to know if it's possible to display something like this:

table1 (~ age + thickness | sex, data = melanoma)

Here, for the age variable, I would like the mean for all patients. For the thickness variable, I would like the mean only for patients with ulcer == 1.

So it would display something like this for age:

[Screenshot: desired output for age]

And something like this for thickness (only the right column):

[Screenshot: desired output for thickness]

All of that in the same table.

I hope I'm being clear. Again, thank you very much for your hard work.

Pierre

PierreH94 commented 2 years ago

Hello Benjamin,

Thank you very much for your quick reply. I just tried the NMISS part and it works perfectly!

benjaminrich commented 2 years ago

Hi Pierre,

Yes, I was going to say that I pushed some small changes to remove the decimals from NMISS. You could use the same approach for binary variables. For categorical variables with more than 2 categories, though, I'm not sure where you would want to display the number missing. Here's an example:

library(table1)

# Generate some simulated data
set.seed(567)
n <- 123
d <- data.frame(
    A = rnorm(n, 70, 15),
    B = runif(n, 10, 99),
    C = runif(n) > 0.5)

# Add some missing values
d$A[sample(1:n, 5)] <- NA
d$B[sample(1:n, 7)] <- NA
d$C[sample(1:n, 8)] <- NA

vars.normal <- "A"

label(d$A) <- "A, Mean (SD) / #missing"
label(d$B) <- "B, Median (Q1 - Q3) / #missing"
label(d$C) <- "C, n (%) / #missing"

rndr <- function(x, name, ...) {
    cont <- ifelse(name %in% vars.normal, "Mean (SD) / NMISS", "Median (Q1 - Q3) / NMISS")
    categ <- "FREQ (PCTnoNA%) / NMISS"
    y <- render.default(x, name, render.continuous=cont, render.categorical=categ, ...)
    if (is.logical(x)) {
        y[2]
    } else {
        y
    }
}

table1(~ A + B + C, data=d, render=rndr, render.missing=NULL)

[Screenshot: resulting table]
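The template "FREQ (PCTnoNA%) / NMISS" combines three counts, and a rough base-R sketch makes them concrete for a logical variable. It is only an illustration, not table1's implementation. The toy data are made up, and it assumes PCTnoNA uses the non-missing values as its denominator, as the name suggests. Formatting here uses plain `sprintf()` rather than table1's own rounding.

```r
# Hypothetical toy data: a logical variable with two missing values
x <- c(TRUE, FALSE, TRUE, NA, TRUE, NA)

freq    <- sum(x, na.rm = TRUE)          # number of TRUE values
nmiss   <- sum(is.na(x))                 # number of missing values
pct     <- 100 * freq / length(x)        # percent of all observations
pctnona <- 100 * freq / sum(!is.na(x))   # percent of non-missing observations

# Mimic the "FREQ (PCTnoNA%) / NMISS" layout
cell <- sprintf("%d (%.1f%%) / %d", freq, pctnona, nmiss)
cell
```

With two of six values missing, PCT and PCTnoNA differ (50% vs 75%). That is why the example uses PCTnoNA when the missing count is shown separately.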

For your second question, there are some tricks you could use to get what you want. The simplest is to create a new variable that is NA for the subjects you don't want to include in the calculation (just make sure not to display those NAs as true missing values). For the example you gave, it could be something like this:

library(boot)
library(table1)

melanoma2 <- melanoma

melanoma2$thickness.ulcer <- with(melanoma2, ifelse(ulcer==1, thickness, NA))

label(melanoma2$age)             <- "Age (years)"
label(melanoma2$thickness.ulcer) <- "Thickness, only for patients with ulceration (mm)"

table1(~ age + thickness.ulcer | sex, data=melanoma2, render.missing=NULL)

[Screenshot: resulting table]
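A quick base-R check shows why the NA-masking trick works and where it stops working. The numbers are made up, not the real `boot::melanoma` values.

```r
# Toy data (made up)
ulcer     <- c(1, 0, 1, 1, 0)
thickness <- c(2.0, 1.0, 4.0, 6.0, 3.0)

# Mask patients without ulceration, as done for thickness.ulcer
thickness.ulcer <- ifelse(ulcer == 1, thickness, NA)

# Same mean as an explicit subset, because the NAs are dropped
m_masked <- mean(thickness.ulcer, na.rm = TRUE)
m_subset <- mean(thickness[ulcer == 1])

# Limitation: a genuinely missing thickness becomes indistinguishable
# from a masked one
thickness2 <- c(2.0, 1.0, NA, 6.0, 3.0)
masked2    <- ifelse(ulcer == 1, thickness2, NA)
n_na       <- sum(is.na(masked2))   # 2 masked + 1 real missing value
```

This is exactly the situation where a "#missing" count would be misleading, so the trick is only safe when the variable has no real NAs.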

If that doesn't work for your case (e.g. because you have true missing values, or need to display things differently), then there might be other approaches that would work.

PierreH94 commented 2 years ago

Thanks!

Indeed, coding them as missing values and excluding NA from the calculation was my first thought, but I do have some real NAs for these variables. For now I've been creating a separate data frame with subset(), generating the tables separately and merging them in Excel afterwards, but then I lose the formatting your package provides! I'd be glad to hear about your other approach if you have any ideas!

benjaminrich commented 2 years ago

You could always combine both variables into a single character variable and then separate them again in the render function, like this:

melanoma2$ulcer.thickness <- with(melanoma2, paste(ulcer, thickness, sep=":"))

rndr <- function(x, name, ...) {
    if (name == "ulcer.thickness") {
        y <- read.table(text=paste(x, collapse="\n"), sep=":")
        x <- y[[2]][y[[1]] == 1]
    }
    render.default(x, name, ...)
}

table1(~ age + ulcer.thickness | sex, data=melanoma2, render.missing=NULL, render=rndr)
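The packing and unpacking step can be tried on its own in base R with made-up values. The extra `!is.na()` guard on ulcer is an addition in this sketch, which drops rows whose ulcer status is unknown. It shows that a genuinely missing thickness survives the round trip as NA, unlike the masking trick.

```r
# Toy data (made up): one missing thickness, one missing ulcer status
ulcer     <- c(1, 0, 1, NA, 1)
thickness <- c(2.5, 1.0, NA, 3.0, 6.5)

# Pack both columns into one "ulcer:thickness" string per row
packed <- paste(ulcer, thickness, sep = ":")
# "1:2.5" "0:1" "1:NA" "NA:3" "1:6.5"

# Unpack: read.table() parses the strings back into two columns
z <- read.table(text = paste(packed, collapse = "\n"), sep = ":")

# Keep thickness only where ulcer is known and equal to 1
x <- z[[2]][!is.na(z[[1]]) & z[[1]] == 1]
x
```

Here `x` contains 2.5, NA and 6.5. The real missing value is still there to be counted as missing, which the NA-masking approach could not preserve.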

Another alternative would be to pass the indices of the rows instead of the values, and then extract the values from the full data.frame, like this:

melanoma2$thickness.indices <- 1:nrow(melanoma2)

rndr <- function(x, name, ..., mydata) {
    if (name == "thickness.indices") {
        x <- subset(mydata[x,], ulcer==1)$thickness
    }
    render.default(x, name, ...)
}

table1(~ age + thickness.indices | sex, data=melanoma2, render.missing=NULL, render=rndr, mydata=melanoma2)
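The index approach can be simulated without table1. This sketch assumes that table1 calls the render function once per column with that column's values, here the row indices for one sex group. The data frame is made up and stands in for melanoma2.

```r
# Made-up data frame standing in for melanoma2
mydata <- data.frame(
    ulcer     = c(1, 0, 1, 1, 0),
    thickness = c(2.0, 1.0, 4.0, NA, 3.0),
    sex       = c(1, 1, 0, 1, 1)
)
mydata$thickness.indices <- seq_len(nrow(mydata))

# Simulate what one column would receive: the indices for sex == 1
x <- mydata$thickness.indices[mydata$sex == 1]

# Look the rows up in the full data and keep only ulcerated patients;
# subset() also drops rows where the condition is NA
vals <- subset(mydata[x, ], ulcer == 1)$thickness
vals
```

Because the lookup goes back to the original data, any column can be used in the condition, and real NAs in thickness are kept.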
PierreH94 commented 2 years ago

Thanks for your reply. I've tried the first one and it works wonderfully! I've been adding it to my existing rndr function, and the only thing I can't figure out is how to display this new result on the same line. I've been using the tips you gave in another issue, but I don't see how to apply them to this new part of the function.

Here is the code I have so far:

var_dep <- c("example")

rndr <- function(x, name, ...) {
    cont <- ifelse(name %in% vars.normal, "Mean ± SD - NMISS", "Median (Q1 - Q3) - NMISS")
    cat <- c("FREQ (PCT%) - NMISS")
    y <- render.default(x, name, render.continuous=cont, render.categorical=cat, ...)
    ynoNA <- render.default(x, name, render.continuous=cont, ...)

    if (name %in% var_dep) {
        z <- read.table(text=paste(x, collapse="\n"), sep=":")
        x <- z[[2]][z[[1]] == 1 & !is.na(z[[1]])]
        render.default(x, name, ...)
    } else if (is.logical(x)) {
        y[2]
    } else {
        ynoNA
    }
}

Thank you very much

benjaminrich commented 2 years ago

A small tweak will do it. Here is a complete example (I reformatted your code a bit):

library(boot)
library(table1)

set.seed(789)

melanoma2 <- melanoma

melanoma2$sex.binary <- with(melanoma2, sex == 1)
melanoma2$sex.factor <- with(melanoma2, factor(sex, labels=c("Female", "Male")))

# Add some missing values
melanoma2$age[sample(1:nrow(melanoma2), 11)] <- NA
melanoma2$ulcer[sample(1:nrow(melanoma2), 3)] <- NA
melanoma2$thickness[sample(1:nrow(melanoma2), 5)] <- NA
melanoma2$sex.factor[sample(1:nrow(melanoma2), 2)] <- NA

# Combine ulcer and thickness
melanoma2$ulcer.thickness <- with(melanoma2, paste(ulcer, thickness, sep=":"))

vars.normal <- c("ulcer.thickness")
vars.dep    <- c("ulcer.thickness")

rndr <- function(x, name, ...) {

    if (name %in% vars.dep) {
        z <- read.table(text=paste(x, collapse="\n"), sep=":")
        x <- z[[2]][!is.na(z[[1]]) & z[[1]] == 1]
    } 

    if (is.numeric(x)) {
        cont <- ifelse(name %in% vars.normal, "Mean ± SD - NMISS", "Median (Q1 - Q3) - NMISS")
        render.default(x, name, render.continuous=cont, render.missing=NULL, ...)
    } else if (is.logical(x)) {
        y <- render.default(x, name, render.categorical="FREQ (PCT%) - NMISS", render.missing=NULL, ...)
        y[2]
    } else {
        render.default(x, name, ...)
    }
}

label(melanoma2$age)             <- "Age (years), Median (Q1 - Q3) - #missing"
label(melanoma2$ulcer.thickness) <- "Thickness, only for patients with ulceration (mm), Mean ± SD - #missing"
label(melanoma2$sex.binary)      <- "Male, n (%) - #missing"
label(melanoma2$sex.factor)      <- "Sex"

table1(~ age + ulcer.thickness + sex.binary + sex.factor | factor(sex), data=melanoma2, render=rndr)

[Screenshot: resulting table]

EDIT: fixed a problem with the display of missing values.
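Compared with the previous version of rndr, the main change is the order of operations. The packed column is decoded first, so when the type checks run, x is already numeric. It then goes through the same templated branch (with render.missing=NULL) as the other continuous variables. The previous version computed y from the still-packed character column and called render.default without the custom template in the decoding branch. A tiny base-R illustration on made-up values:

```r
# Made-up packed "ulcer:thickness" values
packed <- c("1:2.5", "0:1", "1:NA", "1:6.5")
before <- is.numeric(packed)   # FALSE: still a character vector

# Decode first...
z <- read.table(text = paste(packed, collapse = "\n"), sep = ":")
x <- z[[2]][!is.na(z[[1]]) & z[[1]] == 1]

# ...so that a numeric check like the one in rndr now succeeds
after <- is.numeric(x)         # TRUE
```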

PierreH94 commented 2 years ago

It works perfectly, thank you for your hard work! The result is amazing!

benjaminrich commented 2 years ago

Great!