benjaminrich / table1


How to present various continuous variables differently? #69

Closed Beduiz closed 2 years ago

Beduiz commented 2 years ago

Hi,

I have a data set in which I would like to present continuous variables differently depending on whether they are normally or non-normally distributed. I would thus like to be able to use these two render functions for continuous variables:

my.render.cont.median.quartlies <- function(x) {
    with(stats.apply.rounding(stats.default(x), digits=2),
        c("", "Median (Q1-Q3)"=sprintf("%s (%s - %s)", MEDIAN, Q1, Q3)))
}

my.render.cont.mean.sd <- function(x) {
    with(stats.apply.rounding(stats.default(x), digits=2),
        c("", "Mean (SD)"=sprintf("%s (± %s)", MEAN, SD)))
}

However, I don't understand how to apply these in the table1() call. Do you know of a solution?

Best regards Eric

benjaminrich commented 2 years ago

You have to know which variables are assumed to be normally distributed (or do you mean that you want it to be detected automatically?).

Here is a small example. Assume that there are 5 variables, A, B, C, D and E, and that A and C are normally distributed while the others are not. In the custom render function, check the variable name against the list of known normally distributed variables, and display different stats accordingly.

library(table1)

set.seed(123)

dat <- data.frame(
    A = rnorm(300, 50, 10),
    B = rgamma(300, 0.7, 3),
    C = rnorm(300, 1000, 99),
    D = runif(300, 20, 80),
    E = rbeta(300, 0.1, 0.2) + 0.5)

# These are the variables that have a normal distribution (known a priori)
vars.normal <- c("A", "C")

rndr <- function(x, name, ...) {
    cont <- ifelse(name %in% vars.normal, "Mean (SD)", "Median (Q1 - Q3)")
    render.default(x, name, render.continuous=c("", cont), ...)
}

table1(~ A + B + C + D + E, data=dat, render=rndr)

image
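
If the normally distributed variables were instead to be detected automatically, one rough option is to screen each numeric column with a Shapiro-Wilk test. This is only a sketch in base R: the helper name `detect.normal` and the 0.05 cutoff are assumptions introduced here, not part of table1, and a formal test is no substitute for looking at the data.

```r
# Hypothetical helper (not part of table1): returns the names of numeric
# columns for which the Shapiro-Wilk test does not reject normality.
detect.normal <- function(data, alpha = 0.05) {
    num <- names(data)[vapply(data, is.numeric, logical(1))]
    keep <- vapply(num, function(v) {
        x <- data[[v]][!is.na(data[[v]])]
        # shapiro.test() needs 3 to 5000 non-missing, non-constant values
        length(x) >= 3 && length(x) <= 5000 && length(unique(x)) > 1 &&
            shapiro.test(x)$p.value >= alpha
    }, logical(1))
    num[keep]
}

set.seed(123)
dat <- data.frame(
    A = rnorm(300, 50, 10),
    B = rgamma(300, 0.7, 3))

# Could be used in place of the hand-written vars.normal
vars.normal <- detect.normal(dat)
```

The result can be dropped into the rndr function above unchanged, since it is just a character vector of column names.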

Beduiz commented 2 years ago

Hi Benjamin,

Thank you so much for your reply. Yes, you are correct that I know a priori which variables I want to present in which way.

If I use the above code, can I still specify render.categorical=my.render.cat as I did previously, so that categorical variables are rendered in a third way?

In other words, I would like to render for example variable A & B as continuous with Mean (SD), variable C & D as continuous with Median (Q1-Q3) and variable E as categorical with N (%).

Best regards Eric

benjaminrich commented 2 years ago

Hi Eric,

Yes, that will work, as you can readily verify:

library(table1)

set.seed(123)

dat <- data.frame(
    A = rnorm(300, 50, 10),
    B = rnorm(300, 1000, 99),
    C = rgamma(300, 0.7, 3),
    D = runif(300, 20, 80),
    E = sample(c("Class 1", "Class 2"), 300, replace=TRUE))

# These are the variables that have a normal distribution (known a priori)
vars.normal <- c("A", "B")

rndr <- function(x, name, ...) {
    cont <- ifelse(name %in% vars.normal, "Mean (SD)", "Median (Q1 - Q3)")
    render.default(x, name, render.continuous=c("", cont), ...)
}

table1(~ A + B + C + D + E, data=dat, render=rndr)

image

Note that you don't need to specify render.categorical unless you want to do something different than the default.
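
If a different categorical format is wanted, a custom function can be passed alongside render=rndr; because rndr forwards `...` to render.default(), the render.categorical argument should reach it. A minimal sketch, assuming the table1 package is installed. The name `my.render.cat` comes from the question above, but its body here is an assumption (counts with whole-number percentages):

```r
library(table1)

# Assumed body for my.render.cat: "n (pct%)" for each level,
# preceded by an empty string for the variable's label row
my.render.cat <- function(x) {
    c("", sapply(stats.default(x), function(s) {
        with(s, sprintf("%d (%0.0f%%)", FREQ, PCT))
    }))
}

# Quick check on a small factor
res <- my.render.cat(factor(c("Class 1", "Class 1", "Class 2")))
res
```

It would then be supplied as table1(~ A + B + C + D + E, data=dat, render=rndr, render.categorical=my.render.cat).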

Beduiz commented 2 years ago

Hi Benjamin,

Big thank you, it worked for me also. Perfect!

One last question, which may be beyond the scope of your code, so apologies if it is: how can I also apply the vars.normal selection to the p-value column (using extra.col=list(`P-value`=pvalue))? I'm using the code below, but I don't know how to use vars.normal to restrict the one-way ANOVA to those variables:

pvalue <- function(x, ...) {
    # Construct vectors of data y, and groups (strata) g
    y <- unlist(x)
    g <- factor(rep(1:length(x), times=sapply(x, length)))
        #  One-way ANOVA for continuous variables with normal distribution (ie those assigned "vars.normal" above)
        if (is.numeric(y)) {
        p <- summary(aov(y ~ g))[[1]][["Pr(>F)"]][1]
        #  Jonckheere-Terpstra test for continuous variables with skewed distribution
        } if (is.numeric(y)) {
        p <- jonckheere.test(y, g, alternative="two.sided")$p.value
        } else {
        # Chi-square test for categorical variables
        p <- chisq.test(table(y, g))$p.value
        }
    # Format the p-value, using an HTML entity for the less-than sign.
    c(sub("<", "&lt;", format.pval(p, digits=3, eps=0.001)))
}

I.e., I want the one-way ANOVA above to be used only for the variables in vars.normal.

Kind regards Eric

benjaminrich commented 2 years ago

Hi Eric,

You can use the same approach. Note that jonckheere.test (from the clinfun package) requires g to be numeric or an ordered factor; a plain factor() is not accepted. Here is a complete example:

library(table1)
library(clinfun)

set.seed(123)

dat <- data.frame(
    A = rnorm(300, 50, 10),
    B = rnorm(300, 1000, 99),
    C = rgamma(300, 0.7, 3),
    D = runif(300, 20, 80),
    E = sample(c("Class 1", "Class 2"), 300, replace=TRUE),
    F = sample(c("Strat 1", "Strat 2", "Strat 3"), 300, replace=TRUE))

# These are the variables that have a normal distribution (known a priori)
vars.normal <- c("A", "B")

rndr <- function(x, name, ...) {
    cont <- ifelse(name %in% vars.normal, "Mean (SD)", "Median (Q1 - Q3)")
    render.default(x, name, render.continuous=c("", cont), ...)
}

pvalue <- function(x, name, ...) {

    # Construct vectors of data y, and groups (strata) g
    y <- unlist(x)
    g <- ordered(rep(1:length(x), times=sapply(x, length)))

    if (is.numeric(y) && (name %in% vars.normal)) {
        #  One-way ANOVA for continuous variables with normal distribution (ie those assigned "vars.normal" above)
        p <- summary(aov(y ~ g))[[1]][["Pr(>F)"]][1]
    } else if (is.numeric(y)) {
        #  Jonckheere-Terpstra test for continuous variables with skewed distribution
        p <- clinfun::jonckheere.test(y, g, alternative="two.sided")$p.value
    } else {
        # Chi-square test for categorical variables
        p <- chisq.test(table(y, g))$p.value
    }

    # Format the p-value, using an HTML entity for the less-than sign.
    c(sub("<", "&lt;", format.pval(p, digits=3, eps=0.001)))
}

table1(~ A + B + C + D + E | F, data=dat, render=rndr, extra.col=list(`P-value`=pvalue))

image
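
To see what pvalue() works with, it may help to build its input by hand. The sketch below assumes, as the function body above does, that `x` arrives as a list with one vector per stratum; the split() call and the names `values` and `strata` are only for illustration. It uses base R only.

```r
set.seed(123)

# Stand-in for the argument `x`: one numeric vector per stratum
values <- rnorm(30, 50, 10)
strata <- rep(c("Strat 1", "Strat 2", "Strat 3"), each = 10)
x <- split(values, strata)

# Flatten to a data vector y and a matching ordered group indicator g
y <- unlist(x)
g <- ordered(rep(seq_along(x), times = sapply(x, length)))

# Same extraction of the ANOVA p-value as in pvalue()
p <- summary(aov(y ~ g))[[1]][["Pr(>F)"]][1]

# Same formatting step: a leading "<" becomes the HTML entity "&lt;"
p.fmt <- sub("<", "&lt;", format.pval(p, digits = 3, eps = 0.001))
p.fmt
```

The entity matters only for very small p-values, where format.pval() returns a string starting with "<" that would otherwise be read as the start of an HTML tag.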

Beduiz commented 2 years ago

Thank you, that worked perfectly!

Sorry for having another question, but I'm also trying to present some variables (for example "Sex" with the values "male" and "female") with only one of the values (for example "Sex, male") to make the tables more compact. I read your reply to a similar question here: https://github.com/benjaminrich/table1/issues/48. It entailed coding those variables as logical and adding this code to the rndr-function:

rndr <- function(x, ...) {
    y <- render.default(x, ...)
    if (is.logical(x)) y[2] else y
}

However, when I do this, I (1) lose the p-value for that row, and (2) all the other variables get two rows with one for Mean (SD) and another for Median [Min, Max].

  1. How can I combine the rndr-function for p-value with the one to show logical variables with only the "yes"-value?
  2. Furthermore, I would preferably be able to present some other categorical variables with selected values but more than just one, for example the variable "Smoking status" with values "No", "Current" and "Previous" to be presented with only "Current" and "Previous". Is there a way to achieve that as well?
  3. A third question is if it's possible to present variable names, p-values and values on the same row. Currently, I get "Age (years)" on the first row, then "p <0.001" on the second row and then "Median (Q1-Q3)" plus all the values on the third row. It would be preferable if they were all presented on the same row.

Finally, is there a way to contribute monetarily to the community work that you put into this package? Do you have a gofundme-page or similar?

Kind regards Eric

benjaminrich commented 2 years ago

Hi Eric,

It is relatively easy to do all these things. I have updated the example to incorporate those elements (as far as I understand what you want):

library(table1)
library(mappings)
library(clinfun)

set.seed(123)

dat <- data.frame(
    A = rnorm(300, 50, 10),
    B = rnorm(300, 1000, 99),
    C = rgamma(300, 0.7, 3),
    age = runif(300, 20, 80),
    sex = sample(1:2, 300, replace=TRUE),
    smoking = sample(1:3, 300, replace=TRUE),
    F = sample(c("Group 1", "Group 2", "Group 3"), 300, replace=TRUE))

dat$is_male <- dat$sex == 1  # logical (assume 1 is for male)

m <- text2mapping("
1 | No
2 | Current
3 | Previous
")
dat$smoking <- m(dat$smoking)

# These are the variables that have a normal distribution (known a priori)
vars.normal <- c("A", "B")

rndr <- function(x, name, ...) {
    cont <- ifelse(name %in% vars.normal, "Mean (SD)", "Median (Q1 - Q3)")
    y <- render.default(x, name, render.continuous=cont, ...)
    if (is.logical(x)) {
        y[2]
    } else if (is.factor(x)) {
        y[names(y) != levels(x)[1]] # Exclude the first (reference) level
    } else {
        y
    }
}

pvalue <- function(x, name, ...) {

    # Construct vectors of data y, and groups (strata) g
    y <- unlist(x)
    g <- ordered(rep(1:length(x), times=sapply(x, length)))

    if (is.numeric(y) && (name %in% vars.normal)) {
        #  One-way ANOVA for continuous variables with normal distribution (ie those assigned "vars.normal" above)
        p <- summary(aov(y ~ g))[[1]][["Pr(>F)"]][1]
    } else if (is.numeric(y)) {
        #  Jonckheere-Terpstra test for continuous variables with skewed distribution
        p <- clinfun::jonckheere.test(y, g, alternative="two.sided")$p.value
    } else {
        # Chi-square test for categorical variables
        p <- chisq.test(table(y, g))$p.value
    }

    # Format the p-value, using an HTML entity for the less-than sign.
    c(sub("<", "&lt;", format.pval(p, digits=3, eps=0.001)))
}

stats <- function(x, name, ...) {
    y <- unlist(x)
    if (is.numeric(y) && (name %in% vars.normal)) {
        "Mean (SD)"
    } else if (is.numeric(y)) {
        "Median (Q1-Q3)"
    } else {
        "n (%)"
    }
}

label(dat$age)     <- "Age (years)"
label(dat$is_male) <- "Sex, male"
label(dat$smoking) <- "Smoking status"

table1(~ A + B + C + age + is_male + smoking | F, data=dat, render=rndr,
    extra.col=list(` `=stats, `P-value`=pvalue), extra.col.pos=1, overall=FALSE)

image
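
The row-dropping in rndr relies on the rendered rows being a named character vector. The sketch below uses a hand-made vector `y` as a stand-in for what render.default() returns for a factor (the cell contents are made up), to show how excluding by name keeps the remaining levels. The variant with `drop` is a hypothetical generalisation for hiding more than one level.

```r
x <- factor(c("No", "Current", "Previous", "No"),
            levels = c("No", "Current", "Previous"))

# Made-up stand-in for the rendered rows; the first (unnamed) element
# is the row that carries the variable label
y <- c("", No = "2 (50.0%)", Current = "1 (25.0%)", Previous = "1 (25.0%)")

# Exclude the first (reference) level, as in rndr
kept <- y[names(y) != levels(x)[1]]
kept

# Hypothetical variant: exclude any set of levels by name
drop <- c("No", "Previous")
kept2 <- y[!(names(y) %in% drop)]
kept2
```

Because the label row has an empty name, it never matches a level name and is always retained.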

As for a monetary contribution, it's very kind of you to ask, but I'm not taking any at this time. This is my tiny way of giving back to the open source community, which I benefit from greatly. If you find this package useful, that makes me happy. I really appreciate the thought though, and take it as a compliment.

Beduiz commented 2 years ago

Well, in that case I'm very thankful for your contributions. Do take my offer as a compliment. I really appreciate that you are able to help me.

One more question though: I would prefer the stats function to be presented in the same column as the variable name. For example "Age (years), median (Q1-Q3)". Is that also possible?

Best regards Eric

Edit: in my first post I said I didn't get Jonckheere to work, but I got it working now by adjusting the smoking status variable :-)

benjaminrich commented 2 years ago

Yes, like this:

label(dat$A)       <- "A, mean (SD)"
label(dat$B)       <- "B, mean (SD)"
label(dat$C)       <- "C, median (Q1-Q3)"
label(dat$age)     <- "Age (years), median (Q1-Q3)"
label(dat$is_male) <- "Sex, male"
label(dat$smoking) <- "Smoking status"

table1(~ A + B + C + age + is_male + smoking | F, data=dat, render=rndr,
    extra.col=list(`P-value`=pvalue), overall=FALSE)

image

(Note that you no longer need the stats() function from the previous version. Also note, make sure you remove the "Overall" column, otherwise you need to modify the pvalue() function.)
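
With many variables, the label suffixes could be generated from vars.normal rather than typed by hand. A small sketch in base R; the names `vars.cont`, `base.labels` and `new.labels` are hypothetical and introduced only for this illustration:

```r
vars.normal <- c("A", "B")
vars.cont   <- c("A", "B", "C", "age")   # all continuous variables
base.labels <- c(A = "A", B = "B", C = "C", age = "Age (years)")

# Pick the suffix per variable, then glue it onto the base label
suffix <- ifelse(vars.cont %in% vars.normal, "mean (SD)", "median (Q1-Q3)")
new.labels <- setNames(paste0(base.labels[vars.cont], ", ", suffix), vars.cont)
new.labels
```

Each entry could then be assigned in a loop over names(new.labels) with label(dat[[v]]) <- new.labels[[v]], which keeps the labels and the rendered statistics in sync when vars.normal changes.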