Closed Beduiz closed 2 years ago
You have to know which variables are assumed to be normally distributed (or do you mean that you want it to be detected automatically?).
Here is a small example. Assume that there are 5 variables, A, B, C, D and E, and that A and C are normally distributed while the others are not. In the custom render function, check the name against the list of known normally distributed variables, and then adapt the function to display different stats accordingly.
library(table1)
set.seed(123)
dat <- data.frame(
A = rnorm(300, 50, 10),
B = rgamma(300, 0.7, 3),
C = rnorm(300, 1000, 99),
D = runif(300, 20, 80),
E = rbeta(300, 0.1, 0.2) + 0.5)
# These are the variables that have a normal distribution (known a priori)
vars.normal <- c("A", "C")
rndr <- function(x, name, ...) {
cont <- ifelse(name %in% vars.normal, "Mean (SD)", "Median (Q1 - Q3)")
render.default(x, name, render.continuous=c("", cont), ...)
}
table1(~ A + B + C + D + E, data=dat, render=rndr)
Hi Benjamin,
Thank you so much for your reply. Yes, you are correct that I know a priori which variables I want to test in which way.
If I use the above code, can I then also still specify as I've done previously the "render.categorical=my.render.cat", so that categorical variables are tested in a third way?
In other words, I would like to render for example variable A & B as continuous with Mean (SD), variable C & D as continuous with Median (Q1-Q3) and variable E as categorical with N (%).
Best regards Eric
Hi Eric,
Yes, that will work, as you can readily verify:
library(table1)
set.seed(123)
dat <- data.frame(
A = rnorm(300, 50, 10),
B = rnorm(300, 1000, 99),
C = rgamma(300, 0.7, 3),
D = runif(300, 20, 80),
E = sample(c("Class 1", "Class 2"), 300, replace=TRUE))
# These are the variables that have a normal distribution (known a priori)
vars.normal <- c("A", "B")
rndr <- function(x, name, ...) {
cont <- ifelse(name %in% vars.normal, "Mean (SD)", "Median (Q1 - Q3)")
render.default(x, name, render.continuous=c("", cont), ...)
}
table1(~ A + B + C + D + E, data=dat, render=rndr)
Note that you don't need to specify render.categorical unless you want to do something different than the default.
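For readers who do want a non-default categorical display, here is a minimal sketch of what such a function could look like. Eric's own my.render.cat is not shown in this thread, so the body below is a hypothetical base-R stand-in; it only assumes that a renderer returns a character vector whose first element is empty (the variable's own row) followed by one named element per level.

```r
# Hypothetical custom renderer for categorical variables: "N (%)" per level.
# The first element is the (empty) cell on the variable's own row; the
# remaining elements are named after the levels they describe.
my.render.cat <- function(x, ...) {
  tab <- table(x)
  pct <- 100 * as.numeric(prop.table(tab))
  c("", setNames(sprintf("%d (%.0f%%)", as.integer(tab), pct), names(tab)))
}

# Quick check on a small vector
my.render.cat(c("Class 1", "Class 1", "Class 1", "Class 2"))
```

It would then be passed alongside the custom render function, e.g. table1(~ A + B + C + D + E, data=dat, render=rndr, render.categorical=my.render.cat), since rndr forwards its extra arguments to render.default through `...`.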
Hi Benjamin,
Big thank you, it worked for me also. Perfect!
One last question, which I think might be beyond the scope of your code, so apologies if it is: how can I also apply the vars.normal selection to the p-value column (using extra.col=list(`P-value`=pvalue))? I'm using this code, but I don't know how to use vars.normal for the one-way ANOVA test:
pvalue <- function(x, ...) {
# Construct vectors of data y, and groups (strata) g
y <- unlist(x)
g <- factor(rep(1:length(x), times=sapply(x, length)))
# One-way ANOVA for continuous variables with normal distribution (ie those assigned "vars.normal" above)
if (is.numeric(y)) {
p <- summary(aov(y ~ g))[[1]][["Pr(>F)"]][1]
# Jonckheere-Terpstra test for continuous variables with skewed distribution
} if (is.numeric(y)) {
p <- jonckheere.test(y, g, alternative="two.sided")$p.value
} else {
# Chi-square test for categorical variables
p <- chisq.test(table(y, g))$p.value
}
# Format the p-value, using an HTML entity for the less-than sign.
c(sub("<", "&lt;", format.pval(p, digits=3, eps=0.001)))
}
I.e., I want to use vars.normal to select the one-way ANOVA test above.
Kind regards Eric
Hi Eric,
You can use the same approach. Note that jonckheere.test requires that g be an ordered factor. Here is a complete example:
library(table1)
library(clinfun)
set.seed(123)
dat <- data.frame(
A = rnorm(300, 50, 10),
B = rnorm(300, 1000, 99),
C = rgamma(300, 0.7, 3),
D = runif(300, 20, 80),
E = sample(c("Class 1", "Class 2"), 300, replace=TRUE),
F = sample(c("Strat 1", "Strat 2", "Strat 3"), 300, replace=TRUE))
# These are the variables that have a normal distribution (known a priori)
vars.normal <- c("A", "B")
rndr <- function(x, name, ...) {
cont <- ifelse(name %in% vars.normal, "Mean (SD)", "Median (Q1 - Q3)")
render.default(x, name, render.continuous=c("", cont), ...)
}
pvalue <- function(x, name, ...) {
# Construct vectors of data y, and groups (strata) g
y <- unlist(x)
g <- ordered(rep(1:length(x), times=sapply(x, length)))
if (is.numeric(y) && (name %in% vars.normal)) {
# One-way ANOVA for continuous variables with normal distribution (ie those assigned "vars.normal" above)
p <- summary(aov(y ~ g))[[1]][["Pr(>F)"]][1]
} else if (is.numeric(y)) {
# Jonckheere-Terpstra test for continuous variables with skewed distribution
p <- clinfun::jonckheere.test(y, g, alternative="two.sided")$p.value
} else {
# Chi-square test for categorical variables
p <- chisq.test(table(y, g))$p.value
}
# Format the p-value, using an HTML entity for the less-than sign.
c(sub("<", "&lt;", format.pval(p, digits=3, eps=0.001)))
}
table1(~ A + B + C + D + E | F, data=dat, render=rndr, extra.col=list(`P-value`=pvalue))
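To make the first lines of pvalue() less opaque, here is a small base-R sketch of how y and g are built and how the p-value is formatted. The list x below is a hypothetical stand-in for what the function receives (one vector of values per stratum); the helper name fmt and the simulated values are assumptions for illustration only.

```r
# Hypothetical stand-in for the argument of an extra.col function:
# a list holding one vector of values per stratum.
set.seed(123)
x <- list(
  `Strat 1` = rnorm(40, 50, 10),
  `Strat 2` = rnorm(35, 55, 10),
  `Strat 3` = rnorm(25, 60, 10))

# Stack all values, and record which stratum each value came from.
# ordered() is used because jonckheere.test needs an ordered factor.
y <- unlist(x)
g <- ordered(rep(seq_along(x), times = lengths(x)))

# One-way ANOVA p-value, extracted the same way as in pvalue()
p <- summary(aov(y ~ g))[[1]][["Pr(>F)"]][1]

# Format for HTML output: the "<" that format.pval() emits for values
# below eps is replaced by the entity "&lt;"
fmt <- function(p) sub("<", "&lt;", format.pval(p, digits = 3, eps = 0.001))
fmt(p)
fmt(0.00001)
```

The last call shows why the sub() step is there: very small p-values are printed with a leading less-than sign, which has to be escaped in an HTML table.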
Thank you, that worked perfectly!
Sorry for having another question, but I'm also trying to present some variables (for example "Sex" with the values "male" and "female") with only one of the values (for example "Sex, male") to make the tables more compact. I read your reply to a similar question here: https://github.com/benjaminrich/table1/issues/48. It entailed coding those variables as logical and adding this code to the rndr-function:
rndr <- function(x, ...) {
y <- render.default(x, ...)
if (is.logical(x)) y[2] else y
}
However, when I do this, I (1) lose the p-value for that row, and (2) all the other variables get two rows with one for Mean (SD) and another for Median [Min, Max].
Finally, is there a way to contribute monetarily to the community work that you put into this package? Do you have a gofundme-page or similar?
Kind regards Eric
Hi Eric,
It is relatively easy to do all these things. I have updated the example to incorporate those elements (as far as I understand what you want):
library(table1)
library(mappings)
library(clinfun)
set.seed(123)
dat <- data.frame(
A = rnorm(300, 50, 10),
B = rnorm(300, 1000, 99),
C = rgamma(300, 0.7, 3),
age = runif(300, 20, 80),
sex = sample(1:2, 300, replace=TRUE),
smoking = sample(1:3, 300, replace=TRUE),
F = sample(c("Group 1", "Group 2", "Group 3"), 300, replace=TRUE))
dat$is_male <- dat$sex == 1 # logical (assume 1 is for male)
m <- text2mapping("
1 | No
2 | Current
3 | Previous
")
dat$smoking <- m(dat$smoking)
# These are the variables that have a normal distribution (known a priori)
vars.normal <- c("A", "B")
rndr <- function(x, name, ...) {
cont <- ifelse(name %in% vars.normal, "Mean (SD)", "Median (Q1 - Q3)")
y <- render.default(x, name, render.continuous=cont, ...)
if (is.logical(x)) {
y[2]
} else if (is.factor(x)) {
y[names(y) != levels(x)[1]] # Exclude the first (reference) level
} else {
y
}
}
pvalue <- function(x, name, ...) {
# Construct vectors of data y, and groups (strata) g
y <- unlist(x)
g <- ordered(rep(1:length(x), times=sapply(x, length)))
if (is.numeric(y) && (name %in% vars.normal)) {
# One-way ANOVA for continuous variables with normal distribution (ie those assigned "vars.normal" above)
p <- summary(aov(y ~ g))[[1]][["Pr(>F)"]][1]
} else if (is.numeric(y)) {
# Jonckheere-Terpstra test for continuous variables with skewed distribution
p <- clinfun::jonckheere.test(y, g, alternative="two.sided")$p.value
} else {
# Chi-square test for categorical variables
p <- chisq.test(table(y, g))$p.value
}
# Format the p-value, using an HTML entity for the less-than sign.
c(sub("<", "&lt;", format.pval(p, digits=3, eps=0.001)))
}
stats <- function(x, name, ...) {
y <- unlist(x)
if (is.numeric(y) && (name %in% vars.normal)) {
"Mean (SD)"
} else if (is.numeric(y)) {
"Median (Q1-Q3)"
} else {
"n (%)"
}
}
label(dat$age) <- "Age (years)"
label(dat$is_male) <- "Sex, male"
label(dat$smoking) <- "Smoking status"
table1(~ A + B + C + age + is_male + smoking | F, data=dat, render=rndr,
extra.col=list(` `=stats, `P-value`=pvalue), extra.col.pos=1, overall=FALSE)
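The row-dropping logic in rndr() can be tried in isolation. The sketch below uses hand-written vectors with made-up counts (not output from the example above), and assumes the rendered result has an empty first element followed by one named element per level.

```r
# Assumed shape of a rendered categorical variable (illustrative values only):
# an empty first element, then one named element per level.
y_logical <- c("", Yes = "148 (49.3%)", No = "152 (50.7%)")
y_factor  <- c("", No = "101 (33.7%)", Current = "95 (31.7%)",
               Previous = "104 (34.7%)")

# Logical variable: keep only the "Yes" element, so it takes a single row
y_logical[2]

# Factor: drop the first (reference) level, keep the empty header element
# and the remaining levels
ref <- "No"  # stands in for levels(x)[1]
y_factor[names(y_factor) != ref]
```

Because the empty header element has the name "", it never matches the reference level and is always kept.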
As for a monetary contribution, it's very kind of you to ask, but I'm not taking any at this time. This is my tiny way of giving back to the open source community, that I benefit from greatly. If you find this package useful, that makes me happy. I really appreciate the thought though, and take it as a compliment.
Well, in that case I'm very thankful for your contributions. Indeed, take my offer as a compliment. I'm very thankful that you are able to help me.
One more question though: I would prefer the stats function to be presented in the same column as the variable name. For example "Age (years), median (Q1-Q3)". Is that also possible?
Best regards Eric
Edit: in my first post I said I couldn't get Jonckheere to work, but I have it working now after adjusting the smoking status variable :-)
Yes, like this:
label(dat$A) <- "A, mean (SD)"
label(dat$B) <- "B, mean (SD)"
label(dat$C) <- "C, median (Q1-Q3)"
label(dat$age) <- "Age (years), median (Q1-Q3)"
label(dat$is_male) <- "Sex, male"
label(dat$smoking) <- "Smoking status"
table1(~ A + B + C + age + is_male + smoking | F, data=dat, render=rndr,
extra.col=list(`P-value`=pvalue), overall=FALSE)
(Note that you no longer need the stats() function from the previous version. Also, make sure you remove the "Overall" column; otherwise you need to modify the pvalue() function.)
Hi,
I have a data set in which I would like to present continuous variables in different ways depending on whether they are normally or non-normally distributed. I would thus like to be able to use these two different render functions for continuous variables:
my.render.cont.median.quartlies <- function(x) {
  with(stats.apply.rounding(stats.default(x), digits=2),
    c("", "Median (Q1-Q3)"=sprintf("%s (%s - %s)", MEDIAN, Q1, Q3)))
}
my.render.cont.mean.sd <- function(x) {
  with(stats.apply.rounding(stats.default(x), digits=2),
    c("", "Mean (SD)"=sprintf("%s (± %s)", MEAN, SD)))
}
However, I don't understand how to apply these in the table1() function. Do you know of any solution?
Best regards Eric