benjaminrich / table1


Error: Stratification variable(s) should not contain missing values. #97

Open Beduiz opened 1 year ago

Beduiz commented 1 year ago

Dear Benjamin,

Since I last used table1 in the autumn, code that used to work no longer runs and stops with the following error:

Error in table1.formula(~variable1 + variable2 + variable3 + : 
  Stratification variable(s) should not contain missing values.

The stratification variable I use has 3 levels (for example 1 car, 2 cars and 3 cars), and since I can't get p-values with more than 2 levels, I've created custom stratification variables (stratification_12 with only 1 car and 2 cars, stratification_13 with only 1 car and 3 cars, and stratification_23 with only 2 cars and 3 cars), in which the level not included has been replaced with NA. This used to work, but now doesn't.

Is there a workaround?

Kind regards

benjaminrich commented 1 year ago

I'm sorry for introducing this change that broke your code. It was done in response to feedback from another user (#80). But I obviously hadn't considered your use case and maybe I acted too hastily. In order to help you, I would need a concrete example that I can reproduce. Would it be possible to provide such an example using some simulated data?

Beduiz commented 1 year ago

Dear Benjamin,

Thank you for your help, I'm grateful. Here is a reproducible example:

```r
library(table1)

s1_vars.NA <- c()
s1_vars.normal.test <- c()
s1_vars.normal.view <- c()

s1_rndr <- function(x, name, ...) {
  cont <- ifelse(name %in% s1_vars.normal.view, "Mean (SD)", "Median (Q1-Q3)")
  # Max three digits for median (Q1-Q3) and zero decimals for percentages
  y <- render.default(x, name, render.continuous=cont, ..., digits=1, digits.pct=1,
                      round.integers=TRUE, drop0trailing=FALSE, rounding.fn=round_pad)
  if (is.logical(x)) {
    y[2]
  } else if (is.factor(x)) {
    y  # Don't exclude any level
  } else {
    y
  }
}

s1_pvalue <- function(x, name, ...) {
  x <- x[names(x) != "overall"]  # Don't count the "overall" column
  y <- unlist(x)
  g <- ordered(rep(1:length(x), times=sapply(x, length)))
  if (name %in% s1_vars.NA) {
    p <- NA  # Variables that should not be tested
  } else if (is.numeric(y) && (name %in% s1_vars.normal.test)) {
    # Two-sample t-test for normal continuous
    p <- t.test(y ~ g, paired=FALSE, alternative="two.sided")$p.value
  } else if (is.numeric(y)) {
    # Mann-Whitney U-test for skewed continuous
    p <- wilcox.test(y ~ g, paired=FALSE, alternative="two.sided")$p.value
  } else {
    # Chi-square test for categorical
    p <- chisq.test(table(y, g), correct=FALSE)$p.value
  }
  c(sub("<", "&lt;", format.pval(p, digits=3, nsmall=3, eps=0.001)))  # Format p-value
}

s1_stats <- function(x, name, ...) {
  y <- unlist(x)
  if (is.numeric(y) && (name %in% s1_vars.normal.view)) {
    ", mean (SD)"
  } else if (is.numeric(y)) {
    ", median (Q1-Q3)"
  } else {
    ", n (%)"
  }
}

table1(~ Sepal.Length + Petal.Length | Species, data=iris, render=s1_rndr,
       topclass="Rtable1-zebra", render.missing=NULL,
       extra.col=list(` `=s1_stats, `P-value`=s1_pvalue), extra.col.pos=1, overall=FALSE)
```

Kind regards

benjaminrich commented 1 year ago

I'm looking at your example. I see that there are 3 strata (3 species), but I don't understand how you want the output to look. Are you trying to create 3 different p-values for each pairwise comparison?

Beduiz commented 1 year ago

Hi Benjamin.

My aim is to make a table with all 3 strata, a total column, and a column with all the p-values (strata 1 vs 2, 1 vs 3 and 2 vs 3).
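For illustration, the three pairwise p-values can be computed outside of table1 with base R alone. This is only a rough sketch: `pairwise_p` is a hypothetical helper (not part of table1), it uses the Mann-Whitney test for every pair, and it applies no adjustment for multiple comparisons. Rows where the stratification variable is missing are simply skipped.

```r
# Hypothetical helper (not part of table1): Mann-Whitney p-value for every
# pair of strata levels, computed for a single numeric variable.
pairwise_p <- function(data, variable, strata) {
  lv <- levels(factor(data[[strata]]))
  pairs <- combn(lv, 2, simplify = FALSE)   # all unordered pairs of levels
  p <- vapply(pairs, function(pr) {
    # which() drops rows where the stratum is NA
    x <- data[[variable]][which(data[[strata]] == pr[1])]
    y <- data[[variable]][which(data[[strata]] == pr[2])]
    wilcox.test(x, y, exact = FALSE)$p.value
  }, numeric(1))
  names(p) <- vapply(pairs, paste, character(1), collapse = " vs ")
  p
}

p_sepal <- pairwise_p(iris, "Sepal.Length", "Species")
print(format.pval(p_sepal, digits = 3, eps = 0.001))
```

The result is a named vector with one entry per pair, which could be pasted into a single table cell or merged into the table by hand.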

What I've done so far to achieve this is create 4 different tables that I later merge manually. In the first 3, I compare one stratum against another (i.e. strata 1 vs 2, 1 vs 3 and 2 vs 3) to get the p-values. In the 4th table, I only produce the descriptives and do not ask for p-values.

However, it would definitely be great if it was possible to get all of this in one table. If not, it would be great if I could again do as detailed above. I was able to do a workaround in which I create a new database for each pair of strata, i.e. one strata_1_vs_2 database for that table, one strata_1_vs_3 database for that table, and so forth. But it would be better if it worked as originally :-)
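A minimal sketch of that per-pair workaround, using iris as a stand-in for the real data: `pair_data` is a hypothetical helper that keeps only the rows belonging to two chosen levels and drops the unused level, so the stratification variable contains no NA. The `table1()` call is guarded so the snippet still runs when the package is not installed.

```r
# Hypothetical helper: keep only rows in the two chosen strata and drop
# the unused factor level, so no NA is needed in the stratification variable.
pair_data <- function(data, strata, keep) {
  rows <- which(data[[strata]] %in% keep)   # rows with NA strata are excluded
  out <- data[rows, , drop = FALSE]
  out[[strata]] <- droplevels(factor(out[[strata]]))
  out
}

d12 <- pair_data(iris, "Species", c("setosa", "versicolor"))
d13 <- pair_data(iris, "Species", c("setosa", "virginica"))
d23 <- pair_data(iris, "Species", c("versicolor", "virginica"))

if (requireNamespace("table1", quietly = TRUE)) {
  # One two-column table per pair; custom render and extra.col arguments
  # can be passed here exactly as in the full example.
  print(table1::table1(~ Sepal.Length + Petal.Length | Species,
                       data = d12, overall = FALSE))
}
```

Each subset has exactly two strata, so a two-group test in an `extra.col` function receives the input it expects.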

Kind regards

vonhyden commented 1 year ago

Dear Benjamin,

I'm having the same issue as Beduiz. Including the missing values in the descriptive analysis was actually useful for generating a fast overview of the data.

It would be great if you could do something about it. :) Thank you!

best regards Nicolas

ILHaeu commented 5 months ago

Hi, also having this problem - is there a suggested alternative or workaround when your stratification variable has some missing values?

Streep commented 5 months ago

Hi all, I have the same problem. I was using renv to keep my packages the same, but now I decided to do an update and this issue broke all my tables.

Most other crosstable packages can deal with NAs. Maybe make it an option to set missing as a separate category?
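Until such an option exists, one possible way to approximate it on the user side is to recode NA as an explicit factor level before building the table. The sketch below uses only base R; the column name `Species_f` and the label "Missing" are illustrative choices, and it is an assumption (not confirmed in this thread) that a stratification variable recoded this way passes table1's check.

```r
# Simulate a stratification variable with a few missing values.
d <- iris
d$Species[c(1, 51, 101)] <- NA

# addNA() turns NA into a real factor level; relabel it so it prints nicely.
d$Species_f <- addNA(d$Species)
levels(d$Species_f)[is.na(levels(d$Species_f))] <- "Missing"

print(table(d$Species_f))

if (requireNamespace("table1", quietly = TRUE)) {
  # "Missing" now appears as its own stratum column.
  print(table1::table1(~ Sepal.Length + Petal.Length | Species_f, data = d))
}
```

Note that any p-value function then sees "Missing" as an additional group, so it may need to exclude that stratum explicitly.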