ddsjoberg / gtsummary

Presentation-Ready Data Summary and Analytic Result Tables
http://www.danieldsjoberg.com/gtsummary

tbl_summary(digits=) update #458

Closed ddsjoberg closed 4 years ago

ddsjoberg commented 4 years ago

I think there is a better way for this argument to work.

At the moment, the argument only controls continuous variables. If the continuous variable is summarized with three statistics (e.g. median, p25, and p75), users pass an integer to indicate the number of decimal places to round. They can also pass three different digits, to round the variables to three different levels, e.g. digits = age ~ c(0, 1, 2).
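For reference, a minimal sketch of the current continuous-variable usage described above. It assumes the `trial` example data set that ships with gtsummary and its `age` column; the `include` and `statistic` values are chosen only for illustration.

```r
library(gtsummary)

# Summarize age with three statistics and round each one differently:
# median to 0 decimals, p25 to 1 decimal, p75 to 2 decimals.
tbl <- tbl_summary(
  trial,
  include = age,
  statistic = age ~ "{median} ({p25}, {p75})",
  digits = age ~ c(0, 1, 2)
)

tbl
```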

I want to expand it to accept values for categorical variables as well, but I am not sure of the best way.

One possible method is to allow users to pass a function in the digits argument. For example, if the statistic presented is just the percentage (statistic = all_categorical() ~ "{p}"), and they wanted all percentages formatted with the style_sigfig() function, they could pass digits = all_categorical() ~ style_sigfig.

Users would still be able to pass individual digits for rounding, and we could treat the digits as shortcuts for the sprintf() function, e.g. function(x) sprintf(x, fmt='%#.1f') for rounding a number to one decimal place.
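A rough sketch of what that shortcut could look like in base R. The helper name `digits_to_fun` is hypothetical and not part of gtsummary; it only illustrates turning an integer number of decimal places into a formatting function.

```r
# Hypothetical helper (not a gtsummary function): convert a number of
# decimal places into a function that formats with sprintf().
digits_to_fun <- function(digits) {
  fmt <- paste0("%#.", digits, "f")  # e.g. digits = 1 gives "%#.1f"
  function(x) sprintf(x, fmt = fmt)
}

round1 <- digits_to_fun(1)
round1(c(3.14159, 2))
# returns "3.1" "2.0"
```

Under this idea, a user-supplied function such as style_sigfig would be used as-is, and an integer would simply be converted to a function first.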

An annoyance could occur for categorical variables when the user wants to change just the rounding for the percentage and not the other statistics: under the current system, they'd need to specify every statistic. One way around this is to accept a named list, e.g. digits = all_categorical() ~ list(p ~ style_sigfig); this would only change the percent column. This would also allow users to change N and n formatting.

larmarange commented 4 years ago

It would be great to have an easy way to control the number of digits when displaying percentage.

ddsjoberg commented 4 years ago

Hi @larmarange! Thanks for bringing this issue to our attention. We haven't yet had the time to agree on the best API for allowing this within the tbl_summary() function. But here is a solution you can use now if you'd like:

In the dev version of gtsummary, we've implemented themes (http://www.danieldsjoberg.com/gtsummary/dev/articles/themes.html#writing-themes-1). There is a theme element that controls the way the percentages are formatted in tbl_summary() called "tbl_summary-fn:percent_fun".

```r
# theme to round percentages to one decimal place
set_gtsummary_theme(list(
  `tbl_summary-fn:percent_fun` = function(x) sprintf("%.1f", x * 100)
))
tbl_summary(trial)
```

larmarange commented 4 years ago

Thanks for the feedback

larmarange commented 4 years ago

Just a quick comment, but the % symbol should not be added after applying tbl_summary-fn:percent_fun but should be within the function.

Regards

ddsjoberg commented 4 years ago

Thanks @larmarange !

The default statistic for categorical variables is "{n} ({p}%)", where the percent symbol is added in the string (i.e. the rounding of the percent and adding the percent symbol are separate).

If the function that rounds the percentage also appends the percent symbol, you'll want to remove the symbol from the statistic string. The code below should get you what you're looking for.

```r
set_gtsummary_theme(list(
  "tbl_summary-fn:percent_fun" = scales::label_percent(accuracy = 0.1),
  "tbl_summary-str:categorical_stat" = "{n} ({p})"
))

tbl_summary(trial)
```

larmarange commented 4 years ago

Thanks for the clarification. :-)

Hope that all these tricks will be added to the theme vignette.

Thanks again for being so reactive.

ddsjoberg commented 4 years ago

Settled on the following:

The syntax is the same for categorical variables as for continuous variables. When the statistics presented are "{n} ({p}%)" and the user wants percentages displayed to two decimal places, they'd pass digits = list(all_categorical() ~ c(0, 2)).
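A minimal sketch of that call, assuming the `trial` example data set from gtsummary; the `grade` and `stage` columns are picked only as illustrative categorical variables.

```r
library(gtsummary)

# One digits value per statistic in "{n} ({p}%)":
# n is shown with 0 decimals, p with 2 decimals.
tbl <- tbl_summary(
  trial,
  include = c(grade, stage),
  statistic = all_categorical() ~ "{n} ({p}%)",
  digits = list(all_categorical() ~ c(0, 2))
)

tbl
```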

If users want more sophisticated functions to format, they can utilize the themes to change the default functions.

pskselva commented 1 year ago

Can we create a summary table without the percentage symbol in the result?