ddsjoberg / gtsummary

Presentation-Ready Data Summary and Analytic Result Tables
http://www.danieldsjoberg.com/gtsummary

Formatting issue with decimal places for certain categories using gtsummary::tbl_custom_summary #1550

Closed myamortor closed 1 year ago

myamortor commented 1 year ago

Hello everyone,

I'm encountering an issue using the tbl_custom_summary function from the gtsummary package. I'm working with categorical variables and my aim is to display the incidence of y in percentage for each level of each variable. I use the stat_fns argument with the proportion_summary function to obtain these proportions. My goal in terms of formatting is to display only a single digit after the decimal point. For this, I'm using the digits argument with the style_percent function.

However, a strange behavior occurs: although I try to force the display to a single digit after the decimal, one category from the x3 variable and two categories from the x7 variable display two digits after the decimal. All my variables, including the output y, are categorical.

Here's my code for illustration:

Frequency <- my_data %>%
  tbl_custom_summary(
    include = c("x1", "x2", "x3", "x4", "x5", "x6", "x7"),
    stat_fns = ~ proportion_summary("y", "1"),
    statistic = ~"{prop} ({n}/{N})",
    digits = ~ list(
      function(x) {
        style_percent(x, symbol = TRUE, digits = 1)
      },
      0, 0
    ),
    overall_row = TRUE,
    overall_row_last = TRUE,
    label = list(
      x1 ~ "Label 1", x2 ~ "Label 2", x3 ~ "Label 3", x4 ~ "Label 4",
      x5 ~ "Label 5", x6 ~ "Label 6", x7 ~ "Label 7"
    )
  ) %>%
  bold_labels() %>%
  modify_footnote(
    update = all_stat_cols() ~ "Output label"
  )

Has anyone encountered this behavior before, or can anyone help me understand what's happening?

Thank you in advance for your help and feedback.

larmarange commented 1 year ago

Hi, could you provide a minimal reproducible example? cf. https://reprex.tidyverse.org/

myamortor commented 1 year ago
rm(list = ls())
cat("\f")


library(readxl) # excel import
library(dplyr) # data manipulation
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(ggplot2) # data visualization
library(gtsummary)
#> #BlackLivesMatter
library(webshot2)

root <- "C:/Users/ramzi/Desktop/IVI/Projets IVI/ECHO-N/HEV risk factor/Analysis1"
org <- file.path(root, "Data")
prg <- file.path(root, "Programs")
out <- file.path(root, "Results")

# Import datasets
dataset <- read_excel(file.path(prg, "dataset.xlsx"))

# Convert it to a factor
dataset$Var8<- as.factor(dataset$Var8)
dataset$Var1 <- as.factor(dataset$Var1)
dataset$Var2 <- as.factor(dataset$Var2)
dataset$Var3<-as.factor(dataset$Var3)
dataset$Var4<- as.factor(dataset$Var4)
dataset$Var5<-as.factor(dataset$Var5)
dataset$Var6<- as.factor(dataset$Var6)
dataset$Var7<- as.factor(dataset$Var7)

dataset %>%
  tbl_custom_summary(
    include = c("Var1", "Var2", "Var3",  
                "Var4", "Var5", "Var6", "Var7"),
    # Proportion of observations with Var8 == "1" within each level
    stat_fns = ~ proportion_summary("Var8", "1"),
    statistic = ~"{prop} ({n}/{N})",
    digits = ~ list(
      function(x) {
        style_percent(x, symbol = TRUE, digits = 1)}
      ,0,0),
    overall_row = TRUE,
    overall_row_last = TRUE
  ) %>%
  bold_labels() %>%
  modify_footnote(
    update = all_stat_cols() ~ "Prop % (n/N)")
Characteristic N = 3,703¹
Var1
    Level1.1 6.04% (11/182)
    Level1.2 3.34% (39/1,169)
    Level1.3 21.3% (248/1,164)
    Level1.4 36.7% (304/829)
    Level1.5 47.4% (170/359)
Var2
    Level2.1 18.6% (374/2,010)
    Level2.2 23.5% (398/1,693)
Var3
    Level3.1 20.6% (512/2,484)
    Level3.2 21.3% (260/1,219)
Var4
    Level4.1 14.2% (281/1,974)
    Level4.2 24.6% (52/211)
    Level4.3 28.9% (439/1,518)
Var5
    Level5.1 20.7% (664/3,203)
    Level5.2 21.6% (108/500)
Var6
    Level6.1 36.2% (77/213)
    Level6.2 19.9% (695/3,490)
Var7
    Level7.1 26.9% (59/219)
    Level7.2 13.6% (100/738)
    Level7.3 9.21% (62/673)
    Level7.4 37.4% (280/748)
    Level7.5 15.8% (52/330)
    Level7.6 20.1% (138/685)
    Level7.7 26.1% (81/310)
Overall 20.8% (772/3,703)
¹ Prop % (n/N)

Created on 2023-09-05 with reprex v2.0.2

myamortor commented 1 year ago

You can see the issue for Var1 (Level1.1 and Level1.2) and for Var7 (Level7.3). Thank you.

larmarange commented 1 year ago

Just a quick question: do you have the same issue if you use scales::label_percent(accuracy = 0.1) as the formatter instead of function(x) {style_percent(x, symbol = TRUE, digits = 1)}?

cf. https://scales.r-lib.org/reference/label_percent.html
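
For context on why the two formatters can differ: as far as the gtsummary documentation describes it, the digits argument of style_percent() applies to percentages of 10% or more, and smaller percentages are rounded to digits + 1 places, whereas scales::label_percent(accuracy = 0.1) rounds every value to the same accuracy. Below is a minimal sketch comparing the two on four proportions taken from the table above. The names props, pct_gtsummary and pct_scales are illustrative, the percent sign is pasted on by hand rather than via symbol = TRUE, and the exact style_percent() output may depend on the installed gtsummary version.

```r
# Proportions from the reprex output: Level1.1, Level1.2, Level7.3, Level1.3
props <- c(11 / 182, 39 / 1169, 62 / 673, 248 / 1164)

# gtsummary formatter: values under 10% may receive one extra decimal place
pct_gtsummary <- paste0(gtsummary::style_percent(props, digits = 1), "%")
print(pct_gtsummary)

# scales formatter: every value is rounded to the nearest 0.1%
pct_scales <- scales::label_percent(accuracy = 0.1)(props)
print(pct_scales)
```

In the reprex above, the first three of these values are the ones displayed with two decimals (6.04%, 3.34%, 9.21%), which would be consistent with the under-10% rule.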

myamortor commented 1 year ago

No, it works perfectly! Thank you!

rm(list = ls())
cat("\f")


library(readxl) # excel import
library(dplyr) # data manipulation
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(ggplot2) # data visualization
library(gtsummary)
library(webshot2)
library(scales)

# Import datasets
dataset <- read_excel("C:/CodeR/dataset.xlsx")

# Convert it to a factor
dataset$Var8<- as.factor(dataset$Var8)
dataset$Var1 <- as.factor(dataset$Var1)
dataset$Var2 <- as.factor(dataset$Var2)
dataset$Var3<-as.factor(dataset$Var3)
dataset$Var4<- as.factor(dataset$Var4)
dataset$Var5<-as.factor(dataset$Var5)
dataset$Var6<- as.factor(dataset$Var6)
dataset$Var7<- as.factor(dataset$Var7)

dataset %>%
  tbl_custom_summary(
    include = c("Var1", "Var2", "Var3",  
                "Var4", "Var5", "Var6", "Var7"),
    # Proportion of observations with Var8 == "1" within each level
    stat_fns = ~ proportion_summary("Var8", "1"),
    statistic = ~"{prop} ({n}/{N})",
    digits = ~ list(
      function(x) {
        scales::label_percent(accuracy = 0.1, suffix = "")(x)}
      ,0,0)
    ,
    overall_row = TRUE,
    overall_row_last = TRUE
  ) %>%
  bold_labels() %>%
  modify_footnote(
    update = all_stat_cols() ~ "Prop % (n/N)")
Characteristic N = 3,703¹
Var1
    Level1.1 6.0 (11/182)
    Level1.2 3.3 (39/1,169)
    Level1.3 21.3 (248/1,164)
    Level1.4 36.7 (304/829)
    Level1.5 47.4 (170/359)
Var2
    Level2.1 18.6 (374/2,010)
    Level2.2 23.5 (398/1,693)
Var3
    Level3.1 20.6 (512/2,484)
    Level3.2 21.3 (260/1,219)
Var4
    Level4.1 14.2 (281/1,974)
    Level4.2 24.6 (52/211)
    Level4.3 28.9 (439/1,518)
Var5
    Level5.1 20.7 (664/3,203)
    Level5.2 21.6 (108/500)
Var6
    Level6.1 36.2 (77/213)
    Level6.2 19.9 (695/3,490)
Var7
    Level7.1 26.9 (59/219)
    Level7.2 13.6 (100/738)
    Level7.3 9.2 (62/673)
    Level7.4 37.4 (280/748)
    Level7.5 15.8 (52/330)
    Level7.6 20.1 (138/685)
    Level7.7 26.1 (81/310)
Overall 20.8 (772/3,703)
¹ Prop % (n/N)

Created on 2023-09-05 with reprex v2.0.2

ddsjoberg commented 1 year ago

seems like this is solved. please reopen if not fully addressed!
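
For readers who would rather not add a dependency on scales, a fixed number of decimals can also be obtained with base R alone. This is only a sketch: fmt_pct_fixed is a hypothetical helper, not a function from gtsummary or scales.

```r
# Hypothetical helper (not part of gtsummary or scales): format proportions
# as percentages with a fixed number of decimal places.
fmt_pct_fixed <- function(x, digits = 1) {
  out <- paste0(formatC(100 * x, format = "f", digits = digits), "%")
  out[is.na(x)] <- NA_character_  # keep missing values missing
  out
}

# Level1.1, Level7.3 and Level1.3 from the reprex above
print(fmt_pct_fixed(c(11 / 182, 62 / 673, 248 / 1164)))
```

Since the digits argument in the reprex already receives a function as its first list element, such a helper could presumably be supplied the same way, e.g. digits = ~ list(fmt_pct_fixed, 0, 0).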