Formatting issue with decimal places for certain categories using gtsummary::tbl_custom_summary

myamortor commented 1 year ago

Hello everyone,

I'm encountering an issue with the tbl_custom_summary function from the gtsummary package. I'm working with categorical variables, and my aim is to display the incidence of y as a percentage for each level of each variable. I use the stat_fns argument with the proportion_summary function to obtain these proportions. For formatting, I want to display a single digit after the decimal point, so I'm passing a function that calls style_percent to the digits argument.

However, a strange behavior occurs: although I try to force the display to a single digit after the decimal, one category from the x3 variable and two categories from the x7 variable display two digits after the decimal. All my variables, including the output y, are categorical.

Here's my code for illustration:

Frequency <- my_data %>%
  tbl_custom_summary(
    include = c("x1", "x2", "x3", "x4", "x5", "x6", "x7"),
    stat_fns = ~ proportion_summary("y", "1"),
    statistic = ~"{prop} ({n}/{N})",
    digits = ~ list(
      function(x) {
        style_percent(x, symbol = TRUE, digits = 1)
      },
      0, 0
    ),
    overall_row = TRUE,
    overall_row_last = TRUE,
    label = list(x1 ~ "Label 1", x2 ~ "Label 2", x3 ~ "Label 3",
                 x4 ~ "Label 4", x5 ~ "Label 5", x6 ~ "Label 6",
                 x7 ~ "Label 7")
  ) %>%
  bold_labels() %>%
  modify_footnote(
    update = all_stat_cols() ~ "Output label"
  )

Has anyone encountered this behavior before, or can anyone help me understand what's happening?

Thank you in advance for your help and feedback.

larmarange commented 1 year ago

Hi, could you provide a minimal reproducible example? cf. https://reprex.tidyverse.org/

myamortor commented 1 year ago

rm(list = ls())
cat("\f")


library(readxl) # excel import
library(dplyr) # data manipulation
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(ggplot2) # data visualization
library(gtsummary)
#> #BlackLivesMatter
library(webshot2)

root <- "C:/Users/ramzi/Desktop/IVI/Projets IVI/ECHO-N/HEV risk factor/Analysis1"
org <- file.path(root, "Data")
prg <- file.path(root, "Programs")
out <- file.path(root, "Results")

# Import datasets
dataset <- read_excel(file.path(prg, "dataset.xlsx"))

# Convert all variables to factors
dataset$Var8 <- as.factor(dataset$Var8)
dataset$Var1 <- as.factor(dataset$Var1)
dataset$Var2 <- as.factor(dataset$Var2)
dataset$Var3 <- as.factor(dataset$Var3)
dataset$Var4 <- as.factor(dataset$Var4)
dataset$Var5 <- as.factor(dataset$Var5)
dataset$Var6 <- as.factor(dataset$Var6)
dataset$Var7 <- as.factor(dataset$Var7)

dataset %>%
  tbl_custom_summary(
    include = c("Var1", "Var2", "Var3",  
                "Var4", "Var5", "Var6", "Var7"),
    # Proportion of observations with Var8 == "1" within each level
    stat_fns = ~ proportion_summary("Var8", "1"),
    statistic = ~"{prop} ({n}/{N})",
    # One formatter per statistic: {prop}, {n}, {N}
    digits = ~ list(
      function(x) {
        style_percent(x, symbol = TRUE, digits = 1)
      },
      0, 0
    ),
    overall_row = TRUE,
    overall_row_last = TRUE
  ) %>%
  bold_labels() %>%
  modify_footnote(
    update = all_stat_cols() ~ "Prop % (n/N)")

Characteristic	N = 3,703¹
Var1
Level1.1	6.04% (11/182)
Level1.2	3.34% (39/1,169)
Level1.3	21.3% (248/1,164)
Level1.4	36.7% (304/829)
Level1.5	47.4% (170/359)
Var2
Level2.1	18.6% (374/2,010)
Level2.2	23.5% (398/1,693)
Var3
Level3.1	20.6% (512/2,484)
Level3.2	21.3% (260/1,219)
Var4
Level4.1	14.2% (281/1,974)
Level4.2	24.6% (52/211)
Level4.3	28.9% (439/1,518)
Var5
Level5.1	20.7% (664/3,203)
Level5.2	21.6% (108/500)
Var6
Level6.1	36.2% (77/213)
Level6.2	19.9% (695/3,490)
Var7
Level7.1	26.9% (59/219)
Level7.2	13.6% (100/738)
Level7.3	9.21% (62/673)
Level7.4	37.4% (280/748)
Level7.5	15.8% (52/330)
Level7.6	20.1% (138/685)
Level7.7	26.1% (81/310)
Overall	20.8% (772/3,703)
¹ Prop % (n/N)

Created on 2023-09-05 with reprex v2.0.2

myamortor commented 1 year ago

You can see the issue for Var1 (Level1.1 and Level1.2) and for Var7 (Level7.3). Thank you.

larmarange commented 1 year ago

Just a quick question: do you have the same issue if you use scales::label_percent(accuracy = 0.1) as the formatter instead of function(x) {style_percent(x, symbol = TRUE, digits = 1)}?

cf. https://scales.r-lib.org/reference/label_percent.html
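
For context, a minimal sketch of why the two formatters can differ. It assumes the behavior described in the gtsummary documentation for style_percent, where digits applies to percentages of 10% or more and smaller percentages are rounded to digits + 1 places; this may vary between gtsummary versions. The proportions are taken from the table above, and the object names (p, with_style, with_scales) are only illustrative.

```r
library(gtsummary)
library(scales)

# Three proportions from the table above: Level1.1, Level1.2, Level1.3
p <- c(11 / 182, 39 / 1169, 248 / 1164)

# style_percent(): assumed (per its documentation) to add one extra decimal
# for percentages below 10%, so the first two values may show two decimals
with_style <- style_percent(p, symbol = TRUE, digits = 1)

# label_percent(): accuracy = 0.1 rounds every value to one decimal place
with_scales <- label_percent(accuracy = 0.1)(p)

print(with_style)
print(with_scales)
```

If that assumption holds, it would match the table above, where only the categories below 10% (Level1.1, Level1.2 and Level7.3) show two decimals.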

myamortor commented 1 year ago

No, it works perfectly! Thank you!

rm(list = ls())
cat("\f")


library(readxl) # excel import
library(dplyr) # data manipulation
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(ggplot2) # data visualization
library(gtsummary)
library(webshot2)
library(scales)

# Import datasets
dataset <- read_excel("C:/CodeR/dataset.xlsx")

# Convert all variables to factors
dataset$Var8 <- as.factor(dataset$Var8)
dataset$Var1 <- as.factor(dataset$Var1)
dataset$Var2 <- as.factor(dataset$Var2)
dataset$Var3 <- as.factor(dataset$Var3)
dataset$Var4 <- as.factor(dataset$Var4)
dataset$Var5 <- as.factor(dataset$Var5)
dataset$Var6 <- as.factor(dataset$Var6)
dataset$Var7 <- as.factor(dataset$Var7)

dataset %>%
  tbl_custom_summary(
    include = c("Var1", "Var2", "Var3",  
                "Var4", "Var5", "Var6", "Var7"),
    # Proportion of observations with Var8 == "1" within each level
    stat_fns = ~ proportion_summary("Var8", "1"),
    statistic = ~"{prop} ({n}/{N})",
    # One formatter per statistic: {prop}, {n}, {N}
    digits = ~ list(
      function(x) {
        scales::label_percent(accuracy = 0.1, suffix = "")(x)
      },
      0, 0
    ),
    overall_row = TRUE,
    overall_row_last = TRUE
  ) %>%
  bold_labels() %>%
  modify_footnote(
    update = all_stat_cols() ~ "Prop % (n/N)")

Characteristic	N = 3,703¹
Var1
Level1.1	6.0 (11/182)
Level1.2	3.3 (39/1,169)
Level1.3	21.3 (248/1,164)
Level1.4	36.7 (304/829)
Level1.5	47.4 (170/359)
Var2
Level2.1	18.6 (374/2,010)
Level2.2	23.5 (398/1,693)
Var3
Level3.1	20.6 (512/2,484)
Level3.2	21.3 (260/1,219)
Var4
Level4.1	14.2 (281/1,974)
Level4.2	24.6 (52/211)
Level4.3	28.9 (439/1,518)
Var5
Level5.1	20.7 (664/3,203)
Level5.2	21.6 (108/500)
Var6
Level6.1	36.2 (77/213)
Level6.2	19.9 (695/3,490)
Var7
Level7.1	26.9 (59/219)
Level7.2	13.6 (100/738)
Level7.3	9.2 (62/673)
Level7.4	37.4 (280/748)
Level7.5	15.8 (52/330)
Level7.6	20.1 (138/685)
Level7.7	26.1 (81/310)
Overall	20.8 (772/3,703)
¹ Prop % (n/N)

Created on 2023-09-05 with reprex v2.0.2
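
A side note on the workaround: since scales::label_percent() already returns a formatting function, the function(x) {...} wrapper is probably not needed and the formatter can be passed to digits directly. A minimal sketch of this, using the built-in Titanic data (an assumption for illustration, since the dataset from this thread is not available) and a hypothetical object name tbl:

```r
library(gtsummary)
library(scales)

# Built-in Titanic counts stand in for the thread's dataset (illustration only)
tbl <- as.data.frame(Titanic) %>%
  tbl_custom_summary(
    include = c("Age", "Class"),
    # Proportion of survivors within each level, weighted by the Freq column
    stat_fns = ~ proportion_summary("Survived", "Yes", weights = "Freq"),
    statistic = ~"{prop} ({n}/{N})",
    # label_percent() returns a function, so it is passed as-is for {prop};
    # {n} and {N} are shown with 0 decimals
    digits = ~ list(label_percent(accuracy = 0.1), 0, 0),
    overall_row = TRUE,
    overall_row_last = TRUE
  ) %>%
  bold_labels()

tbl
```

Here the percent sign is kept in the cells; adding suffix = "" to label_percent() removes it, as in the code above.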

ddsjoberg commented 1 year ago

Seems like this is solved. Please reopen if not fully addressed!

ddsjoberg / gtsummary

Formatting issue with decimal places for certain categories using gtsummary::tbl_custom_summary #1550