Merck / r2rtf

Easily Create Production-Ready Rich Text Format (RTF) Table and Figure
https://merck.github.io/r2rtf
GNU General Public License v3.0

Cannot handle Chinese characters #219

Closed AlexZHENGH closed 2 months ago

AlexZHENGH commented 2 months ago

Describe the bug

When a table is exported to RTF, Chinese characters are not encoded correctly.

Expected behavior

Chinese characters are encoded correctly in the RTF output.

To reproduce

library(r2rtf)
library(dplyr)
library(tidyr)

data(r2rtf_adae)
ae_t1 <- r2rtf_adae %>%
  group_by(TRTA) %>%
  mutate(n_subj = n_distinct(USUBJID)) %>%
  group_by(TRTA, AEDECOD) %>%
  summarise(
    n_ae = n_distinct(USUBJID),
    pct = round(n_ae / unique(n_subj) * 100, 2)
  ) %>%
  # Keep only AE terms reported by more than 5 subjects in a treatment group.
  dplyr::filter(n_ae > 5) %>%
  pivot_longer(cols = c(n_ae, pct), names_to = "var", values_to = "value") %>%
  unite(temp, TRTA, var) %>%
  pivot_wider(names_from = temp, values_from = value, values_fill = 0)

ae_tbl <- ae_t1 %>%
  rtf_title(
    "Analysis of Subjects With Specific Adverse Events",
    c(
      "(Incidence > 5 Subjects in One or More Treatment Groups)",
      "ASaT"
    )
  ) %>%
  rtf_colheader(" | 安慰剂 | Drug High Dose | Drug Low Dose",
                col_rel_width = c(4, rep(2, 3))
  ) %>%
  rtf_colheader(" | n | (%) | n | (%) | n | (%)",
                col_rel_width = c(4, rep(1, 6)),
                border_top = c("", rep("single", 6)),
                border_left = c("single", rep(c("single", ""), 3))
  ) %>%
  rtf_body(
    col_rel_width = c(4, rep(1, 6)),
    text_justification = c("l", rep("c", 6)),
    border_left = c("single", rep(c("single", ""), 3))
  ) %>%
  rtf_footnote(c("{^\\dagger}This is footnote 1", "This is footnote 2")) %>%
  rtf_source("Source: xxx")

# Output .rtf file
ae_tbl %>%
  rtf_encode() %>%
  write_rtf("ae_example.rtf")

Session info

```R
> sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS 14.4.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

Random number generation:
 RNG:     Mersenne-Twister
 Normal:  Inversion
 Sample:  Rounding

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods
[7] base

other attached packages:
[1] tidyr_1.2.1  dplyr_1.0.10 r2rtf_1.1.1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.9           lattice_0.20-45      prettyunits_1.1.1
 [4] ps_1.7.2             digest_0.6.31        utf8_1.2.2
 [7] rxode2random_2.0.11  R6_2.5.1             backports_1.4.1
[10] sys_3.4.1            stats4_4.2.2         evaluate_0.19
[13] ggplot2_3.4.0        pillar_1.8.1         rlang_1.0.6
[16] rstudioapi_0.14      data.table_1.14.8    callr_3.7.3
[19] Matrix_1.5-1         checkmate_2.1.0      rmarkdown_2.19
[22] qs_0.25.4            dparser_1.3.1-10     PreciseSums_0.5
[25] loo_2.5.1            munsell_0.5.0        symengine_0.1.6
[28] compiler_4.2.2       xfun_0.35            rstan_2.21.7
[31] pkgconfig_2.0.3      pkgbuild_1.4.0       htmltools_0.5.4
[34] tidyselect_1.2.0     mrgsolve_1.0.6       tibble_3.1.8
[37] gridExtra_2.3        rxode2parse_2.0.15   codetools_0.2-18
[40] matrixStats_0.63.0   fansi_1.0.3          nlmixr2data_2.0.7
[43] withr_2.5.0          crayon_1.5.2         grid_4.2.2
[46] nlme_3.1-160         gtable_0.3.1         lifecycle_1.0.3
[49] magrittr_2.0.3       units_0.8-1          StanHeaders_2.21.0-7
[52] scales_1.2.1         RcppParallel_5.1.5   stringi_1.7.8
[55] cli_3.5.0            cachem_1.0.6         n1qn1_6.0.1-11
[58] rxode2_2.0.12        ellipsis_0.3.2       generics_0.1.3
[61] vctrs_0.5.1          stringfish_0.15.7    lotri_0.4.3
[64] RApiSerialize_0.1.2  tools_4.2.2          glue_1.6.2
[67] purrr_0.3.5          pkgload_1.3.2        yaml_2.3.6
[70] processx_3.8.0       parallel_4.2.2       rxode2et_2.0.10
[73] fastmap_1.1.0        inline_0.3.19        colorspace_2.0-3
[76] lbfgsb3c_2020-3.2    nlmixr2est_2.1.4     memoise_2.0.1
[79] knitr_1.41
```

nanxstats commented 2 months ago

Please see https://github.com/Merck/r2rtf/issues/210

AlexZHENGH commented 2 months ago

> Please see #210

@nanxstats Thanks for the helpful suggestion! I wonder how to deal with the Chinese characters in the original dataset. For example:

data(r2rtf_adae)
r2rtf_adae['AEDECOD'] = '中文'

nanxstats commented 2 months ago

I guess they can be encoded with r2rtf::utf8Tortf() first, too?

data(r2rtf_adae)
r2rtf_adae[r2rtf_adae$AEDECOD == "HEADACHE", "AEDECOD"] <- utf8Tortf("头痛")
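
When Chinese text can appear in many cells, the same idea can be applied to whole columns rather than one value at a time. The following is only a rough sketch under a few assumptions: `encode_rtf()` is a hypothetical helper (not part of r2rtf), `AEDECOD` is assumed to be a character column, and `utf8Tortf()` is called on one string at a time in case it does not accept vectors. Missing values are skipped so they stay `NA`.

```r
library(r2rtf)

# Hypothetical helper, not part of r2rtf: encode each non-missing string
# individually with utf8Tortf(), leaving NA values untouched.
encode_rtf <- function(x) {
  out <- x
  ok <- !is.na(x)
  out[ok] <- vapply(x[ok], utf8Tortf, character(1), USE.NAMES = FALSE)
  out
}

data(r2rtf_adae)
adae <- r2rtf_adae

# Simulate a dataset that already contains Chinese text
# (assumes AEDECOD is stored as character, not factor).
adae$AEDECOD[adae$AEDECOD == "HEADACHE"] <- "头痛"

# Encode every character column before passing the data to rtf_body().
is_chr <- vapply(adae, is.character, logical(1))
adae[is_chr] <- lapply(adae[is_chr], encode_rtf)

# The same call can wrap a single piece of column header text.
header <- paste0(" | ", utf8Tortf("安慰剂"), " | Drug High Dose | Drug Low Dose")
```

Here `adae` would replace `r2rtf_adae` at the start of the summary pipeline, and `header` would be passed to `rtf_colheader()`. Encoding should happen only once, just before the table is built, since filters on the original Chinese text no longer match after encoding.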

AlexZHENGH commented 2 months ago

Many thanks @nanxstats!