Merck / r2rtf

Easily Create Production-Ready Rich Text Format (RTF) Table and Figure
https://merck.github.io/r2rtf
GNU General Public License v3.0

Garbled code about Chinese character #210

Closed: cglaze11 closed this issue 8 months ago

cglaze11 commented 8 months ago

Thanks for reading; I look forward to your reply.

Problem description

Encoding

When I write an .rtf file with the r2rtf package in RStudio (text encoding set to UTF-8) using this code:

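library(r2rtf)
library(magrittr)  # provides %>%

# `final` is the PK summary data frame (construction not shown)
final %>%
  rtf_page(orientation = "landscape") %>%
  # Title: "Table 3.1 Summary of PK Parameters, Single Ascending Dose Study (PKS)"
  rtf_title("表3.1  单剂量递增研究PK参数汇总(PKS)") %>%
  rtf_colheader(
    # 剂量组 = dose group, 统计_量 = statistic
    colheader = paste0(
      "剂量组|统计_量|  Cmax\n (ng/mL)|tmax\n (h)   |t1/2\n (h)|    CL/F\n (L/h)|VZ/F\n (L)|AUC0-24h\n (ng/mL·h)",
      "|AUC0-last\n (ng/mL·h)|AUC0-\\infty\n (ng/mL·h)|Kel\n (1/h)|%AUCexp\n (%)|CLr\n (L/h)"
    )
  ) %>%
  rtf_body() %>%
  rtf_encode() %>%
  write_rtf("Outputs/TFLs/T_3_1-2.rtf")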

the Chinese characters are not rendered correctly, as in this screenshot: [screenshot: wrong]

LaTeX code

Another problem is the LaTeX-style text conversion. Because the underscore _ is replaced with the RTF subscript command \sub, the following happened:

统计_量 (the "statistic" column header) was shown as 统计.

My solution

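Eventually I found that the problem may be the RTF font charset code \fcharsetN. r2rtf writes \fcharset161 (the Greek charset) into the font table; when I replaced it with \fcharset134 (GB2312, Simplified Chinese), everything looked normal: [screenshot: right]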
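The manual workaround above could be automated as a post-processing step. The sketch below is a hypothetical helper (replace_fcharset is not part of r2rtf) that rewrites the charset number in an RTF font table; the charset codes come from the RTF specification. It assumes the text in the file is already stored in the encoding the target charset describes (for example GBK bytes written from a code page 936 session), which is likely why switching to 134 worked here.

```r
# Hypothetical helper, not part of r2rtf: rewrite \fcharsetN in RTF text,
# automating the manual \fcharset161 -> \fcharset134 edit.
# Charset numbers follow the RTF specification.
rtf_charsets <- c(
  ansi = 0, default = 1, symbol = 2, shiftjis = 128, hangul = 129,
  gb2312 = 134, big5 = 136, greek = 161, turkish = 162, russian = 204
)

replace_fcharset <- function(rtf,
                             from = rtf_charsets[["greek"]],
                             to = rtf_charsets[["gb2312"]]) {
  # (?![0-9]) prevents matching a longer number such as \fcharset1610
  pattern <- paste0("\\\\fcharset", from, "(?![0-9])")
  # "\\\\" in the replacement yields one literal backslash
  gsub(pattern, paste0("\\\\fcharset", to), rtf, perl = TRUE)
}
```

Applied to a file written by write_rtf(), something like x <- readLines(path); writeLines(replace_fcharset(x), path) should work, though it is worth checking that reading and writing leave the non-ASCII bytes untouched in your locale.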

Suggestions

  1. Could the function as_rtf_font be improved? It currently hardcodes \fcharset161:

    as_rtf_font <- function() {
      # font_type() is r2rtf's internal table of the 10 supported fonts
      font_type <- font_type()
      font_rtf <- factor(c(1:10), levels = font_type$type, labels = font_type$rtf_code)
      font_style <- factor(c(1:10), levels = font_type$type, labels = font_type$style)
      font_name <- factor(c(1:10), levels = font_type$type, labels = font_type$name)

      # Every font entry gets the same hardcoded charset: 161 (Greek)
      font_table <- paste0(
        "{\\fonttbl",
        paste(paste0("{", font_rtf, font_style, "\\fcharset161\\fprq2 ", font_name, ";}\n"), collapse = ""),
        "}\n"
      )

      font_table
    }
  2. How can the convert function handle the subscript problem?
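For suggestion 1, one option is to make the charset an argument instead of a constant. The sketch below mirrors the string layout of as_rtf_font(), but build_font_table and the fonts data frame are hypothetical stand-ins for r2rtf's internal font_type(); this is not r2rtf's API.

```r
# Hypothetical font metadata standing in for r2rtf's internal font_type()
fonts <- data.frame(
  rtf_code = c("\\f0", "\\f1"),
  style = c("\\froman", "\\fswiss"),
  name = c("Times New Roman", "Arial"),
  stringsAsFactors = FALSE
)

# Build an RTF font table in the same format as as_rtf_font(),
# but take the charset as an argument (default 161 keeps current output).
build_font_table <- function(fonts, charset = 161) {
  entries <- paste0(
    "{", fonts$rtf_code, fonts$style,
    "\\fcharset", charset, "\\fprq2 ", fonts$name, ";}\n"
  )
  paste0("{\\fonttbl", paste(entries, collapse = ""), "}\n")
}
```

build_font_table(fonts, charset = 134) would then emit \fcharset134 for every font. A per-font charset column would be a natural extension if different fonts needed different charsets.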

elong0527 commented 8 months ago

Thanks for the suggestions.

  1. For Chinese characters, please refer to #89 and use the r2rtf:::utf8Tortf function.
  2. You can set text_convert = FALSE in rtf_colheader(), rtf_body(), etc. to avoid replacing _ with \sub.
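A minimal sketch of the second suggestion, assuming r2rtf is installed and R >= 4.1 for the native |> pipe. The toy data frame and header text are illustrative; only the text_convert = FALSE argument is taken from the reply above.

```r
library(r2rtf)

# Toy data; column names and values are illustrative only
tbl <- data.frame(dose = "10 mg", stat = "n")

out_file <- tempfile(fileext = ".rtf")

# text_convert = FALSE keeps "_" literal instead of turning it into \sub
tbl |>
  rtf_colheader(colheader = "Dose group|Statistic_n", text_convert = FALSE) |>
  rtf_body(text_convert = FALSE) |>
  rtf_encode() |>
  write_rtf(out_file)
```

Because conversion is switched off for the whole header, other LaTeX-style markup in that header (such as \\infty in the original column headers) would presumably also be left unconverted.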
cglaze11 commented 8 months ago

Thanks for the answer.

The reason I tried replacing \fcharset161 with \fcharset134 was that r2rtf:::utf8Tortf didn't work for me. The code:

library(r2rtf)
library(magrittr)  # provides %>%

final %>%
  rtf_page(orientation = "landscape") %>%
  # utf8Tortf() is an internal (:::) r2rtf helper, suggested in #89
  rtf_title(r2rtf:::utf8Tortf("表3.1  单剂量递增研究PK参数汇总(PKS)")) %>%
  rtf_colheader(
    colheader = paste0(
      "剂量组|统计_量|  Cmax\n (ng/mL)|tmax\n (h)   |t1/2\n (h)|    CL/F\n (L/h)|VZ/F\n (L)|AUC0-24h\n (ng/mL·h)",
      "|AUC0-last\n (ng/mL·h)|AUC0-\\infty\n (ng/mL·h)|Kel\n (1/h)|%AUCexp\n (%)|CLr\n (L/h)"
    )
  ) %>%
  rtf_body() %>%
  rtf_encode() %>%
  write_rtf("Outputs/TFLs/T_3_1-2.rtf")

and the output: [screenshot: utf2rtf]

I couldn't figure out why the result of r2rtf:::utf8Tortf was NA.

> r2rtf:::utf8Tortf("表3.1  单剂量递增研究PK参数汇总(PKS)")
[1] "NA"

Is the problem in the function utf8ToInt?

> utf8ToInt("表")
[1] NA
> utf8ToInt("a")
[1] 97
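To illustrate why the input encoding matters, here is a sketch of the general idea of converting UTF-8 text to RTF Unicode escapes. to_rtf_unicode is a hypothetical helper, not r2rtf's utf8Tortf: it writes each non-ASCII character as \uN? (N as a signed 16-bit integer, ? as the fallback character) and calls enc2utf8() first so that a native-encoded string does not reach utf8ToInt() as invalid UTF-8. It assumes every character lies in the Basic Multilingual Plane and does not escape \, { or }.

```r
# Hypothetical helper, not r2rtf's utf8Tortf: escape non-ASCII characters
# as RTF \uN? control words.
to_rtf_unicode <- function(text) {
  text <- enc2utf8(text)  # convert from the native encoding (e.g. CP936)
  vapply(text, function(s) {
    cp <- utf8ToInt(s)
    if (anyNA(cp)) stop("input is not valid UTF-8")
    chars <- vapply(cp, function(u) {
      if (u < 128) {
        intToUtf8(u)  # plain ASCII passes through unchanged
      } else if (u <= 65535) {
        # RTF \uN takes a signed 16-bit value; "?" is the fallback character
        n <- if (u > 32767) u - 65536 else u
        paste0("\\u", n, "?")
      } else {
        stop("characters outside the BMP need surrogate pairs (not handled)")
      }
    }, character(1))
    paste(chars, collapse = "")
  }, character(1), USE.NAMES = FALSE)
}
```

For example, 表 is U+8868 (34920), which exceeds 32767 and is therefore written as \u-30616?.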
elong0527 commented 8 months ago

Could you provide your session information using sessionInfo()?

I cannot reproduce the issue on Posit Cloud (https://posit.cloud).

Below is the output I get.

> library(r2rtf)
> utf8ToInt("表")
[1] 34920
> utf8ToInt("a")
[1] 97

One possibility is that your computer is not using UTF-8 encoding. Please check the encoding with:

> Encoding("表")
[1] "UTF-8"
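Both checks can be done in base R. The sketch below (the function names are hypothetical) reports whether the session's native encoding is UTF-8 via l10n_info() and converts a string with enc2utf8() before calling utf8ToInt(), which may serve as a workaround when changing the locale is not an option.

```r
# TRUE when the session's native encoding is UTF-8 (l10n_info() is base R)
is_utf8_session <- function() isTRUE(l10n_info()[["UTF-8"]])

# Convert from the native encoding to UTF-8 before reading code points.
# In a code page 936 session, a literal such as "表" is stored in GBK and
# marked "unknown", so utf8ToInt() on the raw bytes can return NA.
safe_utf8_to_int <- function(x) utf8ToInt(enc2utf8(x))
```

In a non-UTF-8 session, writing non-ASCII literals with \u escapes (for example "\u8868" for 表) should also yield UTF-8 strings regardless of the locale.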
cglaze11 commented 8 months ago

You are right. The locale isn't UTF-8; it uses code page 936 (GBK):

> Encoding("表")
[1] "unknown"
> sessionInfo()
locale:
[1] LC_COLLATE=English_United States.936  LC_CTYPE=English_United States.936   
[3] LC_MONETARY=English_United States.936 LC_NUMERIC=C                         
[5] LC_TIME=English_United States.936  

I set the locale to UTF-8:

Sys.setlocale("LC_ALL", "German.UTF-8")  

Then all Chinese characters displayed correctly. Thanks very much again!