davidgohel / flextable

table farming
https://ardata-fr.github.io/flextable-book/

Use NFD Unicode normalization form to fix accented characters in Word #586

Closed uhkeller closed 8 months ago

uhkeller commented 10 months ago

When writing accented characters to a table with body_add_flextable(), characters in precomposed form (NFC Unicode normalization) are displayed by Word in Arial/Helvetica instead of the font specified in set_flextable_defaults(). For example:

library(flextable)
library(officer)
set_flextable_defaults(font.family = "Cambria")
read_docx() |>
  body_add_flextable(qflextable(data.frame(a = "eé"))) |>
  print(target = "accent_test_bad.docx")

When opening the generated file in MS Word for macOS, the accented "é" is displayed using Helvetica instead of Cambria. On Windows, it's Arial.


In LibreOffice it is displayed correctly.


When typed into RStudio, the "é" is encoded using NFC. However, if it's converted to NFD using stringi::stri_trans_nfd(), the problem disappears and the table looks as intended in both Word and LibreOffice:

library(flextable)
library(officer)
set_flextable_defaults(font.family = "Cambria")
read_docx() |>
  body_add_flextable(qflextable(data.frame(a = stringi::stri_trans_nfd("eé")))) |>
  print(target = "accent_test_good.docx")


It took us several hours to figure out what's going on there. To spare others the trouble, it would be very helpful if the documentation on body_add_flextable() could mention this. Or maybe non-NFD Unicode strings could be converted automatically?
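
For anyone who wants to check which form a string is in before writing it to a document, here is a minimal sketch using base R's utf8ToInt() and the normalization helpers from stringi. The variable names nfc and nfd are only illustrative, and the "\u00e9" escape is used so the example does not depend on how the editor encodes a typed "é".

```r
library(stringi)

# "\u00e9" is the precomposed (NFC) form of e-acute: a single code point.
nfc <- "e\u00e9"

# NFD decomposes it into "e" followed by U+0301 (combining acute accent).
nfd <- stri_trans_nfd(nfc)

# Inspect the code points of each form.
utf8ToInt(nfc)  # 101 233
utf8ToInt(nfd)  # 101 101 769

# Ask stringi which normalization form each string is in.
stri_trans_isnfc(nfc)  # TRUE
stri_trans_isnfd(nfc)  # FALSE
stri_trans_isnfd(nfd)  # TRUE

# The two strings look the same when printed but are not identical;
# normalizing back to NFC recovers the original.
identical(nfc, nfd)                  # FALSE
identical(stri_trans_nfc(nfd), nfc)  # TRUE
```

This only shows how to tell the two forms apart; it makes no claim about why Word treats them differently.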

Edit: flextable 0.9.4, Word for Mac 16.78.3, Word for Windows version 2302

davidgohel commented 9 months ago

Hello,

See this thread #383

This can be solved without 'stringi' by specifying the same font for 'hansi.family', i.e. set_flextable_defaults(font.family = "Cambria", hansi.family = "Cambria"):

library(flextable)
library(officer)
set_flextable_defaults(font.family = "Cambria", hansi.family = "Cambria")
read_docx() |>
  body_add_flextable(qflextable(data.frame(a = "eé"))) |> 
  print(target = "accent_test_bad.docx") 
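
As a sketch of the same idea, the font can be applied to every font slot that set_flextable_defaults() exposes (font.family, hansi.family, eastasia.family, cs.family), so that no character range falls back to a different font. The helper set_uniform_font() is hypothetical and not part of flextable. Setting the East Asian and complex-script slots goes beyond what this issue needs and is only an assumption about what some users may want.

```r
library(flextable)

# Hypothetical helper (not part of flextable): use one font for all four
# font slots that set_flextable_defaults() accepts.
set_uniform_font <- function(font) {
  set_flextable_defaults(
    font.family = font,      # basic Latin (ASCII) text
    hansi.family = font,     # "high ANSI" text, e.g. precomposed accented letters
    eastasia.family = font,  # East Asian text
    cs.family = font         # complex-script text
  )
}

set_uniform_font("Cambria")

# Read the defaults back to confirm that both slots now agree.
defaults <- get_flextable_defaults()
defaults$font.family
defaults$hansi.family
```

Calling init_flextable_defaults() afterwards restores the package defaults.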

I understand it's not easy to guess that hansi.family exists, but I prefer to follow the advice given in the manual of stringi::stri_trans_nfd():

[...] you will rather not use these functions in typical string processing activities [...]

uhkeller commented 9 months ago

Thanks a lot for pointing this out, and sorry I missed it. Maybe an additional sentence in the help for body_add_flextable() and/or set_flextable_defaults() would be helpful? Then people wouldn't have to guess.

davidgohel commented 9 months ago

@uhkeller sure, you're right
