gbekes commented 3 years ago

Great package

I used the development version. The data has a variable with a variety of city names from around the globe, encoded in UTF-8 (Windows, RStudio). I can't share the data, sorry.

It runs through despite errors, but the output contains weird characters, e.g. in the HTML for “Wiener Neustädt”.

annennenne commented 3 years ago

Thanks again for submitting the issue.

I think I'll need a minimal example that produces the error in order to find a solution. I'm having trouble getting the error myself.

Does the following code produce the problem for you?

a <- data.frame(cities = c(rep("Copenhagen", 2), "Budapest", "Wiener Neustädt"),
                num = c(1, 2, 1, 3))
makeDataReport(a, file = "deleteme.rmd", output = "html", replace = TRUE)

And if so, would you mind sharing the output of devtools::session_info()?

annennenne commented 3 years ago

Note to future self: we discussed the issue further via email, and it seems the problem was local. I was not able to reproduce it even with the original data.

vorpalvorpal commented 2 years ago

I am getting the same issue. Your European city example gives the following error:

Data report generation is finished. Please wait while your output file is being rendered.
Error in sub(re, "", x, perl = TRUE) : input string 2 is invalid UTF-8
In addition: Warning messages:
1: In readLines(con, warn = FALSE) :
  invalid input found on input connection 'deleteme.rmd'
2: In xfun::read_utf8(input) :
  The file deleteme.rmd is not encoded in UTF-8. These lines contain invalid UTF-8 characters: 94, 102

deleteme.rmd shows Neust?dt in place of Neustädt if I open it in RStudio (which defaults to opening Rmd files as UTF-8). If I reopen it with ISO-8859-1 (the system default encoding), Neustädt displays correctly. This is a stupid Windows problem I've run up against before.
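
To illustrate why the same text is readable as ISO-8859-1 but rejected as UTF-8, here is a minimal sketch in base R. It only compares the bytes of the example city name in the two encodings; it does not touch dataMaid or the actual deleteme.rmd, and the variable names are made up for the illustration.

```r
# "ä" is written as a Unicode escape so the script does not depend on
# the encoding of the file it is saved in
city <- enc2utf8("Wiener Neust\u00e4dt")

# Bytes of the same string in UTF-8 and in ISO-8859-1 (latin1)
utf8_bytes   <- charToRaw(city)
latin1_bytes <- charToRaw(iconv(city, from = "UTF-8", to = "latin1"))

# UTF-8 stores "ä" as two bytes (c3 a4), latin1 as the single byte e4
length(utf8_bytes)    # 16
length(latin1_bytes)  # 15

# A lone e4 byte followed by "d" is not a valid UTF-8 sequence, which is
# the kind of input that a UTF-8 reader rejects
validUTF8(rawToChar(utf8_bytes))    # TRUE
validUTF8(rawToChar(latin1_bytes))  # FALSE
```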

Looking at the docs for rmarkdown::render(), the R Markdown input is always assumed to be UTF-8 (there is an encoding argument, but it is ignored). However, file() defaults to encoding = getOption("encoding"), which is "native.enc" unless it has been changed. Thus, when the native encoding isn't UTF-8, the document is written in the native encoding (e.g. ISO-8859-1), and any non-ASCII characters are misread when render() treats the file as UTF-8. Because render() will always read the file as UTF-8, the connections should be opened with the encoding set explicitly to "UTF-8". So in makeDataReport.R the calls to file() should be:

fileConn <- file(file, "w", encoding = "UTF-8") #for main document
vListConn <- file(vListFileName, "w", encoding = "UTF-8")

I haven't tested that the change actually works, but I think that it should do.
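
One way to check the idea outside the package is the sketch below. It assumes only base R: it writes the example city name through a connection opened with encoding = "UTF-8" and then inspects the bytes on disk. The temporary file is a stand-in for the report file, not a path dataMaid uses, and this does not exercise makeDataReport() itself.

```r
# Stand-in for the generated report file (hypothetical, for illustration)
tmp  <- tempfile(fileext = ".rmd")
city <- "Wiener Neust\u00e4dt"

# Open the connection with an explicit encoding, as proposed above
con <- file(tmp, "w", encoding = "UTF-8")
writeLines(city, con)
close(con)

# Read the raw bytes back and check that they form valid UTF-8
bytes   <- readBin(tmp, what = "raw", n = file.size(tmp))
is_utf8 <- validUTF8(rawToChar(bytes))
is_utf8

# Read the file the way a UTF-8 reader would and look at the result
lines <- readLines(tmp, encoding = "UTF-8", warn = FALSE)
lines

unlink(tmp)
```

On a Windows machine with a non-UTF-8 native encoding, running the same sketch without the encoding argument should show the difference.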

output of devtools::session_info():

- Session info -----------------------------------------------------------------------------------------------------------
 setting  value
 version  R version 4.1.1 (2021-08-10)
 os       Windows 10 x64 (build 15063)
 system   x86_64, mingw32
 ui       RStudio
 language (EN)
 collate  English_Australia.1252
 ctype    English_Australia.1252
 tz       Australia/Sydney
 date     2022-02-01
 rstudio  2021.09.0+351 Ghost Orchid (desktop)
 pandoc @ C:/Users/XXX/scoop/apps/rstudio/current/bin/pandoc/ (via rmarkdown)

vorpalvorpal commented 2 years ago

In the meantime, until a fix is published, anyone else encountering this issue can work around it locally by setting:

options(encoding = "UTF-8")
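
Since options(encoding = ...) changes the default for every connection opened in the session, it may be preferable to limit the change to the report call. A minimal sketch of that, assuming a small wrapper of my own (with_utf8_encoding is a hypothetical helper, not part of dataMaid or base R, and I have not tested it with makeDataReport()):

```r
# Hypothetical helper: evaluate an expression with
# options(encoding = "UTF-8") and restore the previous value afterwards
with_utf8_encoding <- function(expr) {
  old <- options(encoding = "UTF-8")   # options() returns the old values
  on.exit(options(old), add = TRUE)    # restore even if expr fails
  force(expr)                          # expr is lazy, so it is evaluated here
}

# Small self-check that the option is set inside and restored outside
before <- getOption("encoding")
inside <- with_utf8_encoding(getOption("encoding"))
after  <- getOption("encoding")

# Intended usage (not run here, requires dataMaid and the data frame a):
# with_utf8_encoding(
#   makeDataReport(a, file = "deleteme.rmd", output = "html", replace = TRUE)
# )
```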

annennenne commented 2 years ago

Thank you for this thorough and excellent suggestion. We will definitely look into making these changes the next time we work on updates!