ekstroem / dataReporter

issue with utf-8 weird chars #2

Open gbekes opened 3 years ago

gbekes commented 3 years ago

Great package

I used the development version. The data has a variable containing a variety of city names from around the globe, encoded in UTF-8 (Windows, RStudio). I can't share the data, sorry.

The report runs through despite errors, but the output contains weird characters, e.g. the HTML gives you “Wiener Neustädt”

annennenne commented 3 years ago

Thanks again for submitting the issue.

I think I'll need a minimal example that produces the error in order to find a solution. I'm having trouble getting the error myself.

Does the following code produce the problem for you?

a <- data.frame(cities = c(rep("Copenhagen", 2), "Budapest", "Wiener Neustädt"),
                num = c(1, 2, 1, 3))
library(dataReporter)
makeDataReport(a, file = "deleteme.rmd", output = "html", replace = TRUE)

And if so, would you mind sharing the output of devtools::session_info()?

annennenne commented 3 years ago

For future self: We discussed the issue further via email, and it seems the problem was local. I was not able to reproduce it even with the original data.

vorpalvorpal commented 2 years ago

I am getting the same issue. Your European city example gives the following error:

Data report generation is finished. Please wait while your output file is being rendered.
Error in sub(re, "", x, perl = TRUE) : input string 2 is invalid UTF-8
In addition: Warning messages:
1: In readLines(con, warn = FALSE) :
  invalid input found on input connection 'deleteme.rmd'
2: In xfun::read_utf8(input) :
  The file deleteme.rmd is not encoded in UTF-8. These lines contain invalid UTF-8 characters: 94, 102

deleteme.rmd shows Neust?dt in place of Neustädt if I open it in RStudio (which defaults to opening Rmds as UTF-8). If I choose to reopen it with ISO-8859-1 (the system default encoding), Neustädt displays correctly. This is a stupid Windows problem I've run up against before.
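To illustrate why the renderer complains, here is a minimal sketch that encodes the city name from the example both ways and checks the bytes with base R's validUTF8(). It assumes the file was written in a single-byte encoding such as ISO-8859-1 (called "latin1" in iconv()); the variable names are made up for the illustration.

```r
# The city name from the example, with "ä" written as a Unicode escape.
city <- "Wiener Neust\u00e4dt"

# Raw bytes of the string in UTF-8 ("ä" takes two bytes: c3 a4).
utf8_bytes <- charToRaw(enc2utf8(city))

# Raw bytes of the same string in latin1 ("ä" is the single byte e4).
latin1_bytes <- iconv(city, from = "UTF-8", to = "latin1", toRaw = TRUE)[[1]]

# validUTF8() checks whether the bytes form a valid UTF-8 sequence.
utf8_ok   <- validUTF8(rawToChar(utf8_bytes))    # TRUE
latin1_ok <- validUTF8(rawToChar(latin1_bytes))  # FALSE

print(c(utf8 = utf8_ok, latin1 = latin1_ok))
```

A lone e4 byte followed by an ASCII letter is not a valid UTF-8 sequence, which matches the "invalid UTF-8" message when a natively encoded file is read as UTF-8.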

Looking at the docs for rmarkdown::render(), the R Markdown input file is always assumed to be UTF-8 (there is an encoding argument, but it is ignored). However, the default for file() is encoding = getOption("encoding"), which is "native.enc" by default. Thus, when the native encoding isn't UTF-8, the document is saved in the native encoding (e.g. ISO-8859-1), and any non-ASCII characters are mis-rendered when the file is later read as UTF-8. Because rmarkdown::render() always reads its input as UTF-8, the encoding passed to file() should be set explicitly to "UTF-8". So in makeDataReport.R the calls to file() should be:

fileConn <- file(file, "w", encoding = "UTF-8") # for main document
vListConn <- file(vListFileName, "w", encoding = "UTF-8")

I haven't tested that the change actually works, but I think that it should do.
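As a rough, standalone check of the idea (not a test of dataReporter itself), the sketch below writes a non-ASCII string through a connection opened with encoding = "UTF-8" and then inspects the bytes on disk. The temporary file is a hypothetical stand-in for the generated .rmd.

```r
# A city name containing a non-ASCII character.
city <- "Wiener Neust\u00e4dt"

# Hypothetical temporary file standing in for the generated .rmd.
tmp <- tempfile(fileext = ".rmd")

# Open the connection with an explicit encoding instead of relying on
# getOption("encoding"), which defaults to "native.enc".
con <- file(tmp, "w", encoding = "UTF-8")
writeLines(city, con)
close(con)

# Read the raw bytes back and check that they form valid UTF-8.
bytes <- readBin(tmp, what = "raw", n = file.size(tmp))
is_utf8 <- validUTF8(rawToChar(bytes))
print(is_utf8)
```

One caveat, as far as I understand it: writeLines() first translates strings to the native encoding, so characters that the native code page cannot represent may still be escaped. Writing with writeLines(enc2utf8(x), con, useBytes = TRUE) on a connection without re-encoding is an alternative that avoids that step.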

output of devtools::session_info():

- Session info -----------------------------------------------------------------------------------------------------------
 setting  value
 version  R version 4.1.1 (2021-08-10)
 os       Windows 10 x64 (build 15063)
 system   x86_64, mingw32
 ui       RStudio
 language (EN)
 collate  English_Australia.1252
 ctype    English_Australia.1252
 tz       Australia/Sydney
 date     2022-02-01
 rstudio  2021.09.0+351 Ghost Orchid (desktop)
 pandoc   2.14.0.3 @ C:/Users/XXX/scoop/apps/rstudio/current/bin/pandoc/ (via rmarkdown)

- Packages ---------------------------------------------------------------------------------------------------------------
 ! package         * version    date (UTC) lib source
   askpass           1.1        2019-01-13 [1] CRAN (R 4.1.1)
   assertthat        0.2.1      2019-03-21 [1] CRAN (R 4.1.1)
   backports         1.3.0      2021-10-27 [1] CRAN (R 4.1.1)
   base64enc         0.1-3      2015-07-28 [1] CRAN (R 4.1.1)
   broom           * 0.7.10     2021-10-31 [1] CRAN (R 4.1.1)
   bslib             0.3.1      2021-10-06 [1] CRAN (R 4.1.2)
   cachem            1.0.6      2021-08-19 [1] CRAN (R 4.1.1)
   callr             3.7.0      2021-04-20 [1] CRAN (R 4.1.1)
   cellranger        1.1.0      2016-07-27 [1] CRAN (R 4.1.1)
   checkmate         2.0.0      2020-02-06 [1] CRAN (R 4.1.1)
   CHNOSZ            1.4.1      2021-04-09 [1] CRAN (R 4.1.2)
   class             7.3-19     2021-05-03 [1] CRAN (R 4.1.1)
   classInt          0.4-3      2020-04-07 [1] CRAN (R 4.1.1)
   cli               3.1.0      2021-10-27 [1] CRAN (R 4.1.1)
   cluster           2.1.2      2021-04-17 [1] CRAN (R 4.1.1)
   codetools         0.2-18     2020-11-04 [1] CRAN (R 4.1.1)
   colorspace        2.0-2      2021-06-24 [1] CRAN (R 4.1.1)
   crayon          * 1.4.2      2021-10-29 [1] CRAN (R 4.1.2)
   crosstalk         1.1.1      2021-01-12 [1] CRAN (R 4.1.1)
   curl              4.3.2      2021-06-23 [1] CRAN (R 4.1.1)
   cusumcharter      0.1.0      2021-11-15 [1] CRAN (R 4.1.2)
   data.table        1.14.2     2021-09-27 [1] CRAN (R 4.1.1)
   data.tree         1.0.0      2020-08-03 [1] CRAN (R 4.1.1)
   dataReporter    * 1.0.2      2021-11-11 [1] CRAN (R 4.1.2)
   DBI               1.1.1      2021-01-15 [1] CRAN (R 4.1.1)
   dbplyr            2.1.1      2021-04-06 [1] CRAN (R 4.1.1)
   DEoptimR          1.0-10     2022-01-03 [1] CRAN (R 4.1.2)
   desc              1.4.0      2021-09-28 [1] CRAN (R 4.1.2)
   devtools          2.4.3      2021-11-30 [1] CRAN (R 4.1.2)
   diffdf          * 1.0.4      2020-03-17 [1] CRAN (R 4.1.1)
   digest            0.6.28     2021-09-23 [1] CRAN (R 4.1.1)
   dplyover        * 0.0.8.9002 2021-11-01 [1] Github (TimTeaFan/dplyover@f0cd984)
   dplyr           * 1.0.7      2021-06-18 [1] CRAN (R 4.1.1)
   DT                0.19       2021-09-02 [1] CRAN (R 4.1.1)
   e1071             1.7-9      2021-09-16 [1] CRAN (R 4.1.1)
   editData        * 0.1.8      2021-04-02 [1] CRAN (R 4.1.1)
   ellipsis          0.3.2      2021-04-29 [1] CRAN (R 4.1.1)
   evaluate          0.14       2019-05-28 [1] CRAN (R 4.1.1)
   exifr           * 0.3.2      2021-03-20 [1] CRAN (R 4.1.1)
   fansi             0.5.0      2021-05-25 [1] CRAN (R 4.1.1)
   farver            2.1.0      2021-02-28 [1] CRAN (R 4.1.1)
   fastmap           1.1.0      2021-01-25 [1] CRAN (R 4.1.1)
   flextable       * 0.6.9      2021-10-07 [1] CRAN (R 4.1.1)
   forcats         * 0.5.1      2021-01-27 [1] CRAN (R 4.1.1)
   foreign           0.8-81     2020-12-22 [1] CRAN (R 4.1.1)
   Formula           1.2-4      2020-10-16 [1] CRAN (R 4.1.1)
   fs                1.5.0      2020-07-31 [1] CRAN (R 4.1.1)
   gdtools           0.2.3      2021-01-06 [1] CRAN (R 4.1.1)
   generics          0.1.1      2021-10-25 [1] CRAN (R 4.1.1)
   ggforce         * 0.3.3      2021-03-05 [1] CRAN (R 4.1.1)
   ggh4x           * 0.2.0      2021-08-21 [1] CRAN (R 4.1.1)
   ggiraph         * 0.7.10     2021-05-19 [1] CRAN (R 4.1.2)
   ggplot2         * 3.3.5      2021-06-25 [1] CRAN (R 4.1.1)
   ggrepel           0.9.1      2021-01-15 [1] CRAN (R 4.1.1)
   glue            * 1.4.2      2020-08-27 [1] CRAN (R 4.1.1)
   GQAnalyzer      * 0.1.0      2021-11-01 [1] Github (khaors/GQAnalyzer@d51540c)
   gridExtra         2.3        2017-09-09 [1] CRAN (R 4.1.1)
   gtable            0.3.0      2019-03-25 [1] CRAN (R 4.1.1)
   hablar          * 0.3.0      2020-03-19 [1] CRAN (R 4.1.1)
   haven             2.4.3      2021-08-04 [1] CRAN (R 4.1.1)
   here            * 1.0.1      2020-12-13 [1] CRAN (R 4.1.1)
   highr             0.9        2021-04-16 [1] CRAN (R 4.1.1)
   Hmisc             4.6-0      2021-10-07 [1] CRAN (R 4.1.1)
   hms               1.1.1      2021-09-26 [1] CRAN (R 4.1.1)
   htmlTable         2.3.0      2021-10-12 [1] CRAN (R 4.1.1)
   htmltools         0.5.2      2021-08-25 [1] CRAN (R 4.1.1)
   htmlwidgets       1.5.4      2021-09-08 [1] CRAN (R 4.1.1)
   httpuv            1.6.3      2021-09-09 [1] CRAN (R 4.1.1)
   httr              1.4.2      2020-07-20 [1] CRAN (R 4.1.1)
   janitor         * 2.1.0      2021-01-05 [1] CRAN (R 4.1.1)
   jpeg              0.1-9      2021-07-24 [1] CRAN (R 4.1.1)
   jquerylib         0.1.4      2021-04-26 [1] CRAN (R 4.1.1)
   jsonlite          1.7.2      2020-12-09 [1] CRAN (R 4.1.1)
   KernSmooth        2.23-20    2021-05-03 [1] CRAN (R 4.1.1)
   knitr             1.36       2021-09-29 [1] CRAN (R 4.1.1)
   labeling          0.4.2      2020-10-20 [1] CRAN (R 4.1.1)
   labelled        * 2.9.0      2021-10-29 [1] CRAN (R 4.1.2)
   later             1.3.0      2021-08-18 [1] CRAN (R 4.1.1)
   lattice           0.20-44    2021-05-02 [1] CRAN (R 4.1.1)
   latticeExtra      0.6-29     2019-12-19 [1] CRAN (R 4.1.1)
   leafem            0.1.6      2021-05-24 [1] CRAN (R 4.1.1)
   leaflet         * 2.0.4.1    2021-01-07 [1] CRAN (R 4.1.1)
   leafpm          * 0.1.0      2019-03-13 [1] CRAN (R 4.1.1)
   librarian         1.8.1      2021-07-12 [1] CRAN (R 4.1.1)
   lifecycle         1.0.1      2021-09-24 [1] CRAN (R 4.1.1)
   lubridate       * 1.8.0      2021-10-07 [1] CRAN (R 4.1.1)
   magrittr        * 2.0.1      2020-11-17 [1] CRAN (R 4.1.1)
   mapedit         * 0.6.0      2020-02-02 [1] CRAN (R 4.1.1)
   mapview         * 2.10.0     2021-06-05 [1] CRAN (R 4.1.1)
   MASS              7.3-54     2021-05-03 [1] CRAN (R 4.1.1)
   Matrix            1.3-4      2021-06-01 [1] CRAN (R 4.1.1)
   memoise           2.0.1      2021-11-26 [1] CRAN (R 4.1.2)
   mgcv              1.8-36     2021-06-01 [1] CRAN (R 4.1.1)
   mime              0.12       2021-09-28 [1] CRAN (R 4.1.1)
   miniUI            0.1.1.1    2018-05-18 [1] CRAN (R 4.1.1)
   modelr            0.1.8      2020-05-19 [1] CRAN (R 4.1.1)
   munsell           0.5.0      2018-06-12 [1] CRAN (R 4.1.1)
   nlme              3.1-152    2021-02-04 [1] CRAN (R 4.1.1)
   nnet              7.3-16     2021-05-03 [1] CRAN (R 4.1.1)
   officer         * 0.4.1      2021-11-14 [1] CRAN (R 4.1.2)
   openxlsx        * 4.2.4      2021-06-16 [1] CRAN (R 4.1.1)
   pander          * 0.6.4      2021-06-13 [1] CRAN (R 4.1.2)
   pdftools          3.0.1      2021-05-06 [1] CRAN (R 4.1.1)
   pillar            1.6.4      2021-10-18 [1] CRAN (R 4.1.1)
   pkgbuild          1.3.1      2021-12-20 [1] CRAN (R 4.1.2)
   pkgcond         * 0.1.1      2021-04-28 [1] CRAN (R 4.1.1)
   pkgconfig         2.0.3      2019-09-22 [1] CRAN (R 4.1.1)
   pkgload           1.2.4      2021-11-30 [1] CRAN (R 4.1.2)
   plyr              1.8.6      2020-03-03 [1] CRAN (R 4.1.1)
   png               0.1-7      2013-12-03 [1] CRAN (R 4.1.1)
   polyclip          1.10-0     2019-03-14 [1] CRAN (R 4.1.1)
   pracma            2.3.3      2021-01-23 [1] CRAN (R 4.1.1)
   prettyunits       1.1.1      2020-01-24 [1] CRAN (R 4.1.1)
   processx          3.5.2      2021-04-30 [1] CRAN (R 4.1.1)
   promises          1.2.0.1    2021-02-11 [1] CRAN (R 4.1.1)
   proxy             0.4-26     2021-06-07 [1] CRAN (R 4.1.1)
   ps                1.6.0      2021-02-28 [1] CRAN (R 4.1.1)
   purrr           * 0.3.4      2020-04-17 [1] CRAN (R 4.1.1)
   qpdf            * 1.1        2019-03-07 [1] CRAN (R 4.1.1)
   R6                2.5.1      2021-08-19 [1] CRAN (R 4.1.1)
   rappdirs          0.3.3      2021-01-31 [1] CRAN (R 4.1.1)
   raster            3.5-11     2021-12-23 [1] CRAN (R 4.1.2)
   RColorBrewer    * 1.1-2      2014-12-07 [1] CRAN (R 4.1.1)
   Rcpp              1.0.7      2021-07-07 [1] CRAN (R 4.1.1)
   readr           * 2.0.2      2021-09-27 [1] CRAN (R 4.1.1)
   readxl          * 1.3.1      2019-03-13 [1] CRAN (R 4.1.1)
   remotes           2.4.1      2021-09-29 [1] CRAN (R 4.1.1)
   repr              1.1.3      2021-01-21 [1] CRAN (R 4.1.1)
   reprex            2.0.1      2021-08-05 [1] CRAN (R 4.1.1)
   rhandsontable   * 0.3.8      2021-05-27 [1] CRAN (R 4.1.1)
   rio               0.5.27     2021-06-21 [1] CRAN (R 4.1.1)
 D rJava             1.0-5      2021-09-24 [1] CRAN (R 4.1.1)
   rlang             0.4.12     2021-10-18 [1] CRAN (R 4.1.1)
   rlist           * 0.4.6.2    2021-09-03 [1] CRAN (R 4.1.1)
   rmarkdown         2.11       2021-09-14 [1] CRAN (R 4.1.1)
   robustbase        0.93-9     2021-09-27 [1] CRAN (R 4.1.2)
   rpart             4.1-15     2019-04-12 [1] CRAN (R 4.1.1)
   rprojroot         2.0.2      2020-11-15 [1] CRAN (R 4.1.1)
   rstudioapi        0.13       2020-11-12 [1] CRAN (R 4.1.1)
   rvest             1.0.2      2021-10-16 [1] CRAN (R 4.1.1)
   sass              0.4.0      2021-05-12 [1] CRAN (R 4.1.1)
   satellite         1.0.4      2021-10-12 [1] CRAN (R 4.1.1)
   scales            1.1.1      2020-05-11 [1] CRAN (R 4.1.1)
   sessioninfo       1.2.2      2021-12-06 [1] CRAN (R 4.1.2)
   sf              * 1.0-3      2021-10-07 [1] CRAN (R 4.1.1)
   shiny           * 1.7.1      2021-10-02 [1] CRAN (R 4.1.1)
   shinyjs         * 2.0.0      2020-09-09 [1] CRAN (R 4.1.1)
   shinyWidgets    * 0.6.2      2021-09-17 [1] CRAN (R 4.1.1)
   skimr           * 2.1.3      2021-03-07 [1] CRAN (R 4.1.1)
   snakecase         0.11.0     2019-05-25 [1] CRAN (R 4.1.1)
   SOfun           * 1.76       2021-11-01 [1] Github (mrdwab/SOfun@e41fa62)
   sp                1.4-6      2021-11-14 [1] CRAN (R 4.1.2)
   splitstackshape * 1.4.8      2019-04-21 [1] CRAN (R 4.1.1)
   staplr          * 3.1.1      2021-01-11 [1] CRAN (R 4.1.1)
   stringdist      * 0.9.8      2021-09-09 [1] CRAN (R 4.1.1)
   stringi         * 1.7.5      2021-10-04 [1] CRAN (R 4.1.1)
   stringr         * 1.4.0      2019-02-10 [1] CRAN (R 4.1.1)
   survival          3.2-11     2021-04-26 [1] CRAN (R 4.1.1)
   systemfonts       1.0.3      2021-10-13 [1] CRAN (R 4.1.1)
   terra             1.5-12     2022-01-13 [1] CRAN (R 4.1.1)
   tesseract       * 4.1.2      2021-09-18 [1] CRAN (R 4.1.1)
   testthat          3.1.1      2021-12-03 [1] CRAN (R 4.1.2)
   tibble          * 3.1.5      2021-09-30 [1] CRAN (R 4.1.1)
   tidyr           * 1.1.4      2021-09-27 [1] CRAN (R 4.1.1)
   tidyselect      * 1.1.1      2021-04-30 [1] CRAN (R 4.1.1)
   tidyverse       * 1.3.1      2021-04-15 [1] CRAN (R 4.1.1)
   tweenr            1.0.2      2021-03-23 [1] CRAN (R 4.1.1)
   tzdb              0.1.2      2021-07-20 [1] CRAN (R 4.1.1)
   units           * 0.7-2      2021-06-08 [1] CRAN (R 4.1.1)
   usethis           2.1.5      2021-12-09 [1] CRAN (R 4.1.2)
   utf8              1.2.2      2021-07-24 [1] CRAN (R 4.1.1)
   uuid            * 0.1-4      2020-02-26 [1] CRAN (R 4.1.1)
   vctrs             0.3.8      2021-04-29 [1] CRAN (R 4.1.1)
   webchem         * 1.1.1      2021-02-07 [1] CRAN (R 4.1.1)
   webshot           0.5.2      2019-11-22 [1] CRAN (R 4.1.1)
   whoami            1.3.0      2019-03-19 [1] CRAN (R 4.1.2)
   withr             2.4.2      2021-04-18 [1] CRAN (R 4.1.1)
   xfun              0.27       2021-10-18 [1] CRAN (R 4.1.1)
   xml2              1.3.2      2020-04-23 [1] CRAN (R 4.1.1)
   xtable            1.8-4      2019-04-21 [1] CRAN (R 4.1.1)
   yaml              2.2.1      2020-02-01 [1] CRAN (R 4.1.1)
   zeallot         * 0.1.0      2018-01-28 [1] CRAN (R 4.1.1)
   zip               2.2.0      2021-05-31 [1] CRAN (R 4.1.1)
   zoo             * 1.8-9      2021-03-09 [1] CRAN (R 4.1.1)

 [1] C:/Users/XXX/scoop/apps/r/4.1.1/library

 D -- DLL MD5 mismatch, broken installation.

vorpalvorpal commented 2 years ago

In the meantime, until a fix is published, anyone else encountering this issue can resolve it locally by setting:

options(encoding = "UTF-8")
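If changing the option for the whole session is undesirable, one possible variation is to set it only around the call and restore it afterwards. This is a sketch: with_utf8_connections() is a hypothetical helper, not part of dataReporter, and the file write below only stands in for the report call.

```r
# Hypothetical helper: evaluate `code` with options(encoding = "UTF-8") in
# effect, then restore the previous value even if `code` raises an error.
with_utf8_connections <- function(code) {
  old <- options(encoding = "UTF-8")
  on.exit(options(old), add = TRUE)
  force(code)  # `code` is evaluated lazily, i.e. after the option is set
}

before <- getOption("encoding")

# Stand-in for the report call: file() picks up getOption("encoding").
tmp <- tempfile(fileext = ".rmd")
with_utf8_connections({
  con <- file(tmp, "w")
  writeLines("Wiener Neust\u00e4dt", con)
  close(con)
})

# The option is back to its previous value after the call.
restored <- getOption("encoding")
```

In practice the braces would contain the call from earlier in the thread, makeDataReport(a, file = "deleteme.rmd", output = "html", replace = TRUE).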

annennenne commented 2 years ago

Thank you for this thorough and excellent suggestion. We will definitely look into making these changes the next time we work on updates!