Rapporter / pander

An R Pandoc Writer: Convert arbitrary R objects into markdown
http://rapporter.github.io/pander/
Open Software License 3.0

pander + R 3.4.0 → either a failure or an encoding issue #296

Closed: GegznaV closed this issue 5 years ago

GegznaV commented 7 years ago

While using pander with R 3.4.0, I ran into either an error or an encoding issue:

  1. This code fails when international (non-ASCII) strings are used:
    Sys.setlocale(locale = "Lithuanian")
    df <- iris[1:2,]
    rownames(df) <- c("Pagal „a“ formulę", "Pagal „b“ formulę")
    pander::pander(df)

Error message:

Error in table.expand(x, t.width, justify, sep.col) : basic_string::_S_create

  2. These lines are decoded incorrectly:
    Sys.setlocale(locale = "Lithuanian")
    df <- iris[1:2, 4:5]
    rownames(df) <- c("ą ž", "š ė")
    pander::pander(df)

Result:

---------------------------------
 &nbsp;    Petal.Width   Species 
--------- ------------- ---------
**ą ž**      0.2       setosa  

**Å Ä—**      0.2       setosa  
---------------------------------

The problem disappears when I switch back to R 3.3.3.
Is there a way to overcome this bug without switching back to previous versions of R?

 devtools::session_info()
Session info -------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.4.0 (2017-04-21)
 system   x86_64, mingw32             
 ui       RStudio (1.0.143)           
 language (EN)                        
 collate  Lithuanian_Lithuania.1257   
 tz       Europe/Helsinki             
 date     2017-05-01                  

Packages -----------------------------------------------------------------------------------------------
 package  * version date       source        
 devtools   1.12.0  2016-12-05 CRAN (R 3.4.0)
 digest     0.6.12  2017-01-27 CRAN (R 3.4.0)
 memoise    1.1.0   2017-04-21 CRAN (R 3.4.0)
 pander   * 0.6.0   2015-11-23 CRAN (R 3.4.0)
 Rcpp       0.12.10 2017-03-19 CRAN (R 3.4.0)
 withr      1.0.2   2016-06-20 CRAN (R 3.4.0)

daroczig commented 7 years ago

@RomanTsegelskyi any ideas regarding the Rcpp error message when using R 3.4.0?

lselzer commented 7 years ago

I can confirm this bug in a Spanish locale, but I don't get an error; the text is just wrongly encoded. The bug disappears in R 3.3.3.

philsf commented 7 years ago

For me it is the opposite: column names and row names work fine, but the data is incorrectly encoded in the output.

This only happens in Windows, and it appears to happen both in R 3.4.0 and 3.4.1. It also does not happen if I switch back to 3.3.3.

> df <- cbind("á" = "á", "é" = "é", "ç" = "ç")
> rownames(df) <- "ã"
> pander::pander(df)

-------------------
&nbsp;   á   é   ç 
------- --- --- ---
 **ã**  á  é  ç 
-------------------

I have access to a linux box, and it does not happen in linux (which uses UTF-8).

I tried setting Encoding() and it still comes out wrong (albeit differently).

> Encoding(df)
[1] "latin1" "latin1" "latin1"
> Encoding(df) <- "UTF-8"
> Encoding(df)
[1] "UTF-8" "UTF-8" "UTF-8"
> pander::pander(df)

----------------------
&nbsp;   á    é    ç  
------- ---- ---- ----
 **ã**  <e1> <e9> <e7>
----------------------

Trying enc2native() (the workaround from issue #280) has no effect.
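Note that `Encoding<-` only changes the *declared* encoding of a string; it does not convert the bytes. Relabelling latin1 text as UTF-8 leaves bytes such as `0xE1` ("á" in latin1) in place, and since a lone `0xE1` is not valid UTF-8, R prints it as an escape like `<e1>`, which matches the output above. A minimal base R sketch contrasting relabelling with actual conversion via `enc2utf8()`, using a single character from the example:

```r
# "á" as a single latin1 byte (0xE1), declared explicitly so the result
# does not depend on the session locale (same approach as ?Encoding).
x <- "\xe1"
Encoding(x) <- "latin1"

# Conversion: enc2utf8() re-encodes the text, so "á" becomes the two
# UTF-8 bytes C3 A1 and the string is marked as UTF-8.
converted <- enc2utf8(x)
print(Encoding(converted))
print(charToRaw(converted))

# Relabelling: Encoding<- only changes the declared encoding. The byte is
# still E1, which is not valid UTF-8, so R prints it as an escape ("<e1>").
relabelled <- x
Encoding(relabelled) <- "UTF-8"
print(charToRaw(relabelled))
```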

Below is the session info.

> devtools::session_info()
Session info -----------------------------------------------------------------------------
 setting  value                       
 version  R version 3.4.0 (2017-04-21)
 system   x86_64, mingw32             
 ui       RStudio (1.0.143)           
 language (EN)                        
 collate  Portuguese_Brazil.1252      
 tz       America/Sao_Paulo           
 date     2017-07-03                  

Packages ---------------------------------------------------------------------------------
 package    * version date       source        
 backports    1.1.0   2017-05-22 CRAN (R 3.4.0)
 base       * 3.4.0   2017-04-21 local         
 compiler     3.4.0   2017-04-21 local         
 datasets   * 3.4.0   2017-04-21 local         
 devtools     1.13.2  2017-06-02 CRAN (R 3.4.1)
 digest       0.6.12  2017-01-27 CRAN (R 3.4.0)
 evaluate     0.10.1  2017-06-24 CRAN (R 3.4.0)
 graphics   * 3.4.0   2017-04-21 local         
 grDevices  * 3.4.0   2017-04-21 local         
 htmltools    0.3.6   2017-04-28 CRAN (R 3.4.0)
 knitr        1.16    2017-05-18 CRAN (R 3.4.0)
 magrittr     1.5     2014-11-22 CRAN (R 3.4.0)
 memoise      1.1.0   2017-04-21 CRAN (R 3.4.1)
 methods    * 3.4.0   2017-04-21 local         
 pander       0.6.0   2015-11-23 CRAN (R 3.4.0)
 Rcpp         0.12.11 2017-05-22 CRAN (R 3.4.0)
 rmarkdown    1.6     2017-06-15 CRAN (R 3.4.0)
 rprojroot    1.2     2017-01-16 CRAN (R 3.4.0)
 rstudioapi   0.6     2016-06-27 CRAN (R 3.4.1)
 stats      * 3.4.0   2017-04-21 local         
 stringi      1.1.5   2017-04-07 CRAN (R 3.4.0)
 stringr      1.2.0   2017-02-18 CRAN (R 3.4.0)
 tools        3.4.0   2017-04-21 local         
 utils      * 3.4.0   2017-04-21 local         
 withr        1.0.2   2016-06-20 CRAN (R 3.4.1)
 yaml         2.1.14  2016-11-12 CRAN (R 3.4.0)

GegznaV commented 6 years ago

I created a data.frame df in the Lithuanian locale. The object df:

df
#                   vidurkis PI_apatine_riba PI_virsutine_riba  n
# Pagal „z“ formulę     54.9            52.4              57.3 24
# Pagal „t“ formulę     54.9            52.3              57.5 24

Then I ran pander(df):

library(pander)
debugonce(pandoc.table.return)

Sys.setlocale(locale = "Lithuanian")
df <- readRDS("df.Rds")
pander(df)

Before the code breaks at lines 582-583, the object t is created. I saved that object as "t.Rds".

# lines 582-583 in `pandoc.table.return` where the error occurs:
res <- paste0(res, paste(apply(t, 1, function(x) paste0(table.expand(x, 
       t.width, justify, sep.col), sep.row)), collapse = "\n"))

Other code needed to run these lines:

# Function, defined inside `pander::pandoc.table.return`
table.expand <- function(cells, cols.width, justify, sep.cols) {
    .Call("pander_tableExpand_cpp", PACKAGE = "pander", 
          cells, cols.width, justify, sep.cols, style)
}

# Parameters before calling `table.expand`
t.width <- c(23, 10, 17, 19, 4)
justify <- c("centre", "centre", "centre", "centre", "centre")
sep.col <- c("",  " ", "" )
style   <- "multiline"

After leaving debug mode, I loaded the first row of "t.Rds" and created the analogous line as a character vector t0.

t <- readRDS("t.Rds")[1, ]
the_names <- names(t)

# `t0` has the same contents as `t`
t0 <- c("**Pagal „z“ formulę**", "54.9", "52.4", "57.3", "24")
names(t0) <- the_names

print(t0)
# t.rownames                vidurkis  PI_apatine_riba   PI_virsutine_riba  n 
# "**Pagal „z“ formulę**"   "54.9"    "52.4"             "57.3"            "24" 

print(t)
# t.rownames                vidurkis  PI_apatine_riba   PI_virsutine_riba  n 
# "**Pagal „z“ formulę**"   "54.9"    "52.4"            "57.3"             "24" 

sapply(t0, Encoding)
# t.rownames     vidurkis   PI_apatine_riba  PI_virsutine_riba  n 
# "unknown"      "unknown"  "unknown"        "unknown"          "unknown" 

sapply(t, Encoding)
# t.rownames      vidurkis   PI_apatine_riba  PI_virsutine_riba  n 
# "UTF-8"         "unknown"  "unknown"        "unknown"         "unknown" 

table.expand(t0, t.width, justify, sep.col)
# [1] " **Pagal „z“ formulę**     54.9          52.4               57.3          24 "

table.expand(t,  t.width, justify, sep.col)
## Error in table.expand(t, t.width, justify, sep.col) : 
##     basic_string::_S_create

table.expand works fine with t0 and breaks with t. The only difference between these two objects is the encoding of the element t.rownames, so it seems that the "UTF-8" encoding causes the error.
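One hypothesis that fits these numbers (it is not confirmed anywhere in this thread): if the C++ helper measures cell width in bytes rather than characters, the UTF-8 row name no longer fits its column. `**Pagal „z“ formulę**` is 21 characters, but in UTF-8 the quotes „ and “ take 3 bytes each and ę takes 2, giving 26 bytes, which exceeds the 23 in `t.width[1]`. In the native CP1257 encoding each of these characters is a single byte, so `t0` would measure 21 either way. A negative padding derived from such a width could plausibly surface as `basic_string::_S_create`. The base R sketch below only checks the arithmetic; `col_width` is simply `t.width[1]` copied from above:

```r
# Row-name cell from the example above, written with Unicode escapes so the
# snippet is independent of the source file's encoding (marked as UTF-8).
cell <- "**Pagal \u201ez\u201c formul\u0119**"
col_width <- 23   # t.width[1] from the debugging session above

n_chars <- nchar(cell, type = "chars")  # characters a reader sees
n_bytes <- nchar(cell, type = "bytes")  # length of the UTF-8 byte string

print(c(chars = n_chars, bytes = n_bytes))

# Padding left over for the column under each way of measuring width.
# Hypothesis only: a byte-based measure leaves a negative remainder.
print(col_width - n_chars)
print(col_width - n_bytes)
```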

Any ideas why the encoding changes, why it causes this problem, and how it could be fixed?

P.S. This comment could provide the "more information" for #280.

Attachment: Objects t and df.zip

daroczig commented 6 years ago

Thanks a lot for the detailed info, really helpful!

@RomanTsegelskyi, would you have a chance to look into this?

RomanTsegelskyi commented 6 years ago

Sorry, I missed all the earlier notifications. I will try to look into this.

GegznaV commented 6 years ago

Is there any news on this issue?

lselzer commented 6 years ago

I've been exploring this issue and have found that the culprit is tableExpand_cpp.

daroczig commented 6 years ago

Thanks, @lselzer! @RomanTsegelskyi, any chance you might be able to look into this?

lselzer commented 6 years ago

Using enc2native inside table.expand fixes this issue, though I don't know how robust this solution is. I know very little about character encoding, and I don't know how this will work with other languages such as Chinese.

I can make a PR if you are willing to accept it.

GegznaV commented 6 years ago

@lselzer, on your computer, does it solve the original issue of this thread (#296)? I installed pander from your repository, but with the Lithuanian locale and the provided example it has no effect on my PC (the code results in the same error).

lselzer commented 6 years ago

Yes, it solves the issue. I tried your code and tried to reproduce your error, but I couldn't.

devtools::session_info()
Session info ------------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.4.3 (2017-11-30)
 system   x86_64, mingw32             
 ui       RStudio (1.1.383)           
 language (EN)                        
 collate  Lithuanian_Lithuania.1257   
 tz       America/Buenos_Aires        
 date     2018-03-12                  

Packages ----------------------------------------------------------------------------------------------------
 package   * version    date       source                              
 base      * 3.4.3      2017-11-30 local                               
 compiler    3.4.3      2017-11-30 local                               
 datasets  * 3.4.3      2017-11-30 local                               
 devtools    1.13.4     2017-11-09 CRAN (R 3.4.2)                      
 digest      0.6.15     2018-02-12 Github (eddelbuettel/digest@d9f40a9)
 graphics  * 3.4.3      2017-11-30 local                               
 grDevices * 3.4.3      2017-11-30 local                               
 memoise     1.1.0      2017-04-21 CRAN (R 3.4.0)                      
 methods   * 3.4.3      2017-11-30 local                               
 pander      0.6.1      2018-02-14 local                               
 Rcpp        0.12.15.1  2018-02-14 Github (RcppCore/Rcpp@15b3a87)      
 stats     * 3.4.3      2017-11-30 local                               
 tools       3.4.3      2017-11-30 local                               
 utils     * 3.4.3      2017-11-30 local                               
 withr       2.1.1.9000 2017-12-22 Github (jimhester/withr@df18523)    
 yaml        2.1.14     2016-11-12 CRAN (R 3.4.0)           

sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=Lithuanian_Lithuania.1257  LC_CTYPE=Lithuanian_Lithuania.1257   
[3] LC_MONETARY=Lithuanian_Lithuania.1257 LC_NUMERIC=C                         
[5] LC_TIME=Lithuanian_Lithuania.1257    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] compiler_3.4.3   tools_3.4.3      withr_2.1.1.9000 rstudioapi_0.7   yaml_2.1.14      memoise_1.1.0   
 [7] Rcpp_0.12.15.1   pander_0.6.1     digest_0.6.15    devtools_1.13.4 

hr70 commented 6 years ago

I have had similar problems in a German locale (Windows) since switching from R 3.3 to R 3.4. I just tried R 3.5, but that didn't change anything. It seems that text is encoded wrongly in the row names if German characters (e.g. "Ä") are present there, in the column names if they are present there, and, interestingly, only in the row names if they are present in both the row and column names. I downloaded https://github.com/lselzer/pander/archive/06c2f6579740564063af7081373113daa62b1023.zip and tried to install it, but unfortunately couldn't get it to work, so I don't know whether it would change things. It would be great if a solution to this problem could be found. Here's an example:

library(pander)
x <- data.frame(hö = c("ä", "o", "ü"))
row.names(x) <- c("A", "Ä", "C")
x
#   hö
# A  ä
# Ä  o
# C  ü
pander(x)


  hö


A ä

Ä o

C ü
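The observation above distinguishes row names, column names and cells. In the spirit of GegznaV's `sapply(t, Encoding)` check, a small diagnostic can show which of these parts carries UTF-8-marked (non-native) text before calling `pander()`. `encoding_report()` is a hypothetical helper written for illustration, not a pander function; it only uses base R's `Encoding()` and `table()`:

```r
# Hypothetical helper (not part of pander): tabulate the declared encodings
# of a data frame's row names, column names and text cells.
encoding_report <- function(df) {
  # Keep only character and factor columns, as plain character vectors.
  text_cols <- Filter(function(col) is.character(col) || is.factor(col), df)
  cells <- as.character(unlist(lapply(text_cols, as.character), use.names = FALSE))
  list(
    rownames = table(Encoding(rownames(df))),
    colnames = table(Encoding(colnames(df))),
    cells    = table(Encoding(cells))
  )
}

# Data from the first report; Unicode escapes make these row names UTF-8 marked.
df <- iris[1:2, 4:5]
rownames(df) <- c("\u0105 \u017e", "\u0161 \u0117")
report <- encoding_report(df)
print(report)
```

ASCII-only strings are always reported as "unknown", so only the parts containing non-ASCII text are informative.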

awfrankwils commented 6 years ago

Hi there,

I am also experiencing encoding issues on Windows with R 3.5.1 and pander 0.6.2.

I have been trying to insert Unicode no-break spaces to indent factor levels in my tables. Here are three examples of what I am trying to do. Example 1 uses a normal space, which is ignored by pander(); examples 2 and 3 use the Unicode no-break space "\u00A0", which appears as a stray character instead of a space.

#using a space (ignored by pander)
example<-rbind("Meals in a Typical Day", " 1", " 2", " 3", " 4 or more")
example<-cbind(example, counts=c("","5","10","25","20"))
example

[Screenshots: output of example and pander(example)]

#using unicode for no-break space 
example2<-rbind("Meals in a Typical Day", "\u00A01", "\u00A02", "\u00A03", "\u00A04 or more")
example2<-cbind(example2, counts=c("","5","10","25","20"))
example2

[Screenshots: output of example2 and pander(example2)]

#using unicode for no-break space 
example3<-rbind("Meals in a Typical Day", "\u00A0\u00A01", "\u00A0\u00A02", "\u00A0\u00A03", "\u00A0\u00A04 or more")
example3<-cbind(example3, counts=c("","5","10","25","20"))
example3

[Screenshots: output of example3 and pander(example3)]
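A likely reading of the symptom in examples 2 and 3 (an assumption; it is not diagnosed in this thread): the no-break space U+00A0 is stored in UTF-8 as the two bytes C2 A0, and if those bytes are later interpreted in the Windows-1252 code page used by this session, they display as "Â" followed by a no-break space. The base R sketch below reproduces that misreading with `iconv()`:

```r
# U+00A0 (no-break space), as used in example2/example3 above.
nbsp <- "\u00A0"
print(charToRaw(nbsp))  # its two UTF-8 bytes

# Misread those same bytes as Windows-1252, as a native-encoding step on an
# English_United States.1252 session might (assumption, for illustration).
raw_text <- rawToChar(charToRaw(nbsp))
misread <- iconv(raw_text, from = "CP1252", to = "UTF-8")

# 0xC2 is "Â" in Windows-1252 and 0xA0 is a no-break space, so one intended
# character turns into two.
print(nchar(misread))
print(misread == "\u00C2\u00A0")
```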

sessionInfo()

R version 3.5.1 (2018-07-02)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] bindrcpp_0.2.2 lubridate_1.7.4 forcats_0.3.0 stringr_1.3.1 dplyr_0.7.6 purrr_0.2.5
[7] readr_1.1.1 tidyr_0.8.1 tibble_1.4.2 ggplot2_3.0.0 tidyverse_1.2.1 VIM_4.7.0
[13] data.table_1.11.4 colorspace_1.3-2 pander_0.6.2 xtable_1.8-2 knitr_1.20 descr_1.1.4

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.18 xml2_1.2.0 bindr_0.1.1 magrittr_1.5 MASS_7.3-50 hms_0.4.2
[7] rvest_0.3.2 tidyselect_0.2.4 lattice_0.20-35 R6_2.2.2 rlang_0.2.1 broom_0.5.0
[13] laeken_0.4.6 rio_0.5.10 e1071_1.6-8 withr_2.1.2 modelr_0.1.2 class_7.3-14
[19] lmtest_0.9-36 assertthat_0.2.0 abind_1.4-5 digest_0.6.15 curl_3.2 haven_1.1.2
[25] sp_1.3-1 compiler_3.5.1 DEoptimR_1.0-8 cellranger_1.1.0 pillar_1.3.0 scales_0.5.0
[31] backports_1.1.2 boot_1.3-20 jsonlite_1.5 pkgconfig_2.0.1 rstudioapi_0.7 munsell_0.5.0
[37] carData_3.0-1 httr_1.3.1 plyr_1.8.4 car_3.0-0 tools_3.5.1 nnet_7.3-12
[43] vcd_1.4-4 nlme_3.1-137 gtable_0.2.0 cli_1.0.0 readxl_1.1.0 yaml_2.2.0
[49] lazyeval_0.2.1 crayon_1.3.4 zip_1.0.0 glue_1.3.0 robustbase_0.93-1.1 openxlsx_4.1.0
[55] stringi_1.1.7 foreign_0.8-70 zoo_1.8-3

daroczig commented 6 years ago

I tested #326 in a Windows VM, and it seems to do the trick, but please confirm.

hr70 commented 5 years ago

Thanks; I downloaded and installed "pander-table-expand-fallback.zip" today. Unfortunately, for me the result is the same as before.

sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=German_Austria.1252  LC_CTYPE=German_Austria.1252
[3] LC_MONETARY=German_Austria.1252 LC_NUMERIC=C
[5] LC_TIME=German_Austria.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] pander_0.6.2

loaded via a namespace (and not attached):
[1] compiler_3.5.1 tools_3.5.1    Rcpp_0.12.17   digest_0.6.15

liegepr commented 5 years ago

Hello, I have been successful with the first commit intended to solve this issue, install_github("Rapporter/pander@06c2f6579740564063af7081373113daa62b1023"), but not with the latest one, install_github("Rapporter/pander@66492997bbdc4f9766d7c4573e676fbdb9bd7def").

R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)

locale:
[1] LC_COLLATE=French_France.1252  LC_CTYPE=French_France.1252    LC_MONETARY=French_France.1252 LC_NUMERIC=C
[5] LC_TIME=French_France.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

hr70 commented 5 years ago

I tried it on my home computer and can also confirm success with install_github("Rapporter/pander@06c2f65"). (I could not test it at the office, as I am not allowed to install packages from GitHub there.)

dcomtois commented 5 years ago

I had an issue trying to print a data frame with Cyrillic column names:

Error in table.expand(t.colnames, t.width, justify, sep.col) : 
  basic_string::_S_create

Installing the patch mentioned in the comment above resolved the issue (whereas using colnames(x) <- enc2native(colnames(x)) before the call to pander() didn't help).

GegznaV commented 5 years ago

I can also confirm that install_github("Rapporter/pander@06c2f65") solved my original issue (I use Windows 10 and R 3.5.2).

@daroczig, will this patch be merged into the main branch of pander? When can one expect it on CRAN?

And maybe #326 is not necessary?

mgruebsch commented 5 years ago

I had the same issue with German which is solved by devtools::install_github("Rapporter/pander@06c2f65"). Please merge the fix into the master release. Thank you!

dcomtois commented 5 years ago

@daroczig Do you plan on merging this fix? If not, please let me know... I am holding off on pushing an update of summarytools to CRAN (which will include translations) until the issue is resolved. Thanks!

daroczig commented 5 years ago

Sorry for the delay, getting this done today.

GegznaV commented 5 years ago

@daroczig It seems that the current CRAN version of pander lags behind the GitHub version. When is the GitHub version of pander (with this encoding bug fixed) going to be released on CRAN?

GegznaV commented 4 years ago

Is pander going to be updated on CRAN?

valentinaandrade commented 2 years ago

Regarding dcomtois's earlier comment (a basic_string::_S_create error in table.expand with Cyrillic column names, resolved by installing the patch, while enc2native() on the column names didn't help): I have a similar issue, but the fix from that comment didn't resolve it:

Error in table.expand(x, t.width, justify, sep.col) : 
  basic_string::_M_create

daroczig commented 2 years ago

pander was updated on CRAN on 2021-06-13, so the CRAN version should include this fix. If you see any similar problems, please open a new ticket with a minimal reproducible example.