brodieG / diffobj

Compare R Objects with a Diff

RStudio viewer encoding issue #115

Closed artemklevtsov closed 6 years ago

artemklevtsov commented 6 years ago

Hi.

Thanks for this package. I ran into an encoding issue when using it with the RStudio viewer.

diffobj::diffPrint("Hello", "He1lo", format = "html")


diffobj::diffPrint("Привет", "Превед", format = "html")



R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Arch Linux

Matrix products: default
BLAS: /usr/lib/libblas.so.3.8.0
LAPACK: /usr/lib/liblapack.so.3.8.0

locale:
 [1] LC_CTYPE=ru_RU.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=ru_RU.UTF-8        LC_COLLATE=C
 [5] LC_MONETARY=ru_RU.UTF-8    LC_MESSAGES=ru_RU.UTF-8
 [7] LC_PAPER=ru_RU.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=ru_RU.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_3.4.3 tools_3.4.3    parallel_3.4.3 rstudioapi_0.7
[5] yaml_2.1.16    crayon_1.3.4   diffobj_0.1.9

brodieG commented 6 years ago

Thanks for reporting. I may not be able to look at this in the immediate future, but I will look at it.

In the meantime, some questions:

What happens if you just paste your string directly into the prompt? I get the following in a C locale:

> "Привет"
[1] "\320\237\321\200\320\270\320\262\320\265\321\202"
> xx <- "Привет"
> xx
[1] "\320\237\321\200\320\270\320\262\320\265\321\202"
> Encoding(xx)
[1] "unknown"
> Encoding(xx) <- 'UTF-8'
> xx
[1] "<U+041F><U+0440><U+0438><U+0432><U+0435><U+0442>"

Looks like U+041F is the capital P in Russian (П), so the string does seem to be encoded in UTF-8, yet RStudio does not display it properly.
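For anyone who wants to check this independently of how the console renders the string, here is a minimal sketch using only base R. It takes the octal-escaped bytes from the output above, confirms they form valid UTF-8, and maps them to Unicode code points. The variable names are illustrative only.

```r
# The octal-escaped bytes printed by the C-locale console above.
raw_bytes <- "\320\237\321\200\320\270\320\262\320\265\321\202"

# Inspect the underlying bytes: 12 bytes, two per Cyrillic letter.
bytes <- charToRaw(raw_bytes)
print(bytes)

# validUTF8() checks the byte sequence itself, independent of the locale.
is_valid <- validUTF8(raw_bytes)
print(is_valid)

# Declare the encoding, then map the string to Unicode code points.
Encoding(raw_bytes) <- "UTF-8"
code_points <- utf8ToInt(raw_bytes)
print(sprintf("U+%04X", code_points))
# [1] "U+041F" "U+0440" "U+0438" "U+0432" "U+0435" "U+0442"
```

The code points match the `<U+....>` escapes shown above, so the bytes themselves are fine and only the display differs.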

Maybe this is a related issue.

Can you try running this through a normal terminal and see what you get there (both in non-html mode, as well as html mode)?

artemklevtsov commented 6 years ago

Terminal output is correct.

> diffobj::diffPrint("Hello", "He1lo", format = "raw")
< "Hello"      > "He1lo"    
@@ 1 @@        @@ 1 @@      
< [1] "Hello"  > [1] "He1lo"
> diffobj::diffPrint("Привет", "Превед", format = "raw")
< "Привет"      > "Превед"    
@@ 1 @@         @@ 1 @@       
< [1] "Привет"  > [1] "Превед"

It also works with the ANSI formats.

I think the SO question relates to Windows-only issues.

Note that the RStudio viewer works correctly with R Markdown reports (in Russian).

artemklevtsov commented 6 years ago

htmltools::html_print also works.

htmltools::html_print("Привет")
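As a way to narrow this down, one could write a small HTML file by hand and open it in the viewer. The sketch below is only an assumption about what might matter here (an explicit charset declaration plus UTF-8 bytes on disk); nothing in this thread confirms that this is what diffobj or htmltools does internally. `write_utf8_html` is a hypothetical helper, not part of any package.

```r
# Hypothetical helper: write text into a minimal HTML page as UTF-8 bytes,
# with an explicit charset declaration in the <head>.
write_utf8_html <- function(body_text, path = tempfile(fileext = ".html")) {
  html <- c(
    "<!DOCTYPE html>",
    "<html>",
    "<head><meta charset=\"utf-8\"></head>",
    paste0("<body><pre>", enc2utf8(body_text), "</pre></body>"),
    "</html>"
  )
  # Binary connection plus useBytes = TRUE: the bytes are written as-is,
  # without re-encoding to the native locale.
  con <- file(path, open = "wb")
  on.exit(close(con))
  writeLines(html, con, useBytes = TRUE)
  path
}

path <- write_utf8_html("\u041f\u0440\u0438\u0432\u0435\u0442")

# In an interactive RStudio session the file could then be opened with
# rstudioapi::viewer(path); that call is left out so the sketch runs anywhere.
```

If a page written this way renders correctly in the viewer while the diff output does not, that would point at how the HTML is generated or written rather than at the viewer itself.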


brodieG commented 6 years ago

Thanks, that's useful.

brodieG commented 6 years ago

Actually, one thing you haven't shown: what does the RStudio console do if you just hit enter after copy-pasting the string there, as in my example where I got the "\320..." business?

artemklevtsov commented 6 years ago

Let's try with Docker, for example the rocker/r-ver:3.4.3 image.

brodieG commented 6 years ago

I don't understand. Are you able to reproduce the error in rocker/r-ver:3.4.3? All I was looking for was for you to paste the string, in quotes, into your RStudio console, hit enter, and report back whether the string as interpreted by the RStudio console looks normal to you or not. For example, this is what happens to me:

> "Привет"
[1] "\320\237\321\200\320\270\320\262\320\265\321\202"
> xx <- "Привет"
> xx
[1] "\320\237\321\200\320\270\320\262\320\265\321\202"
> Encoding(xx)
[1] "unknown"
> Encoding(xx) <- 'UTF-8'
> xx
[1] "<U+041F><U+0440><U+0438><U+0432><U+0435><U+0442>"

Notice this doesn't involve diffobj at all.

artemklevtsov commented 6 years ago

I think your input is not UTF-8 encoded.

> cat("\320\237\321\200\320\270\320\262\320\265\321\202")
Привет
> xx <- "Привет"
> Encoding(xx)
[1] "UTF-8"
> xx
[1] "Привет"
> cat("\320\237\321\200\320\270\320\262\320\265\321\202")
Привет

brodieG commented 6 years ago

It is properly encoded: "<U+041F>" is the capital Russian P, "<U+0440>" is the lower-case r, and so on. For whatever reason my RStudio doesn't want to render them as the characters they are. Probably a locale issue on my side. Thanks for the additional info. I think this will be sufficient to figure out what's going on when I dig into it.
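A quick way to compare the two setups is to check the session locale and to confirm that the stored string is the same either way. This is a minimal base R sketch; the variable names are illustrative, and the printed locale values will of course differ by machine.

```r
# Report the character-type locale and whether R treats it as UTF-8.
ctype <- Sys.getlocale("LC_CTYPE")
info <- l10n_info()
cat("LC_CTYPE:", ctype, "\n")
cat("UTF-8 locale:", info[["UTF-8"]], "\n")

# A string marked as UTF-8 keeps the same bytes whatever the locale is;
# only the way print() renders it changes.
xx <- "\u041f\u0440\u0438\u0432\u0435\u0442"
n_bytes <- nchar(xx, type = "bytes")
n_chars <- nchar(xx, type = "chars")
cat("bytes:", n_bytes, "chars:", n_chars, "\n")
```

In a C locale the second line reports FALSE, which is consistent with the string printing as `<U+....>` escapes there while printing as Cyrillic under ru_RU.UTF-8.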

brodieG commented 6 years ago

I believe this is now fixed in the development branch:

Could you give it a whirl and see if it fixes your problem on your setup:

devtools::install_github('brodieg/diffobj@e824b481c94aac20d309dd31c0cb4ca7b17452ba')

artemklevtsov commented 6 years ago

Now it looks good.