Closed: artemklevtsov closed this issue 6 years ago
Thanks for reporting. I may not be able to look at this in the immediate future, but I will look at it.
In the meantime, some questions:
What happens if you just paste your string directly into the prompt? I get the following in a C locale:
> "Привет"
[1] "\320\237\321\200\320\270\320\262\320\265\321\202"
> xx <- "Привет"
> xx
[1] "\320\237\321\200\320\270\320\262\320\265\321\202"
> Encoding(xx)
[1] "unknown"
> Encoding(xx) <- 'UTF-8'
> xx
[1] "<U+041F><U+0440><U+0438><U+0432><U+0435><U+0442>"
It looks like U+041F is the capital P in Russian (П), so the string does seem to be UTF-8 encoded, yet RStudio does not display it properly.
Maybe this is a related issue.
Can you try running this through a normal terminal and see what you get there (both in non-html mode, as well as html mode)?
Terminal output is correct.
> diffobj::diffPrint("Hello", "He1lo", format = "raw")
< "Hello" > "He1lo"
@@ 1 @@ @@ 1 @@
< [1] "Hello" > [1] "He1lo"
> diffobj::diffPrint("Привет", "Превед", format = "raw")
< "Привет" > "Превед"
@@ 1 @@ @@ 1 @@
< [1] "Привет" > [1] "Превед"
It also works with the ansi formats.
I think the SO question relates to Windows-only issues.
Note that the RStudio viewer renders R Markdown reports (in Russian) correctly.
htmltools::html_print also works:
htmltools::html_print("Привет")
Thanks, that's useful.
Actually, there is one thing you haven't shown: what does the RStudio console do if you just hit enter after copy-pasting the string there, as in my example where I got the "\320..." business?
Let's try with Docker, using the rocker/r-ver:3.4.3 image for example.
I don't understand. Are you able to reproduce the error in rocker/r-ver:3.4.3? All I was looking for was for you to paste the string, in quotes, into your RStudio console, hit enter, and report back whether the string as interpreted by the RStudio console looks normal to you. For example, this is what happens for me:
> "Привет"
[1] "\320\237\321\200\320\270\320\262\320\265\321\202"
> xx <- "Привет"
> xx
[1] "\320\237\321\200\320\270\320\262\320\265\321\202"
> Encoding(xx)
[1] "unknown"
> Encoding(xx) <- 'UTF-8'
> xx
[1] "<U+041F><U+0440><U+0438><U+0432><U+0435><U+0442>"
Notice this doesn't involve diffobj at all.
I think your input is not UTF-8 encoded.
> cat("\320\237\321\200\320\270\320\262\320\265\321\202")
Привет
> xx <- "Привет"
> Encoding(xx)
[1] "UTF-8"
> xx
[1] "Привет"
> cat("\320\237\321\200\320\270\320\262\320\265\321\202")
Привет
It is properly encoded: "<U+041F>" is the capital Russian P (П), "<U+0440>" is the lowercase r (р), and so on. For whatever reason my RStudio doesn't render them as the characters they are, probably because of a locale issue on my side. Thanks for the additional info. I think this will be sufficient to figure out what's going on when I dig into it.
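For anyone following along, the octal escapes and the <U+....> forms above describe the same string. A minimal sketch in base R, assuming only that the escapes printed in the C locale are the raw bytes of the input, decodes them and lists the code points (the variable names are just for illustration):

```r
# Octal escapes exactly as printed by the console in the C locale.
octal <- c("320", "237", "321", "200", "320", "270",
           "320", "262", "320", "265", "321", "202")

# Convert the octal digits to raw bytes and glue them into one string.
bytes <- as.raw(strtoi(octal, base = 8L))
xx <- rawToChar(bytes)

# Declare the bytes as UTF-8; this only sets the encoding mark,
# the underlying bytes are left unchanged.
Encoding(xx) <- "UTF-8"

# Decode to Unicode code points and format them like the <U+....> output.
code_points <- sprintf("U+%04X", utf8ToInt(xx))
print(code_points)
```

The result starts with U+041F and U+0440, matching the output above, which suggests the bytes are valid UTF-8 and only the display step is at fault.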
I believe this is now fixed in the development branch. Could you give it a whirl and see if it fixes the problem on your setup?
devtools::install_github('brodieg/diffobj@e824b481c94aac20d309dd31c0cb4ca7b17452ba')
Now it looks good.
Hi.
Thanks for this package. I ran into an encoding issue when using it with the RStudio viewer.
R version 3.4.3 (2017-11-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Arch Linux

Matrix products: default
BLAS: /usr/lib/libblas.so.3.8.0
LAPACK: /usr/lib/liblapack.so.3.8.0

locale:
 [1] LC_CTYPE=ru_RU.UTF-8 LC_NUMERIC=C LC_TIME=ru_RU.UTF-8
 [4] LC_COLLATE=C LC_MONETARY=ru_RU.UTF-8 LC_MESSAGES=ru_RU.UTF-8
 [7] LC_PAPER=ru_RU.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=ru_RU.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] compiler_3.4.3 tools_3.4.3 parallel_3.4.3 rstudioapi_0.7 yaml_2.1.16 crayon_1.3.4 diffobj_0.1.9