GeoBosh / Rdpack

R package Rdpack provides functions and macros facilitating writing and management of R documentation.
https://geobosh.github.io/Rdpack/

Encoding issue on windows #29

Closed ManuelHentschel closed 1 year ago

ManuelHentschel commented 1 year ago

First off, thanks for this very useful package!

As stated in the title, I'm having a problem with the encoding of special characters on Windows. I read the corresponding part of the readme, but did not manage to solve the issue without modifying the Rdpack package itself (or maybe by switching everything to native encoding, which I'd like to avoid).

My setup is

If I try to cite bibliography entries containing special characters (German umlauts most of the time), they do not show up correctly in the output. A minimal example that reproduces the issue for me is:

\insertRef{DiaLop2020ejor}{Rdpack}

Below is a more detailed example .Rd file illustrating this behavior, and the corresponding HTML produced when installing the package. It seems to me that the output of \Sexpr is always expected to be in the native encoding (i.e. latin1 on my machine), whereas e.g. \insertRef produces strings that are UTF-8 encoded. Wrapping the corresponding R functions in enc2native seems to fix the issue.
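To inspect the encoding mismatch described above outside of an .Rd file, something like the following can be run in a plain R session. This is only a sketch: the variable names are made up for illustration, and the results of `enc2native` depend on the locale of the machine it runs on.

```r
# A non-ASCII string written with a Unicode escape, so this snippet
# itself stays pure ASCII regardless of how the file is saved.
x <- "\u00f6"                      # o-umlaut

# Strings created from \u escapes carry a declared UTF-8 encoding.
enc_declared <- Encoding(x)        # "UTF-8"

# enc2native() converts to the session's native encoding,
# enc2utf8() converts to UTF-8. What enc2native() returns depends on
# the locale (latin1-like code page, UTF-8, C, ...).
x_native <- enc2native(x)
x_utf8   <- enc2utf8(x_native)

print(enc_declared)
print(Encoding(x_native))          # locale dependent
print(identical(x_utf8, x))        # TRUE only if the native encoding can represent x
```

If the last line prints FALSE, the native encoding cannot represent the character, and `enc2native` is lossy for it.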


Content of an .Rd file:

\name{someTest}
\alias{someTest}
\title{Encoding Test}

\section{Trying to write the umlaut oe:}{
    \describe{
        \item{Normal Rd:}{ö}
        \item{\code{\\Sexpr}:}{\Sexpr[results=rd,stage=build]{("ö")}}
        \item{Encoding in \code{\\Sexpr:}}{\Sexpr[results=rd,stage=build]{Encoding("ö")}}
        \item{\code{\\Sexpr} with \code{enc2utf8}}{\Sexpr[results=rd,stage=build]{enc2utf8("ö")}}
        \item{\code{\\Sexpr} with \code{enc2native}}{\Sexpr[results=rd,stage=build]{enc2native("ö")}}
    }
}

\section{Trying to cite something:}{
    \describe{
        \item{\code{\\insertRef}:}{\insertRef{DiaLop2020ejor}{Rdpack}}
        \item{\code{\\Sexpr}:}{\Sexpr[results=rd,stage=build]{Rdpack::insert_all_ref(t(c('DiaLop2020ejor', 'Rdpack')))}}
        \item{Encoding in \code{\\Sexpr}:}{\Sexpr[results=rd,stage=build]{Encoding(Rdpack::insert_all_ref(t(c('DiaLop2020ejor', 'Rdpack'))))}}
        \item{\code{\\Sexpr} with \code{enc2native}:}{\Sexpr[results=rd,stage=build]{enc2native(Rdpack::insert_all_ref(t(c('DiaLop2020ejor', 'Rdpack'))))}}
    }
}

Screenshot of the rendered help page: image


HTML generated by R CMD INSTALL --html .

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml"><head><title>R: Encoding Test</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link rel="stylesheet" type="text/css" href="R.css" />
</head><body>

<table width="100%" summary="page for someTest {SomePackage}"><tr><td>someTest {SomePackage}</td><td style="text-align: right;">R Documentation</td></tr></table>

<h2>Encoding Test</h2>

<h3>Trying to write the umlaut oe:</h3>

<dl>
<dt>Normal Rd:</dt><dd><p>ö</p>
</dd>
<dt><code>\Sexpr</code>:</dt><dd><p>ö</p>
</dd>
<dt>Encoding in <code>\Sexpr:</code></dt><dd><p>latin1</p>
</dd>
<dt><code>\Sexpr</code> with <code>enc2utf8</code></dt><dd><p>ö</p>
</dd>
<dt><code>\Sexpr</code> with <code>enc2native</code></dt><dd><p>ö</p>
</dd>
</dl>

<h3>Trying to cite something:</h3>

<dl>
<dt><code>\insertRef</code>:</dt><dd><p>Juan
Esteban Diaz, Manuel López-Ibáñez (2021).
&ldquo;Incorporating Decision-Maker's Preferences into the Automatic Configuration of Bi-Objective Optimisation Algorithms.&rdquo;
<em>European Journal of Operational Research</em>, <b>289</b>(3), 1209&ndash;1222.
doi: <a href="https://doi.org/10.1016/j.ejor.2020.07.059">10.1016/j.ejor.2020.07.059</a>.</p>
</dd>
<dt><code>\Sexpr</code>:</dt><dd><p>Juan
Esteban Diaz, Manuel López-Ibáñez (2021).
&ldquo;Incorporating Decision-Maker's Preferences into the Automatic Configuration of Bi-Objective Optimisation Algorithms.&rdquo;
<em>European Journal of Operational Research</em>, <b>289</b>(3), 1209&ndash;1222.
doi: <a href="https://doi.org/10.1016/j.ejor.2020.07.059">10.1016/j.ejor.2020.07.059</a>.</p>
</dd>
<dt>Encoding in <code>\Sexpr</code>:</dt><dd><p>UTF-8</p>
</dd>
<dt><code>\Sexpr</code> with <code>enc2native</code>:</dt><dd><p>Juan
Esteban Diaz, Manuel López-Ibáñez (2021).
&ldquo;Incorporating Decision-Maker's Preferences into the Automatic Configuration of Bi-Objective Optimisation Algorithms.&rdquo;
<em>European Journal of Operational Research</em>, <b>289</b>(3), 1209&ndash;1222.
doi: <a href="https://doi.org/10.1016/j.ejor.2020.07.059">10.1016/j.ejor.2020.07.059</a>.</p>
</dd>
</dl>

<hr /><div style="text-align: center;">[Package <em>SomePackage</em> version 5.5.110 <a href="00Index.html">Index</a>]</div>
</body></html>

GeoBosh commented 1 year ago

Thanks for the report, the detailed investigation, and the examples. This was a long-standing problem on Windows which should have disappeared with R >= 4.2.0. Could you try installing a more recent version of R (currently 4.2.2) and try with that?
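A quick way to check whether a given session already meets this condition is sketched below. It uses only base R; `r_ok` and `utf8_session` are illustrative names, and whether R 4.2 actually runs in UTF-8 also depends on the Windows version, which this does not check.

```r
# Is this R at least 4.2.0, the first release expected to use UTF-8
# as the native encoding on Windows?
r_ok <- getRversion() >= "4.2.0"

# Does the running session actually use a UTF-8 locale?
utf8_session <- isTRUE(l10n_info()[["UTF-8"]])

cat("R version:      ", as.character(getRversion()), "\n")
cat("R >= 4.2.0:     ", r_ok, "\n")
cat("UTF-8 session:  ", utf8_session, "\n")
```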

Please let me know if you succeed. If you can supply a link to your bib file, I could investigate myself as well.

It is a long story, and Tomas Kalibera from R-core has written a number of posts and blogs about it, but basically Windows was converting UTF-8 to the locale (code page) and then back. In the process, characters that are not available in that locale were replaced by 'approximations', causing havoc. Windows now has a proper UTF-8 locale, and R 4.2 and later use it. This is also why enc2native helps in some cases but is not a universal solution and causes its own problems: its success depends on the local encoding and on the particular characters involved.
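The effect of such a round trip can be imitated with `iconv` in base R. This is a sketch of the general idea, not of what Windows or R did internally: latin1 stands in for the code page, `round_trip` is a helper made up for illustration, and "?" is used where Windows would have substituted an approximation.

```r
# Both names are written with escapes so they are UTF-8 strings.
name_latin <- "L\u00f3pez-Ib\u00e1\u00f1ez"   # every character exists in latin1
name_other <- "Dvo\u0159\u00e1k"              # r-caron does not exist in latin1

# UTF-8 -> latin1 -> UTF-8; bytes that cannot be converted become "?".
round_trip <- function(x) {
  as_latin1 <- iconv(x, from = "UTF-8", to = "latin1", sub = "?")
  iconv(as_latin1, from = "latin1", to = "UTF-8")
}

print(identical(round_trip(name_latin), name_latin))  # TRUE: nothing is lost
print(identical(round_trip(name_other), name_other))  # FALSE: r-caron is lost
```

The first name survives because all its characters fit the intermediate encoding; the second does not, which matches the point that success depends on the local encoding and on the particular characters involved.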

ManuelHentschel commented 1 year ago

Thanks for the quick response!

Upgrading to 4.2.0 solved the problem, both in the example above and the original project.