Closed by ben-schwen 2 weeks ago
Bare variable names (symbols) are required to be in the native encoding. On systems incapable of representing ñ in the native encoding (LC_ALL=C or, e.g., KOI8-R), there is no way to preserve an ñ in a variable name.
On non-UTF-8 systems that can represent ñ in the native encoding, the code will work fine:
$ LC_ALL=en_GB.ISO-8859-15 luit R -q -s -e 'as.name("\uf1"); parse(text = "DT[, .N, a\U00F1o]$N[1L]")'
ñ
expression(DT[, .N, año]$N[1L])
If there is no ñ in the current locale, translateChar(), internally called by parse(), substitutes some text and you get a syntax error, but iconv() seems to help:
# this works
LC_ALL=en_GB.ISO-8859-15 luit R -q -s -e 'text <- iconv("DT[, .N, a\U00F1o]$N[1L]", "UTF-8", ""); if (!is.na(text)) parse(text = text)'
# expression(DT[, .N, año]$N[1L])
# this doesn't crash
LC_ALL=C R -q -s -e 'text <- iconv("DT[, .N, a\U00F1o]$N[1L]", "UTF-8", ""); if (!is.na(text)) parse(text = text)'
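To see what the substitution looks like on a given machine, something like the following may help. It is only a diagnostic sketch: it assumes enc2native() goes through the same kind of translation as parse() does, which I have not verified in the R sources, and the variable names are made up for illustration. The output depends on the locale, so none is shown.

```r
# Diagnostic sketch: inspect how the current locale treats the test string.
# Assumption: enc2native() shows the same kind of substitution that parse()
# runs into; this is not confirmed against the R sources.
info <- l10n_info()
cat("UTF-8 locale:", info[["UTF-8"]], "\n")

# \u escapes give a UTF-8 marked string regardless of the session locale.
code <- "DT[, .N, a\u00f1o]$N[1L]"

# In a locale without ñ this is expected to contain substituted text.
native <- enc2native(code)
print(native)

# parse() either succeeds or fails with a syntax error; capture both cases.
res <- tryCatch(parse(text = code), error = function(e) conditionMessage(e))
print(res)
```

Running this under LC_ALL=C and under a UTF-8 locale should make the difference visible without aborting the session.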
Thanks @aitap. What's luit?
iconv() looks as good a solution as any -- definitely good to still run those tests on non-UTF-8 systems, rather than just skip if parsing fails.
luit converts between the UTF-8 terminal session and the non-UTF-8 encoding used by its child process.
We cannot rely on iconv() to return NA if conversion fails: on FreeBSD we instead get a?o.
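One possible way to cope with both behaviours is a round-trip check: convert to the native encoding, convert back, and treat any difference as a failed conversion. The helper name to_native_or_na() below is hypothetical and not part of data.table or base R; this is a sketch of the idea rather than the fix that was eventually applied.

```r
# Hypothetical helper (not an existing function): convert a single UTF-8
# string to the native encoding, returning NA if the text does not survive
# the trip there and back unchanged.
to_native_or_na <- function(x) {
  stopifnot(is.character(x), length(x) == 1L)
  # to = "" means the encoding of the current locale
  native <- iconv(x, from = "UTF-8", to = "")
  if (is.na(native)) return(NA_character_)   # iconv reported failure
  back <- iconv(native, from = "", to = "UTF-8")
  # Catches iconv implementations that substitute characters instead of failing
  if (is.na(back) || !identical(back, x)) return(NA_character_)
  native
}

text <- to_native_or_na("DT[, .N, a\u00f1o]$N[1L]")
if (!is.na(text)) print(parse(text = text))
```

With this, the test would still run wherever ñ is representable and be skipped elsewhere, whether iconv() reports the failure as NA or as a substituted character.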
The test for #4711 does not work on systems without UTF-8 encoding, e.g. our test-lin-rel-vanilla container.
Output of spinning up a new container with the image registry.gitlab.com/jangorecki/dockerfiles/r-base-gcc