TysonStanley opened 1 month ago
Can also be reproduced on amd64 Linux with LC_ALL=C (although multiple other tests also break there, due to <U+????> substitutions when converting from UTF-8 to the native encoding):
.libPaths(c('data.table.Rcheck', .libPaths()))
library(data.table)
trace(data.table:::endsWithAny, quote(if(identical(y, 'B')) browser())) # test 2194.7 compares with 'B'
test.data.table()
# same file as data.table/inst/tests/issue_563_fread.txt
Browse[1]> readLines(parent.frame(8)$env$testDir('issue_563_fread.txt'))
[1] "A,B"
Browse[1]> c
# later, at top level again
> readLines('inst/tests/issue_563_fread.txt')
[1] "A,B" "\304\205,\305\276" "\305\253,\304\257"
[4] "\305\263,\304\227" "\305\241,\304\231"
Rconn_fgetc returns EOF after the first line because the connection is set to convert from UTF-8 into the native encoding, and iconv() fails on the non-ASCII characters. The conversion comes from file(encoding = getOption("encoding")), and that option is indeed set to UTF-8 by test.data.table:
https://github.com/Rdatatable/data.table/blob/bb9faf65caf0ca366aa49c70b7dfb9e091108fe6/R/test.data.table.R#L92-L94
When a file path is given to readLines, there's no way around it calling file() with the default encoding= argument, so tests.Rraw will have to either open the file manually with a different encoding (in which case the contents will be invalid!) or pass a different string to endsWithAny. In particular, ?file recommends creating an unopened connection marked as UTF-8 (file(open = '', encoding = 'UTF-8')) and passing it to readLines in order to read UTF-8 in an R session that cannot represent UTF-8 natively:
# context: options(encoding = 'UTF-8'), LC_ALL=C
con <- file('inst/tests/issue_563_fread.txt', open = '')
readLines(con)
# [1] "A,B" "<U+0105>,<U+017E>" "<U+016B>,<U+012F>" "<U+0173>,<U+0117>"
# [5] "<U+0161>,<U+0119>"
close(con)
Unfortunately, readLines won't do this by itself when given a path: it calls file(open = 'r'), which sets up the UTF-8 → ASCII conversion and breaks.
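A minimal sketch of how the ?file approach could be wrapped so a test calls a helper instead of readLines(path). The name read_utf8_lines is hypothetical (not part of data.table or base R), and the sketch assumes the file on disk is UTF-8. The demo writes the raw bytes of the first two lines of issue_563_fread.txt ("A,B" and "\304\205,\305\276", as printed above) to a temporary file so no re-encoding happens on the way out.

```r
# Hypothetical helper, not part of data.table: read a UTF-8 file through an
# unopened connection whose encoding is declared explicitly, instead of
# letting readLines(path) call file(path, "r") with getOption("encoding").
read_utf8_lines <- function(path) {
  con <- file(path, open = "", encoding = "UTF-8")  # created, not yet opened
  on.exit(close(con))  # readLines() closes it again; close() destroys it
  readLines(con)       # opened in "rt" mode only for the duration of the call
}

# Demo: "A,B" and U+0105 "," U+017E written as raw UTF-8 bytes.
path <- tempfile(fileext = ".txt")
writeBin(c(charToRaw("A,B\n"),
           as.raw(c(0xc4, 0x85)), charToRaw(","), as.raw(c(0xc5, 0xbe)),
           charToRaw("\n")),
         path)
x <- read_utf8_lines(path)
print(x)
```

The on.exit(close(con)) is there because readLines closes a connection it opened itself but does not destroy it; close() is still needed to release it.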
Something for after the patch release, found during the release process (but I don't believe it should stop the current patch release): this makes test.data.table() fail on macOS Apple Silicon at test 2194.7.