Open bastistician opened 3 months ago
The nearby comments look relevant:
test(1590.03, forderv( c(x2,x1,x1,x2)), integer()) # desirable consistent result given identical(x1, x2)
# ^^ data.table consistent over time regardless of which version of R or locale
baseR = base::order(c(x2,x1,x1,x2))
# Even though C locale and identical(x1,x2), base R<=4.0.0 considers the encoding too; i.e. orders the encoding together x2 (UTF-8) before x1 (latin1).
# Then around May 2020, R-devel (but just on Windows) started either respecting identical() like data.table has always done, or put latin1 before UTF-8.
# Jan emailed R-devel on 23 May 2020.
# We relaxed 1590.04 and 1590.07 (tests of base R behaviour) rather than remove them, PR#4492 and its follow-up. But these two tests
# are so relaxed now that they barely testing anything. It appears base R behaviour is undefined in this rare case of identical strings in different encodings.
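For readers without the test file at hand, the situation these comments describe can be sketched in base R. This is a minimal illustration, not the actual test code: the literal `"fa\xE7ile"` and the way `x1` and `x2` are built here are assumptions chosen to produce one string in two marked encodings.

```r
# Assumed construction (illustrative only): the same text, once marked
# latin1 and once converted to and marked as UTF-8.
x1 <- "fa\xE7ile"
Encoding(x1) <- "latin1"
x2 <- iconv(x1, from = "latin1", to = "UTF-8")

# The encoding marks differ ...
print(Encoding(c(x1, x2)))

# ... but identical() treats strings as equal when they agree after
# translation to UTF-8, regardless of the mark.
print(identical(x1, x2))

# What base::order() does with such a vector is the part the comments
# describe as undefined, so no expected result is given here.
print(base::order(c(x2, x1, x1, x2)))
```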
This will take some time to go through the history and figure out what this test was trying to do exactly and how to handle it.
Should we consider this a potential blocker for CRAN in the near future? We're just about to release a new version -- we can just deactivate those tests in the short term if needed.
The report shows that these two tests are not portable. If they were disabled I could drop the `--no-tests` flag for data.table when mass-checking packages on Alpine Linux (against specific R patches).
(Only reporting now, seeing that data.table is being developed again.)
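If the short-term route is to deactivate the two tests only where they are known to fail, one possible shape is a platform guard. The sketch below is an assumption, not existing data.table code: `is_musl()` and `skip_on_musl()` are hypothetical helpers, and matching `"musl"` in `R.version$platform` is a heuristic based on the platform string in this report (`x86_64-pc-linux-musl`).

```r
# Hypothetical helper: TRUE when the platform triplet mentions musl.
is_musl <- function(platform = R.version$platform) {
  grepl("musl", platform, fixed = TRUE)
}

# Hypothetical wrapper: evaluate `expr` only on non-musl platforms.
# `expr` is a lazily evaluated argument, so it is never run when skipped.
skip_on_musl <- function(expr, platform = R.version$platform) {
  if (is_musl(platform)) {
    message("skipping platform-dependent check on ", platform)
    return(invisible(NULL))
  }
  expr
}

# Usage sketch: the wrapped expression stands in for a platform-dependent test.
skip_on_musl(print("platform-dependent check would run here"))
```

A guard like this keeps the tests active on platforms where they pass, at the cost of hard-coding one known-bad platform rather than addressing the underlying non-portability.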
Checking released data.table 1.15.4, my Alpine Linux server gives
but at this point it is probably more useful to look at the development version of data.table.
So in a vanilla Alpine Linux container,
running
gives only 2 failures for test numbers 1590.05 and 1590.06:
Error in test.data.table()
```
* using R version 4.4.0 (2024-04-24)
* using platform: x86_64-pc-linux-musl
* R was compiled by
    gcc (Alpine 13.2.1_git20240309) 13.2.1 20240309
    GNU Fortran (Alpine 13.2.1_git20240309) 13.2.1 20240309
* running under: Alpine Linux v3.20
* using session charset: UTF-8
[...]
Running the tests in ‘tests/main.R’ failed.
Complete output:
  > require(data.table)
  Loading required package: data.table
  >
  > test.data.table() # runs the main test suite of 5,000+ tests in /inst/tests/tests.Rraw
  getDTthreads(verbose=TRUE):
  OpenMP version (_OPENMP)       201511
  omp_get_num_procs()            12
  R_DATATABLE_NUM_PROCS_PERCENT  unset (default 50)
  R_DATATABLE_NUM_THREADS        unset
  R_DATATABLE_THROTTLE           unset (default 1024)
  omp_get_thread_limit()         2147483647
  omp_get_max_threads()          12
  OMP_THREAD_LIMIT               unset
  OMP_NUM_THREADS                unset
  RestoreAfterFork               true
  data.table is using 6 threads with throttle==1024. See ?setDTthreads.
  test.data.table() running: //data.table.Rcheck/data.table/tests/tests.Rraw
  Test 1590.05 ran without errors but failed check that x equals y:
  > x = x1 != x2
  First 1 of 1 (type 'logical'):
  [1] FALSE
  > y = TRUE
  First 1 of 1 (type 'logical'):
  [1] TRUE
  1 element mismatch
  Test 1590.06 ran without errors but failed check that x equals y:
  > x = forderv(c(x2, x1, x1, x2))
  First 0 of 0 (type 'integer'):
  integer(0)
  > y = INT(1, 4, 2, 3)
  First 4 of 4 (type 'integer'):
  [1] 1 4 2 3
  Numeric: lengths (0, 4) differ
  Unloading package bit64

  Sat Aug 3 13:25:45 2024 endian==little, sizeof(long double)==16, longdouble.digits==64, sizeof(pointer)==8, TZ=='UTC', Sys.timezone()=='UTC', Sys.getlocale()=='C.UTF-8;C;C;C;C;C', l10n_info()=='MBCS=TRUE; UTF-8=TRUE; Latin-1=FALSE; codeset=UTF-8', getDTthreads()=='OpenMP version (_OPENMP)==201511; omp_get_num_procs()==12; R_DATATABLE_NUM_PROCS_PERCENT==unset (default 50); R_DATATABLE_NUM_THREADS==unset; R_DATATABLE_THROTTLE==unset (default 1024); omp_get_thread_limit()==2147483647; omp_get_max_threads()==12; OMP_THREAD_LIMIT==unset; OMP_NUM_THREADS==unset; RestoreAfterFork==true; data.table is using 6 threads with throttle==1024. See ?setDTthreads.', .libPaths()=='//data.table.Rcheck','/usr/lib/R/library', zlibVersion()==1.3.1 ZLIB_VERSION==1.3.1
  Error in test.data.table() :
    2 error(s) out of 11369. Search tests/tests.Rraw for test number(s) 1590.05, 1590.06. Duration: 26.9s elapsed (29.1s cpu).
```

Here is the relevant R code, with comments indicating results on Alpine Linux:
It seems this test (1590.05) relies on (undocumented) platform-dependent behaviour for invalid strings, so should probably be dropped.
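To make the "invalid strings" point concrete, here is a small base-R sketch that inspects the bytes directly instead of relying on string comparison. It assumes, for illustration only, that the strings are built from the literal `"fa\xE7ile"` and that the encoding marks are then removed; the actual test may set things up differently.

```r
# Assumed construction (illustrative only).
x1 <- "fa\xE7ile"
Encoding(x1) <- "latin1"
x2 <- iconv(x1, from = "latin1", to = "UTF-8")

# Drop the encoding marks, so R has to interpret the bytes
# according to the session's native charset.
Encoding(x1) <- "unknown"
Encoding(x2) <- "unknown"

# The underlying bytes differ, independent of platform or locale.
print(charToRaw(x1))
print(charToRaw(x2))

# Only the second byte sequence is valid UTF-8; the first is an invalid
# string wherever the native charset is UTF-8.
print(validUTF8(c(x1, x2)))

# The charset R believes the session is using.
str(l10n_info())
```

Byte-level checks such as `charToRaw()` and `validUTF8()` do not depend on how the platform compares unmarked, possibly invalid strings, whereas the result of `x1 != x2` on such strings is exactly what differs in the report above.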
I cannot say anything about the unexpected length-0 result of `data.table:::forderv(c(x2,x1,x1,x2))` (test number 1590.06).
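As background for reading that result: test 1590.03 quoted above expects `integer()` from `forderv()` when all four strings are considered identical, which suggests a length-0 result is how the internal `forderv()` signals that the input is already in order. A minimal sketch of that behaviour, assuming data.table is installed and that this unexported function behaves the same in the installed version:

```r
if (requireNamespace("data.table", quietly = TRUE)) {
  # forderv() is internal (not exported), hence the ::: access.
  forderv <- data.table:::forderv

  sorted_input   <- c("a", "b", "c")
  unsorted_input <- c("b", "a", "c")

  # Already ordered: a length-0 integer vector, meaning "nothing to reorder".
  print(forderv(sorted_input))

  # Not ordered: a permutation that would sort the input.
  print(forderv(unsorted_input))
}
```

Under that reading, a length-0 result for `c(x2,x1,x1,x2)` would mean the four strings were treated as equal on this platform, which would be consistent with `x1 != x2` evaluating to `FALSE` in test 1590.05.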