Rdatatable / data.table

encoding test 2194.7 not portable #5484

Open jangorecki opened 2 years ago

jangorecki commented 2 years ago

I am able to reproduce this on a clean debian:testing image without setting up any locale.

image of debian + r-base + gcc + make:

sudo docker run -it --rm registry.gitlab.com/jangorecki/dockerfiles/r-base-gcc
  Sat Nov 25 10:50:59 2023  endian==little, sizeof(long double)==16, longdouble.digits==64, sizeof(pointer)==8, TZ=='UTC', Sys.timezone()=='UTC', Sys.getlocale()=='C', l10n_info()=='MBCS=FALSE; UTF-8=FALSE; Latin-1=FALSE; codeset=ANSI_X3.4-1968', getDTthreads()=='This installation of data.table has not been compiled with OpenMP support.; omp_get_num_procs()==1; R_DATATABLE_NUM_PROCS_PERCENT==unset (default 50); R_DATATABLE_NUM_THREADS==unset; R_DATATABLE_THROTTLE==unset (default 1024); omp_get_thread_limit()==1; omp_get_max_threads()==1; OMP_THREAD_LIMIT==unset; OMP_NUM_THREADS==unset; RestoreAfterFork==true; data.table is using 1 threads with throttle==1024. See ?setDTthreads.', zlib header files were not found when data.table was compiled
  Error: 1 error(s) out of 9985. Search tests/tests.Rraw for test number(s) 2194.7. Duration: 21.7s elapsed (21.5s cpu).
  In addition: Warning message:
  In readLines(testDir("issue_563_fread.txt")) :
    invalid input found on input connection '/builds/Rdatatable/data.table/bus/test-rel-vanilla-lin/data.table.Rcheck/data.table/tests/issue_563_fread.txt'

This is the only error there, and AFAIK we have many encoding-related tests, so I am not sure whether we should expect a locale to be installed and configured just to pass this single test, or whether the test itself should be improved somehow. @shrektan any idea?
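One way the "improve the test" option could look is a guard that runs locale-sensitive checks only when the session is UTF-8. This is a minimal sketch under that assumption, not what tests.Rraw currently does. `is_utf8_session()` and `run_if_utf8()` are hypothetical helper names, not part of data.table. The only fact relied on is that `l10n_info()` reports `UTF-8=FALSE` in the failing environment, as the log above shows.

```r
# Hypothetical helpers (not part of data.table): skip a locale-sensitive
# check when the R session is not running in a UTF-8 locale.
is_utf8_session <- function() {
  # l10n_info() returns a list with a logical element named "UTF-8"
  isTRUE(l10n_info()[["UTF-8"]])
}

run_if_utf8 <- function(expr) {
  if (!is_utf8_session()) {
    message("skipping: LC_CTYPE is '", Sys.getlocale("LC_CTYPE"),
            "', which is not a UTF-8 locale")
    return(invisible(NULL))
  }
  # expr is a lazily evaluated promise; it is only run in a UTF-8 session
  force(expr)
}

# Usage: the wrapped expression is evaluated only when the locale is UTF-8
res <- run_if_utf8(nchar("abc"))
```

Whether skipping is acceptable, or the test should instead be made independent of the locale, is the open question in this issue.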

MichaelChirico commented 6 months ago

Tests pass fine on the r-devel-gcc codespace

getDTthreads(verbose=TRUE):
  OpenMP version (_OPENMP)       201511
  omp_get_num_procs()            2
  R_DATATABLE_NUM_PROCS_PERCENT  unset (default 50)
  R_DATATABLE_NUM_THREADS        unset
  R_DATATABLE_THROTTLE           unset (default 1024)
  omp_get_thread_limit()         2147483647
  omp_get_max_threads()          2
  OMP_THREAD_LIMIT               unset
  OMP_NUM_THREADS                unset
  RestoreAfterFork               true
  data.table is using 1 threads with throttle==1024. See ?setDTthreads.
test.data.table() running: /usr/local/lib/R/library/data.table/tests/tests.Rraw
# ...
Sat Apr  6 00:03:16 2024  endian==little, sizeof(long double)==16, longdouble.digits==64, sizeof(pointer)==8, TZ==unset, Sys.timezone()=='Etc/UTC', Sys.getlocale()=='LC_CTYPE=C.UTF-8;LC_NUMERIC=C;LC_TIME=C.UTF-8;LC_COLLATE=C.UTF-8;LC_MONETARY=C.UTF-8;LC_MESSAGES=C.UTF-8;LC_PAPER=C.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=C.UTF-8;LC_IDENTIFICATION=C', l10n_info()=='MBCS=TRUE; UTF-8=TRUE; Latin-1=FALSE; codeset=UTF-8', getDTthreads()=='OpenMP version (_OPENMP)==201511; omp_get_num_procs()==2; R_DATATABLE_NUM_PROCS_PERCENT==unset (default 50); R_DATATABLE_NUM_THREADS==unset; R_DATATABLE_THROTTLE==unset (default 1024); omp_get_thread_limit()==2147483647; omp_get_max_threads()==2; OMP_THREAD_LIMIT==unset; OMP_NUM_THREADS==unset; RestoreAfterFork==true; data.table is using 1 threads with throttle==1024. See ?setDTthreads.', zlibVersion()==1.3 ZLIB_VERSION==1.3
10 longest running tests took 13s (45% of 29s)
      ID  time nTest
 1: 1438 2.134   738
 2: 1648 1.668    91
 3: 1223 1.616   728
 4: 2155 1.494     5
 5: 1652 1.362    91
 6: 1650 1.352    91
 7: 1437 1.226    36
 8: 1644 0.953    91
 9: 1252 0.951   484
10: 1642 0.910    91
All 11151 tests (last 2253.19) in tests/tests.Rraw completed ok in 34.0s elapsed (31.9s cpu)

Are you still able to reproduce this?

jangorecki commented 6 months ago

Does r-devel-gcc have the same locale as debian:testing? AFAIR it was not reproducible on r-devel-gcc before because that image sets a locale. The vanilla test job fails because of that. It looks like, since recent commits, there are more encoding-related errors on this job: https://rdatatable.gitlab.io/data.table/web/checks/data.table/test-lin-rel-vanilla/00check.log
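To compare the two environments directly, a small snippet like the following could be run in each image and the output diffed. It is only a sketch: `locale_summary()` is a hypothetical helper name, and it reports the same fields that `test.data.table()` already prints in its header line (`Sys.getlocale()` and `l10n_info()`).

```r
# Hypothetical helper: collect the locale fields that differ between the
# two logs above (LC_CTYPE, MBCS, UTF-8, codeset) into a named vector.
locale_summary <- function() {
  info <- l10n_info()
  codeset <- info[["codeset"]]  # may be absent on some platforms/R versions
  c(
    LC_CTYPE = Sys.getlocale("LC_CTYPE"),
    MBCS     = as.character(isTRUE(info[["MBCS"]])),
    UTF8     = as.character(isTRUE(info[["UTF-8"]])),
    codeset  = if (is.null(codeset)) NA_character_ else codeset
  )
}

print(locale_summary())
```

Going by the two logs in this thread, the vanilla image would be expected to report `LC_CTYPE` as `C` with `UTF8` `FALSE`, and the r-devel-gcc codespace `C.UTF-8` with `UTF8` `TRUE`.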