Rdatatable / data.table

R's data.table package extends data.frame:
http://r-datatable.com
Mozilla Public License 2.0

WISH/BUG: Respect CPU resource limitations set by Linux CGroups to avoid CPU overuse and slowdown #5620

Open HenrikBengtsson opened 1 year ago

HenrikBengtsson commented 1 year ago

Issue

data.table::getDTthreads() is not aware of Linux CGroups settings. If CGroups limits the number of CPU cores, then data.table will overuse the CPU resources available to the R process.

For example, the 'Free' Posit Cloud plan gives you a single CPU core to play with. They use CGroups v1 to limit the CPU resource. Running the following from within their RStudio server reveals this:

> total <- as.integer(readLines("/sys/fs/cgroup/cpu/cpu.cfs_period_us"))
> total
[1] 100000
> quota <- as.integer(readLines("/sys/fs/cgroup/cpu/cpu.cfs_quota_us"))
> quota
[1] 100000
> cores <- quota / total
> cores
[1] 1
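For completeness, CGroups v2 exposes the same limit through a single file, /sys/fs/cgroup/cpu.max, whose first field is the quota (or "max" when unlimited) and second field is the period. A minimal sketch of the same calculation, assuming the v2 file layout (the helper name cgroup2_cores() is made up for illustration):

```r
## Sketch: parse a CGroups v2 'cpu.max' line ("<quota> <period>", or
## "max <period>" when no quota is set) into a CPU count.
## The cgroup2_cores() helper is hypothetical, not part of any package.
cgroup2_cores <- function(line) {
  parts <- strsplit(line, " ", fixed = TRUE)[[1]]
  if (parts[1] == "max") return(NA_real_)  # unlimited: no quota in effect
  as.integer(parts[1]) / as.integer(parts[2])
}

## On a real CGroups v2 host:
## cgroup2_cores(readLines("/sys/fs/cgroup/cpu.max"))
cgroup2_cores("400000 100000")  # 4 CPUs, matching the 'Premium' plan example
```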

A user on the 'Premium' plan has 4 CPUs to play with, so they would get quota = 400000 and cores = 4 above.

The defaults of data.table do not pick this up:

> data.table::getDTthreads(verbose = TRUE)
  OpenMP version (_OPENMP)       201511
  omp_get_num_procs()            16
  R_DATATABLE_NUM_PROCS_PERCENT  unset (default 50)
  R_DATATABLE_NUM_THREADS        unset
  R_DATATABLE_THROTTLE           unset (default 1024)
  omp_get_thread_limit()         2147483647
  omp_get_max_threads()          16
  OMP_THREAD_LIMIT               unset
  OMP_NUM_THREADS                unset
  RestoreAfterFork               true
  data.table is using 8 threads with throttle==1024. See ?setDTthreads.
[1] 8

This means multi-threaded data.table tasks will overuse the CPU resources by 800%, which results in lots of overhead from context switching (unless data.table has other low-level mechanisms that detect this). CPU overuse slows down performance.

The overuse problem gets worse the more CPU cores the host has. For example, the Posit Cloud instances currently run with 16 vCPUs, but if they upgrade to, say, 64 vCPUs, the overuse will be 3200%. In research HPC environments, it's now common to see 192 CPUs, and I'd expect this number to grow over time.

FWIW, parallelly::availableCores() also queries CGroups v1 and CGroups v2 settings, e.g.

> parallelly:::availableCores()
cgroups.cpuquota 
               1 

> parallelly:::availableCores(which = "all")
          system   cgroups.cpuset cgroups.cpuquota            nproc 
              16               16                1               16 
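Until something like this is built in, one possible user-side workaround (a sketch, assuming the parallelly package is installed, e.g. placed in a site-wide Rprofile) is to wire the two together:

```r
## Sketch: cap data.table's thread count at the cgroups-aware core count
## reported by parallelly; both packages are assumed to be installed.
if (requireNamespace("parallelly", quietly = TRUE) &&
    requireNamespace("data.table", quietly = TRUE)) {
  data.table::setDTthreads(parallelly::availableCores())
}
```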

Session info

> sessionInfo()
R version 4.2.3 (2023-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.14.8

loaded via a namespace (and not attached):
[1] compiler_4.2.3 tools_4.2.3   
tdhock commented 1 year ago

Similar to #5573 about using data.table on a Slurm cluster. Currently we assume this kind of configuration should be handled by the user; for example, the user can set the R_DATATABLE_NUM_THREADS environment variable. In terms of dev/maintenance time, how many types of environment variables like this should we support (Slurm, CGroups, ...)? How would we test each of them? Given the constraints on dev time, I would argue that it is better to keep asking users to handle this.
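For reference, the user-side override mentioned here can also be set from R itself, before data.table is loaded (a sketch; the value 1 matches the one-CPU cgroup quota from the original report, and should be adjusted per environment):

```r
## Sketch: set the documented override before loading data.table, which
## consults it when deciding how many OpenMP threads to use.
Sys.setenv(R_DATATABLE_NUM_THREADS = "1")
library(data.table)
getDTthreads()  # should now report 1
```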

HenrikBengtsson commented 1 year ago

> it would be better to keep asking users to handle this

Given that data.table is such a central infrastructure package, used internally by many packages and pipelines, I wonder how many users even know they are using data.table, let alone know they need to configure the number of threads it should use.

For the problem reported here, CGroups throttling, I believe there are lots of data.table instances out there running slower than a single-threaded version would, without anyone even noticing. Only a savvy user would know that this could be a problem and that it should be fixed.