getkeops / keops

KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows
https://www.kernel-operations.io
MIT License
1.06k stars 64 forks source link

[rkeops] Setting the number of cores when using CPU backend does not seem to work #383

Open astamm opened 2 months ago

astamm commented 2 months ago

Hi, As the title of the issue explains, a small reprex for this (Hausdorff distance computation on functional data set).

Generate some data

library(purrr)
library(roahd)
library(withr)

N <- 100
P <- 200
L <- 3
time_grid <- seq(0, 1, length.out = P)
C1 <- exp_cov_function(time_grid, alpha = 0.1, beta = 0.2)
C2 <- exp_cov_function(time_grid, alpha = 0.2, beta = 0.5)
C3 <- exp_cov_function(time_grid, alpha = 0.3, beta = 1)
centerline <- matrix(c(
  sin(2 * pi * time_grid),
  sqrt(time_grid),
  10 * (time_grid - 0.5) * time_grid),
  nrow = 3, byrow = TRUE
)

withr::with_seed(1234, {
  dat <- generate_gauss_mfdata(
    N, L, centerline,
    correlations = c(0.5, 0.5, 0.5),
    listCov = list(C1, C2, C3)
  )
})

dat <- purrr::map(1:N, \(n) {
  rbind(dat[[1]][n, ], dat[[2]][n, ], dat[[3]][n, ])
})
dat <- dat[1:21]

Implement dist() function for Hausdorff distance via {rkeops}

library(rkeops)
reticulate::use_virtualenv(virtualenv = "rkeops", required = TRUE)
check_rkeops()

hausdorff_distance_rkeops <- function(x, y) {
  x_i <- Vi(x)
  y_j <- Vj(y)
  D_ij <- norm2(x_i - y_j)
  max(max(min(D_ij, "i")), max(min(D_ij, "j")))
}

hausdorff_distance_rkeops(t(dat[[1]]), t(dat[[2]]))

dist_rkeops <- function(x) {
  x <- purrr::map(x, t)
  N <- length(x)
  K <- N * (N - 1) / 2
  out <- purrr::map_dbl(1:K, \(k) {
    k <- k - 1
    i <- N - 2 - floor(sqrt(-8 * k + 4 * N * (N - 1) - 7) / 2.0 - 0.5);
    j <- k + i + 1 - N * (N - 1) / 2 + (N - i) * ((N - i) - 1) / 2;
    i <- i + 1
    j <- j + 1
    hausdorff_distance_rkeops(x[[i]], x[[j]])
  })

  attributes(out) <- NULL
  attr(out, "Size") <- N
  lbls <- names(x)
  attr(out, "Labels") <- if (is.null(lbls)) 1:N else lbls
  attr(out, "Diag") <- FALSE
  attr(out, "Upper") <- FALSE
  attr(out, "method") <- "hausdorff"
  class(out) <- "dist"
  out
}

Benchmarks

rkeops_use_cpu(ncore = 4L)
bench::mark(
  D1 <- dist_rkeops(dat), iterations = 10L
)
# A tibble: 1 × 13
  expression            min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result
  <bch:expr>          <bch> <bch:>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>
1 D1 <- dist_rkeops(… 1.33s  1.39s     0.701    7.05MB     4.56    10    65      14.3s <dist>
rkeops_use_cpu(ncore = 1L)
bench::mark(
  D2 <- dist_rkeops(dat), iterations = 10L
)
# A tibble: 1 × 13
  expression            min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result
  <bch:expr>          <bch> <bch:>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list>
1 D2 <- dist_rkeops(… 1.43s  1.54s     0.641    6.82MB     4.23    10    66      15.6s <dist>
astamm commented 2 months ago

Benchmarking on the whole set of 100 curves leads to equivalent median times (35s) and memory allocation (159MB).

astamm commented 2 months ago

I see that setting the number of cores is handled by {RhpcBLASctl}. This package has the following warning:

openmp-related functions only work at build-time instructions. Omp_get * now returns NA_Integer if R was not built with openmp instructions. NOTE: if R itself is built without OpenMP instructions and BLAS thread is used with OpenMP, this package will not work...XD

In fact, if I run locally RhpcBLASctl::omp_get_num_procs(), it returns NA because R is built by default without OpenMP on macOS.

I will try to get a build with OpenMP to understand if this issue persists. However, it could be good to mention that CPU parallelisation does not work with default R builds on macOS.

astamm commented 2 months ago

Instructions to (re-)enable OpenMP support with macOS clang compiler can be found here: https://mac.r-project.org/openmp/. Once done, re-installing from source RhpcBLASctl make it discover OpenMP and correctly find the number of cores.

It is trickier with rkeops because it is a wrapper that behind the scene uses the pykeops Python package. Hence, on macOS, one also needs to somehow tell the python installer where to find OpenMP libs and headers to properly build pykeops with OpenMP support.

astamm commented 2 months ago

The best way to handle this would be to provide a configure.ac file which would generate a configure file via a call to autoconf in which we can generate different Makevars files depending on the detected OS (see https://www.r-bloggers.com/2017/08/setting-up-optional-openmp-support-with-rcpparmadillo/ for example).