Laurae2 / LauraeCpp

Parallel R for HPC on a Single Computer using C++
Topics: hpc, openmp, parallel, r

LauraeRcpp

High Performance R by Laurae

Please use Linux. On Windows, parallelization brings little to no performance gain unless R is compiled with a Windows-native toolchain (not MinGW).

Installation

Requires the Rcpp (from CRAN) and parallel (shipped with R) packages.

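# Install from GitHub without upgrading already-installed dependencies
# (older devtools versions used upgrade_dependencies = FALSE instead)
devtools::install_github("Laurae2/LauraeCpp", upgrade = "never")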

Dependencies:

install.packages("Rcpp")

Performance Show

Example: parallel mean of a numeric vector of length 10,000,000,000 (10 billion doubles, 80 GB!).

Server:

CPU frequency:

Frequency  CPUs loaded
3.7 GHz    1, 2
3.5 GHz    3, 4
3.4 GHz    5, 6, 7, 8
3.1 GHz    9, 10, 11, 12
2.8 GHz    13, 14, 15, 16
2.8 GHz    17-32

Parallel mean on an 80 GB (74.5 GiB) vector of 10,000,000,000 elements:

> set.seed(1)
> my_vec <- runif(n = 1e10, min = -0.5, max = 1)
> object.size(my_vec)
80000000048 bytes
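
A quick check of the reported size, assuming 8 bytes per double. The vector data alone is 80 GB in decimal units, or about 74.5 GiB in binary units. The remaining 48 bytes reported by `object.size()` are presumably R's object header.

```latex
\[
10^{10} \times 8\ \text{bytes} = 8 \times 10^{10}\ \text{bytes} = 80\ \text{GB},
\qquad
\frac{8 \times 10^{10}\ \text{bytes}}{2^{30}\ \text{bytes/GiB}} \approx 74.5\ \text{GiB}.
\]
```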

Number of repeats for each thread count from 1 to 64:

# Approx 4521.241 seconds total
> ceiling(20 * log(seq_len(64) + 1))
 [1] 14 22 28 33 36 39 42 44 
 [9] 47 48 50 52 53 55 56 57
[17] 58 59 60 61 62 63 64 65
[25] 66 66 67 68 69 69 70 70
[33] 71 72 72 73 73 74 74 75
[41] 75 76 76 77 77 78 78 78
[49] 79 79 80 80 80 81 81 81
[57] 82 82 82 83 83 83 84 84

Results of the parallel mean:

Implementation  Threads  Elapsed (s)  CPU time (s)  Throughput (bn/s)  Speedup vs R  Information
R               1        33.235       33.235        0.3                1x            Handles NA.
C++             1        16.139       16.141        0.6                2.06x         No checks on data.
C++             2        7.985        15.527        1.3                4.16x         No checks on data.
C++             4        4.024        15.674        2.5                8.26x         No checks on data.
C++             8        2.101        16.188        4.8                15.82x        No checks on data.
C++             16       1.097        16.782        9.1                30.30x        No checks on data.
C++             32       0.814        24.112        12.3               40.83x        No checks on data.
C++             63       0.721        43.887        13.9               46.10x        No checks on data. Optimal run.
C++             64       0.733        43.322        13.6               45.34x        No checks on data.

Functions included

Function     Parameters                Effect
meanLp       nthread                   Relaxed parallel mean of an integer or numeric vector
sumLp        nthread                   Relaxed parallel sum of an integer or numeric vector
addLp        nthread                   Relaxed parallel element-wise A + B on integer or numeric vectors
subLp        nthread                   Relaxed parallel element-wise A - B on integer or numeric vectors
mulLp        nthread                   Relaxed parallel element-wise A * B on integer or numeric vectors
divLp        nthread                   Relaxed parallel element-wise A / B on integer or numeric vectors
diffLp       lag, difference, nthread  Relaxed parallel lagged differences (like base::diff) on an integer or numeric vector
diffLp_simd  lag, difference, nthread  Relaxed parallel lagged differences (like base::diff) using SIMD, on an integer or numeric vector