High Performance R by Laurae
Please use Linux: Windows gains very little from parallelization unless R is compiled with a Windows-native toolchain (not MinGW).
Requires the Rcpp (CRAN download) and parallel (preinstalled) packages.
devtools::install_github("Laurae2/LauraeCpp", upgrade_dependencies = FALSE)
Dependencies:
install.packages("Rcpp")
Example of a parallel mean on a numeric vector of length 10,000,000,000 (10 billion elements, 80 GB).
Server:
-O3 -mtune=native flags
CPU frequency:
Frequency | Cores loaded |
---|---|
3.7 GHz | 1, 2 |
3.5 GHz | 3, 4 |
3.4 GHz | 5, 6, 7, 8 |
3.1 GHz | 9, 10, 11, 12 |
2.8 GHz | 13, 14, 15, 16 |
2.8 GHz | 17-32 |
On an 80 GB (74.5 GiB) vector (10,000,000,000 elements), parallel mean:
> set.seed(1)
> my_vec <- runif(n = 1e10, min = -0.5, max = 1)
> object.size(my_vec)
80000000048 bytes
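The reported size can be checked by hand. R stores each numeric element as an 8-byte double, and the extra 48 bytes are the vector header on a 64-bit build:

$$
10^{10} \times 8\ \text{B} + 48\ \text{B} = 80{,}000{,}000{,}048\ \text{B} \approx 80\ \text{GB} = \frac{8 \times 10^{10}}{2^{30}}\ \text{GiB} \approx 74.5\ \text{GiB}
$$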
Number of repeats for each thread count from 1 to 64:
# Approx 4521.241 seconds total
> ceiling(20 * log(seq_len(64) + 1))
[1] 14 22 28 33 36 39 42 44
[9] 47 48 50 52 53 55 56 57
[17] 58 59 60 61 62 63 64 65
[25] 66 66 67 68 69 69 70 70
[33] 71 72 72 73 73 74 74 75
[41] 75 76 76 77 77 78 78 78
[49] 79 79 80 80 80 81 81 81
[57] 82 82 82 83 83 83 84 84
Results of the parallel mean:
What | Threads | Elapsed Time | CPU Time | Throughput | Speedup vs R | Information |
---|---|---|---|---|---|---|
R | 1 | 33.235s | 33.235s | 0.3 bn/s | 1x | Handles NA. |
C++ | 1 | 16.139s | 16.141s | 0.6 bn/s | 2.06x | No checks on data. |
C++ | 2 | 7.985s | 15.527s | 1.3 bn/s | 4.16x | No checks on data. |
C++ | 4 | 4.024s | 15.674s | 2.5 bn/s | 8.26x | No checks on data. |
C++ | 8 | 2.101s | 16.188s | 4.8 bn/s | 15.82x | No checks on data. |
C++ | 16 | 1.097s | 16.782s | 9.1 bn/s | 30.30x | No checks on data. |
C++ | 32 | 0.814s | 24.112s | 12.3 bn/s | 40.83x | No checks on data. |
C++ | 63 | 0.721s | 43.887s | 13.9 bn/s | 46.10x | No checks on data. Optimal run. |
C++ | 64 | 0.733s | 43.322s | 13.6 bn/s | 45.34x | No checks on data. |
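The throughput and speedup columns follow directly from the elapsed times. As a worked check for the 16-thread row, with $N = 10^{10}$ elements:

$$
\text{throughput} = \frac{N}{t_{\text{elapsed}}} = \frac{10^{10}}{1.097\ \text{s}} \approx 9.1 \times 10^{9}\ \text{elements/s},
\qquad
\text{speedup vs R} = \frac{t_{\text{R}}}{t_{\text{elapsed}}} = \frac{33.235}{1.097} \approx 30.30
$$

A derived figure that the table does not report is the parallel efficiency relative to single-threaded C++:

$$
\text{efficiency}_{16} = \frac{t_{\text{C++},1}}{16 \times t_{\text{C++},16}} = \frac{16.139}{16 \times 1.097} \approx 0.92
$$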
Function | Parameters | Effect |
---|---|---|
meanLp | nthread | Relaxed parallel mean on integer or numeric vector |
sumLp | nthread | Relaxed parallel sum on integer or numeric vector |
addLp | nthread | Relaxed parallel A + B on integer or numeric vector |
subLp | nthread | Relaxed parallel A - B on integer or numeric vector |
mulLp | nthread | Relaxed parallel A * B on integer or numeric vector |
divLp | nthread | Relaxed parallel A / B on integer or numeric vector |
diffLp | lag, difference, nthread | Relaxed parallel lagged differences (base::diff) on integer or numeric vector |
diffLp_simd | lag, difference, nthread | Relaxed SIMD parallel lagged differences (base::diff) on integer or numeric vector |
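To give an idea of what a multithreaded mean and lagged difference can look like in C++, here is a minimal sketch using OpenMP. It is not the package's actual source: the names `parallel_mean` and `parallel_diff` are hypothetical, the use of OpenMP is an assumption, and only a single first-order difference is shown. It also assumes that "relaxed" means no NA/NaN checks and no guaranteed summation order.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical illustrations only: these are not LauraeCpp's actual internals.
// Build with -fopenmp (GCC/Clang) to run the loops in parallel; without it the
// pragmas are ignored and the same code runs serially.

// Mean of x using nthread threads. No NA/NaN checks are done.
double parallel_mean(const std::vector<double>& x, int nthread) {
    const long long n = static_cast<long long>(x.size());
    // Mirrors R, where mean(numeric(0)) is NaN.
    if (n == 0) return std::numeric_limits<double>::quiet_NaN();
    if (nthread < 1) nthread = 1;
    double total = 0.0;
    // Each thread sums its own share; OpenMP then combines the partial sums.
    #pragma omp parallel for reduction(+ : total) num_threads(nthread)
    for (long long i = 0; i < n; ++i) {
        total += x[static_cast<std::size_t>(i)];
    }
    return total / static_cast<double>(n);
}

// Lagged first differences: out[i] = x[i + lag] - x[i].
// Assumption: lag == 0 or lag >= length returns an empty vector here.
std::vector<double> parallel_diff(const std::vector<double>& x,
                                  std::size_t lag, int nthread) {
    if (lag == 0 || x.size() <= lag) return {};
    if (nthread < 1) nthread = 1;
    const long long m = static_cast<long long>(x.size() - lag);
    std::vector<double> out(static_cast<std::size_t>(m));
    // Every output element is independent, so iterations can be split freely.
    #pragma omp parallel for num_threads(nthread)
    for (long long i = 0; i < m; ++i) {
        const std::size_t k = static_cast<std::size_t>(i);
        out[k] = x[k + lag] - x[k];
    }
    return out;
}
```

Because the reduction adds per-thread partial sums in an unspecified order, the mean can differ from a sequential sum in its last few bits. The difference loop writes disjoint output elements, so its result does not depend on the thread count.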