Laurae2 / LauraeCpp

Parallel R for HPC on a Single Computer using C++

Performance Examples #1

Open Laurae2 opened 5 years ago

Laurae2 commented 5 years ago

OLD: 2^31 - 1 elements (about 2.1 billion)

Server:

CPU frequency:

| Frequency | Number of loaded cores |
| --- | --- |
| 3.7 GHz | 1, 2 |
| 3.5 GHz | 3, 4 |
| 3.4 GHz | 5, 6, 7, 8 |
| 3.1 GHz | 9, 10, 11, 12 |
| 2.8 GHz | 13, 14, 15, 16 |
| 2.8 GHz | 17-32 |

On a 16 GiB (about 17.2 GB) vector (2^31 - 1 elements), parallel mean:

| What | Threads | Elapsed Time | CPU Time | Throughput | Information |
| --- | --- | --- | --- | --- | --- |
| R | 1 | 6.137s | 6.141s | 0.3 bn/s | Handles NA. Handles more than 2^31 - 1 elements. |
| C++ | 1 | 3.147s | 3.147s | 0.7 bn/s | No checks on data. |
| C++ | 2 | 1.613s | 3.206s | 1.3 bn/s | No checks on data. |
| C++ | 4 | 0.832s | 3.304s | 2.6 bn/s | No checks on data. |
| C++ | 8 | 0.432s | 3.432s | 5.0 bn/s | No checks on data. |
| C++ | 16 | 0.229s | 3.664s | 9.4 bn/s | No checks on data. |
| C++ | 32 | 0.172s | 5.241s | 12.4 bn/s | No checks on data. |
| C++ | 61 | 0.152s | 9.102s | 14.0 bn/s | No checks on data. Optimal run. |
| C++ | 64 | 0.165s | 9.791s | 13.0 bn/s | No checks on data. |

Laurae2 commented 5 years ago

NEW: 10 billion elements

Server:

CPU frequency:

| Frequency | Number of loaded cores |
| --- | --- |
| 3.7 GHz | 1, 2 |
| 3.5 GHz | 3, 4 |
| 3.4 GHz | 5, 6, 7, 8 |
| 3.1 GHz | 9, 10, 11, 12 |
| 2.8 GHz | 13, 14, 15, 16 |
| 2.8 GHz | 17-32 |

On an 80 GB (74.5 GiB) vector (10,000,000,000 elements), parallel mean:

> set.seed(1)
> my_vec <- runif(n = 1e10, min = -0.5, max = 1)
> object.size(my_vec)
80000000048 bytes

Repeats per number of threads from 1 to 64:

# Approx 4521.241 seconds total
> ceiling(20 * log(seq_len(64) + 1))
 [1] 14 22 28 33 36 39 42 44 
 [9] 47 48 50 52 53 55 56 57
[17] 58 59 60 61 62 63 64 65
[25] 66 66 67 68 69 69 70 70
[33] 71 72 72 73 73 74 74 75
[41] 75 76 76 77 77 78 78 78
[49] 79 79 80 80 80 81 81 81
[57] 82 82 82 83 83 83 84 84
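
The repeat schedule above grows logarithmically with the thread count, so the fast high-thread runs get more repetitions than the slow low-thread runs. As a rough sketch, the same schedule can be reproduced in C++. The function name `repeats_schedule` is hypothetical and is not part of LauraeCpp.

```cpp
#include <cmath>
#include <vector>

// Number of repeats for each thread count k = 1..max_threads, mirroring the
// R expression ceiling(20 * log(seq_len(max_threads) + 1)) (natural log).
std::vector<int> repeats_schedule(int max_threads) {
    std::vector<int> repeats;
    if (max_threads <= 0) return repeats;
    repeats.reserve(static_cast<std::size_t>(max_threads));
    for (int k = 1; k <= max_threads; ++k) {
        repeats.push_back(static_cast<int>(std::ceil(20.0 * std::log(k + 1.0))));
    }
    return repeats;
}
```

With `max_threads = 64`, the first entry is 14 (1 thread) and the last is 84 (64 threads), matching the R output.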

Results of the parallel mean:

| What | Threads | Elapsed Time | CPU Time | Throughput | Speedup vs R | Information |
| --- | --- | --- | --- | --- | --- | --- |
| R | 1 | 33.235s | 33.235s | 0.3 bn/s | 1x | Handles NA. |
| C++ | 1 | 16.139s | 16.141s | 0.6 bn/s | 2.06x | No checks on data. |
| C++ | 2 | 7.985s | 15.527s | 1.3 bn/s | 4.16x | No checks on data. |
| C++ | 4 | 4.024s | 15.674s | 2.5 bn/s | 8.26x | No checks on data. |
| C++ | 8 | 2.101s | 16.188s | 4.8 bn/s | 15.82x | No checks on data. |
| C++ | 16 | 1.097s | 16.782s | 9.1 bn/s | 30.30x | No checks on data. |
| C++ | 32 | 0.814s | 24.112s | 12.3 bn/s | 40.83x | No checks on data. |
| C++ | 63 | 0.721s | 43.887s | 13.9 bn/s | 46.10x | No checks on data. Optimal run. |
| C++ | 64 | 0.733s | 43.322s | 13.6 bn/s | 45.34x | No checks on data. |
