IBMSparkGPU / GPUEnabler

Provides GPU awareness to Spark. Contact: @kmadhugit and @kiszk
Apache License 2.0

Running the SparkGPULR example on two GPUs: GPU performance lower than CPU #37

Open xuxilei opened 7 years ago

xuxilei commented 7 years ago

Tested on two GPUs: NVIDIA GeForce GT 720 and NVIDIA Quadro K2200.

NVIDIA GeForce GT 720: GPU 915 ms, CPU 129 ms
NVIDIA Quadro K2200: GPU 1604 ms, CPU 233 ms

NVIDIA GeForce GT 720:

[root@xxl GPUEnabler-master]# bin/run-example SparkGPULR
Executing : mvn -q scala:run -DmainClass=com.ibm.gpuenabler.SparkGPULR -DaddArgs="local[*]"
WARN: This is a naive implementation of Logistic Regression and is given as an example!
Please use either org.apache.spark.mllib.classification.LogisticRegressionWithSGD or
org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
for more conventional use.
Data generation done
numSlices=2, N=10000, D=10, ITERATIONS=5
GPU iteration 1
GPU iteration 2
GPU iteration 3
GPU iteration 4
GPU iteration 5
Elapsed GPU time: 915 ms
2032.3421896242655, 2466.7709148517497, 3177.329166002957, 3332.3071606036865, 2112.989604686327, .... 1438.4704410276909, 3008.249715655393, 3295.4068182006863, 2684.441334404038, 1807.6596785250006,
===================================
CPU iteration 1
CPU iteration 2
CPU iteration 3
CPU iteration 4
CPU iteration 5
Elapsed CPU time: 129 ms
2032.3421896242687, 2466.7709148517433, 3177.3291660029563, 3332.307160603694, 2112.989604686332, .... 1438.4704410276915, 3008.24971565539, 3295.406818200689, 2684.441334404038, 1807.6596785249974,

NVIDIA Quadro K2200:

[root@localhost GPUEnabler-master]# bin/run-example SparkGPULR
Executing : mvn -q scala:run -DmainClass=com.ibm.gpuenabler.SparkGPULR -DaddArgs="local[*]"
WARN: This is a naive implementation of Logistic Regression and is given as an example!
Please use either org.apache.spark.mllib.classification.LogisticRegressionWithSGD or
org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
for more conventional use.
Data generation done
numSlices=2, N=10000, D=10, ITERATIONS=5
GPU iteration 1
GPU iteration 2
GPU iteration 3
GPU iteration 4
GPU iteration 5
Elapsed GPU time: 1604 ms
2032.3421896242655, 2466.770914851749, 3177.3291660029586, 3332.307160603685, 2112.9896046863255, .... 1438.470441027691, 3008.2497156553923, 3295.406818200685, 2684.441334404037, 1807.6596785250001,
===================================
CPU iteration 1
CPU iteration 2
CPU iteration 3
CPU iteration 4
CPU iteration 5
Elapsed CPU time: 233 ms
2032.3421896242687, 2466.7709148517433, 3177.3291660029563, 3332.307160603694, 2112.989604686332, .... 1438.4704410276915, 3008.24971565539, 3295.406818200689, 2684.441334404038, 1807.6596785249974,

kiszk commented 7 years ago

In general, the GPU is faster than the CPU only when the workload is computation-heavy. In our experience, this application becomes computation-heavy when D is at least around 100. Could you please try a larger D (e.g. 100, 200, or 300)?
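For context on why D matters here: the naive logistic-regression step does roughly two O(D) passes over each point's feature vector (one dot product and one scaled copy), so at D=10 each GPU thread performs almost no arithmetic and the fixed kernel-launch and host-device transfer costs dominate. Below is a minimal Scala sketch of that per-point cost; it is not the actual SparkGPULR source, and the object and method names are illustrative only.

```scala
// Sketch of the per-point work in a naive logistic-regression gradient step.
// Both loops are O(D), which is the arithmetic a GPU kernel would amortize
// against its launch and data-transfer overhead.
object LRCostSketch {
  final case class DataPoint(x: Array[Double], y: Double)

  // One point's gradient contribution: scale = (1/(1+exp(-y*(w.x))) - 1) * y, then scale * x.
  def gradientContribution(p: DataPoint, w: Array[Double]): Array[Double] = {
    val d = w.length
    var dot = 0.0
    var i = 0
    while (i < d) { dot += w(i) * p.x(i); i += 1 }   // O(D) multiply-adds
    val scale = (1.0 / (1.0 + math.exp(-p.y * dot)) - 1.0) * p.y
    val g = new Array[Double](d)
    i = 0
    while (i < d) { g(i) = scale * p.x(i); i += 1 }  // second O(D) pass
    g
  }

  // Quick local check with a chosen D (defaults to 10, the value in the logs above).
  def main(args: Array[String]): Unit = {
    val d = if (args.nonEmpty) args(0).toInt else 10
    val rnd = new scala.util.Random(42)
    val w = Array.fill(d)(rnd.nextDouble())
    val p = DataPoint(Array.fill(d)(rnd.nextDouble()), 1.0)
    println(s"D=$d, first gradient component = ${gradientContribution(p, w)(0)}")
  }
}
```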

xuxilei commented 7 years ago

GeForce GT 720:

Data generation done
numSlices=2, N=100000, D=300, ITERATIONS=5
GPU iteration 1
GPU iteration 2
GPU iteration 3
GPU iteration 4
GPU iteration 5
Elapsed GPU time: 3752 ms
6771.944142666795, 6844.390807667099, 8484.036733305305, 8690.504549869778, 6971.1851421342935, .... 8486.68043714227, 7212.865818042646, 6664.541750389776, 9407.3710441712, 6464.48823824653,
===================================
CPU iteration 1
CPU iteration 2
CPU iteration 3
CPU iteration 4
CPU iteration 5
Elapsed CPU time: 2391 ms
6771.944142666805, 6844.390807667071, 8484.036733305255, 8690.504549869807, 6971.185142134303, .... 8488.587373098151, 7212.9796246771475, 6666.298441347763, 9407.509816535294, 6465.723414347034,