NumPower / numpower

PHP extension for efficient scientific computing and array manipulation with GPU support
https://numpower.org

GPU matmul refactoring and optimization #53

Closed SkibidiProduction closed 4 months ago

SkibidiProduction commented 4 months ago

Overview

First, by the time this code runs, we know from the NDArray_Matmul method that both arrays are on the same device.

Since array "a" is known to be on the GPU, both arrays must be on the GPU. The preprocessor directive that checks for CUBLAS is therefore redundant, because an array cannot be placed in GPU memory without CUBLAS being present. This directive has been removed.

Second, because both arrays are already in GPU memory, there is no need to allocate additional device memory and copy them. The cudaMalloc, cudaMemcpy and cudaFree calls for the input arrays have been removed.

The result array has been renamed from d_C to deviceResult to make the code clearer.

These changes roughly halved the mean benchmark value of the matmul operation, from about 0.848 to about 0.437 (see the benchmarks below).


Benchmark before changes:

NDArray

| Measurement | Value |
|---|---|
| Number of measurements | 100 |
| Mean | 0.84835189755758 |
| Standard Deviation | 0.01839222232213 |

Benchmark after changes:

NDArray

| Measurement | Value |
|---|---|
| Number of measurements | 100 |
| Mean | 0.43721009731293 |
| Standard Deviation | 0.026118019744953 |

PyTorch (for comparison)

| Measurement | Value |
|---|---|
| Number of measurements | 100 |
| Mean | 0.4293128824234 |
| Standard Deviation | 0.03115384458902 |
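To put the three tables side by side, here is a small arithmetic sketch using only the numbers reported above. The source does not state the unit of the measurements; the sketch assumes they are elapsed times (lower is better), which is an assumption on my part. The ratios themselves are unit-free. The standard error calculation also assumes the 100 measurements are independent.

```python
import math

# Values copied from the benchmark tables above.
# Assumption: they are elapsed times, so lower is better.
N = 100
before = {"mean": 0.84835189755758, "std": 0.01839222232213}
after = {"mean": 0.43721009731293, "std": 0.026118019744953}
pytorch = {"mean": 0.4293128824234, "std": 0.03115384458902}


def standard_error(std, n):
    """Standard error of a mean estimated from n independent samples."""
    return std / math.sqrt(n)


# How much faster NDArray matmul is after the change.
speedup = before["mean"] / after["mean"]

# Relative gap between NDArray (after the change) and PyTorch.
gap_vs_pytorch = after["mean"] / pytorch["mean"] - 1.0

# Difference of the two means and its combined standard error.
diff = after["mean"] - pytorch["mean"]
se_diff = math.hypot(
    standard_error(after["std"], N),
    standard_error(pytorch["std"], N),
)

print(f"speedup after change: {speedup:.2f}x")
print(f"gap vs PyTorch:       {gap_vs_pytorch:.1%}")
print(f"mean difference:      {diff:.4f} (standard error {se_diff:.4f})")
```

Under these assumptions the change gives roughly a 1.94x speedup, and the remaining gap to PyTorch is about 1.8%, which is around two standard errors of the difference.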

Visualization of the multiplication speed as the number of iterations increases, for NDArray and PyTorch:

[Image: line graph]

Note: after the 50th iteration, the speed began to drop for both libraries. This drop correlates with the graphics card heating up.
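For readers who want to reproduce this kind of measurement, the sketch below shows one way such statistics could be collected. It is not the script used for this PR, which is not shown here. The `benchmark` and `summarize` helpers and the stand-in `workload` are hypothetical names. The sample standard deviation is an assumption, since the source does not say which estimator was used. Comparing the means of the first and second halves of the run is a simple way to spot the kind of drift described in the note.

```python
import statistics
import time


def benchmark(fn, repeats=100):
    """Call fn() `repeats` times and return per-call durations in seconds.

    For GPU work, fn should block until the result is ready; otherwise
    only the kernel launch is timed, not the computation.
    """
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Summary statistics plus per-half means to expose drift over time."""
    half = len(durations) // 2
    return {
        "count": len(durations),
        "mean": statistics.fmean(durations),
        # Sample standard deviation (an assumption, see the lead-in).
        "std": statistics.stdev(durations),
        "mean_first_half": statistics.fmean(durations[:half]),
        "mean_second_half": statistics.fmean(durations[half:]),
    }


def workload():
    """Hypothetical CPU stand-in for the 8192x8192 matmul call."""
    sum(i * i for i in range(10_000))


stats = summarize(benchmark(workload, repeats=100))
for name, value in stats.items():
    print(f"{name}: {value}")
```

A second-half mean that is clearly larger than the first-half mean would be consistent with the thermal slowdown noted above, although it does not by itself identify the cause.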

Test stand

| Parameter | Value |
|---|---|
| OS | Ubuntu 20.04 |
| PHP version | 8.3.0 |
| NumPower version | 0.5.1 |
| Python version | 3.11.5 |
| PyTorch version | 2.2.0 |
| CUDA version | 11.6.2 |
| NVIDIA driver | 550.90 |
| GPU | GeForce GTX 980M 4 GB |
| Matrix shape | 8192x8192 |

henrique-borba commented 4 months ago

I made some changes while testing: