arrayfire / arrayfire-python

Python bindings for ArrayFire: A general purpose GPU library.
https://arrayfire.com
BSD 3-Clause "New" or "Revised" License

Slow SVD #134

Open GongYiLiao opened 7 years ago

GongYiLiao commented 7 years ago

I found that AF's SVD implementation is quite slow compared to NumPy's, while its DGEMM is not, on a Radeon HD 7950 with the FGLRX driver on Debian Jessie:

In [47]: from pylab import randn, svd

In [48]: x_0 = randn(1000, 1000)

In [49]: %time y_0 = svd(x_0)
CPU times: user 1.24 s, sys: 1.01 s, total: 2.24 s
Wall time: 287 ms

In [50]: x_1 = af.Array(x_0.ctypes.data, x_0.shape, 'd')

In [51]: %time y_1 = af.svd(x_1)
CPU times: user 3.64 s, sys: 3.97 s, total: 7.62 s
Wall time: 3.25 s

AF's SVD takes more than 9 times as long as NumPy's SVD on the same matrix. However, for DGEMM, AF is faster than NumPy (though not by much):

In [75]: from pylab import dot

In [76]: %time z_0 = dot(x_0.transpose(), x_0)
CPU times: user 52 ms, sys: 20 ms, total: 72 ms
Wall time: 10.6 ms

In [77]: %time z_1 = af.matmul(x_1.T, x_1)
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 8.38 ms

I am wondering if there is anything I should tune/adjust before proceeding.
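
One thing that may skew such timings: the first call into a GPU backend can pay one-time kernel/library initialization, and the device work may not have finished when %time returns. A minimal timing sketch, assuming the same af.Array construction as in In [50] above (the helper name time_af_svd and the trial count are illustrative, not from this issue):

import time
import numpy as np
import arrayfire as af

def time_af_svd(n=1000, dtype='d', trials=3):
    # Build the input on the host and copy it to the device, as in In [50] above.
    x_np = np.random.randn(n, n)
    x_af = af.Array(x_np.ctypes.data, x_np.shape, dtype)

    af.svd(x_af)   # warm-up: pays any one-time kernel/library initialization
    af.sync()

    start = time.time()
    for _ in range(trials):
        u, s, vt = af.svd(x_af)
    af.sync()      # wait for the device to finish before stopping the clock
    return (time.time() - start) / trials

print('af.svd average wall time: %.3f s' % time_af_svd())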

pavanky commented 7 years ago

@GongYiLiao can you show the output of af.info()?

GongYiLiao commented 7 years ago

In [91]: af.info()
ArrayFire v3.3.2 (OpenCL, 64-bit Linux, build default)
[0] AMD     : Tahiti, 3035 MB -- OpenCL 1.2 AMD-APP (1912.5) -- Device driver 1912.5 (VM) -- FP64 Support: True -- Unified Memory (False)
-1- AMD     : Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz, 31865 MB -- OpenCL 1.2 AMD-APP (1912.5) -- Device driver 1912.5 (sse2,avx) -- FP64 Support: True -- Unified Memory (True)
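
The af.info() output above lists two OpenCL devices: the Tahiti GPU ([0]) and the host CPU (-1-). One way to check which device the SVD actually runs on, and how each performs, is to time it on every device in turn; a rough sketch (the loop is a suggestion, not something posted in the thread):

import time
import arrayfire as af

af.set_backend('opencl')                 # make sure the OpenCL backend is active
for dev in range(af.get_device_count()):
    af.set_device(dev)
    af.info()                            # prints which device is now active
    x = af.randu(1000, 1000, dtype=af.Dtype.f64)
    af.svd(x)                            # warm-up on this device
    af.sync()
    start = time.time()
    u, s, vt = af.svd(x)
    af.sync()
    print('device %d: %.3f s' % (dev, time.time() - start))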

gul916 commented 7 years ago

Hello,

I can confirm that svd is abnormally slow under arrayfire-python as compared to scipy. I have benchmarked it using the attached file test_svd_af_gul.py, which is similar to bench_fft.py. The results are presented in arrayfire-test_svd_gul.txt and show roughly a 10x slowdown with the cuda backend as compared to the cpu backend or scipy. Moreover, the opencl backend is not working, even though I can run bench_blas with it, for instance. What is strange is that the GPU is barely used with the cuda backend.

I chose scipy rather than numpy because numpy's svd has a bug in single precision (https://github.com/numpy/numpy/issues/9516).

Thank you, GuL916

test_svd_af_gul.py.txt arrayfire-test_svd_gul.txt
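
The attachments are not reproduced here. For readers without them, a minimal sketch of this kind of comparison (scipy vs. a selected ArrayFire backend, single precision), which only assumes it roughly mirrors test_svd_af_gul.py and follows the bench_fft.py pattern of picking the backend from the command line (the script name test_svd_sketch.py is hypothetical):

import sys
import time
import numpy as np
import scipy.linalg
import arrayfire as af

N = 1000
x_np = np.random.rand(N, N).astype(np.float32)

# Time scipy's SVD on the host as the reference.
start = time.time()
scipy.linalg.svd(x_np)
print('scipy svd: %.3f s' % (time.time() - start))

# Pick the ArrayFire backend once per run (cpu, cuda or opencl), as the bundled
# benchmarks do, e.g.: python test_svd_sketch.py opencl
if len(sys.argv) > 1:
    af.set_backend(sys.argv[1])
af.info()

x_af = af.Array(x_np.ctypes.data, x_np.shape, 'f')   # single precision, as in the report
af.svd(x_af)                                         # warm-up
af.sync()
start = time.time()
u, s, vt = af.svd(x_af)
af.sync()
print('arrayfire %s svd: %.3f s' % (af.get_active_backend(), time.time() - start))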