cupy.linalg.pinv has very low performance

I ran into a problem with cupy.linalg.pinv: it is very slow. The CPU (NumPy) takes 10 s to get the result, while the GPU (CuPy) takes 13 s, and the larger the matrix, the further the GPU falls behind. For other functions such as cupy.linalg.inv, however, the GPU is much faster than the CPU. Could you tell me the reason?
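
A likely part of the explanation is that pinv and inv rely on different factorizations. As far as I know, both numpy.linalg.pinv and cupy.linalg.pinv compute the pseudoinverse from a singular value decomposition (SVD), whereas inv is based on an LU factorization, which needs far fewer operations. The GPU SVD routine is also reportedly much less efficient than the GPU LU routine. The NumPy sketch below uses a hypothetical helper, pinv_via_svd, written only to illustrate the steps (it is not the library's source): one SVD, a cutoff that zeroes singular values below rcond times the largest one, and one matrix product. The SVD step dominates the cost.

```python
import numpy as np


def pinv_via_svd(a, rcond=1e-15):
    """Hypothetical illustration of an SVD-based Moore-Penrose pseudoinverse."""
    # Thin SVD: a = u @ diag(s) @ vt. This is the expensive step.
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    # Treat singular values below rcond * max(s) as zero instead of inverting them.
    cutoff = rcond * s.max()
    s_inv = np.zeros_like(s)
    mask = s > cutoff
    s_inv[mask] = 1.0 / s[mask]
    # pinv(a) = V @ diag(s_inv) @ U^H; scaling the columns of V avoids building diag().
    return (vt.conj().T * s_inv) @ u.conj().T
```

For a square, full-rank matrix, pinv(a) equals inv(a), so in that case inv gives the same answer through the cheaper LU route.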

My environment: Ubuntu 16.04 server, CUDA 10.1, and a CuPy build for CUDA 10.1.

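Here is the code:

```python
import time

import numpy as np
import cupy as cp

a = np.random.randn(3000, 3000)
a2 = cp.asarray(a, dtype=np.float64)  # copy the matrix to the GPU

start = time.time()
a2 = cp.linalg.pinv(a2)
# CuPy queues GPU work asynchronously; wait for it to finish before stopping the clock.
cp.cuda.Device().synchronize()
end = time.time()
print('gpu time', end - start)

start = time.time()
b = np.linalg.pinv(a)
end = time.time()
print('cpu time', end - start)
```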
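
A single timed call can also be misleading: the first CuPy linear-algebra call in a process may pay one-time costs (for example creating the cuSOLVER handle, growing the memory pool, or compiling elementwise kernels), so it is common to warm up once, synchronize the device after every call, and keep the best of several runs. The helper below is a minimal sketch with a hypothetical name, time_call; it takes an optional sync callable so the same code can time both synchronous NumPy calls and asynchronous CuPy calls.

```python
import time


def time_call(fn, *args, sync=None, warmup=1, repeat=3):
    """Hypothetical helper: best wall-clock time of fn(*args) over `repeat` runs.

    `sync` should block until all queued device work is done; it can be left
    as None for synchronous CPU code such as NumPy.
    """
    if repeat < 1:
        raise ValueError("repeat must be at least 1")
    # Warm-up runs absorb one-time setup costs and are not measured.
    for _ in range(warmup):
        fn(*args)
        if sync is not None:
            sync()
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn(*args)
        if sync is not None:
            sync()  # include the GPU work itself, not just the kernel launches
        best = min(best, time.perf_counter() - start)
    return best
```

With the arrays from the question, this could be called as time_call(cp.linalg.pinv, a2, sync=cp.cuda.Device().synchronize) for the GPU and time_call(np.linalg.pinv, a) for the CPU.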
