paolodalberto closed this issue 1 year ago.
At this point I am not really after performance; I want a clear understanding of how a simple interface could be made. Eventually I am planning to write kernel code, but this is a necessary first step. We can discuss internally if you like.
Thank you for your time
I have found another corner case for gemv
```
(Pdb) n
BLAS GEMV 0x239dcf0
0 Device: AMD Radeon VII
A 0 0
B -52.9725 -34.4124
C 0.5 23.375
Data and Initialization Kernel 0.223526
m, n = 72, 64, 64, alpha=1, beta=1
4 Error: rocblas error in line 61
invalid size parameter.
Time Kernel 1e-05
C <- 0.5 23.375
Read data from Kernel 0.000573
0.5
```
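The "invalid size parameter" message suggests rocBLAS rejected one of the dimension or leading-dimension arguments before launching the kernel. A minimal sketch of the kind of checks a column-major GEMV applies (the function name is hypothetical; `m`, `n`, `lda` follow the BLAS convention):

```python
def gemv_dims_ok(m: int, n: int, lda: int) -> bool:
    """Mirror the basic size checks a column-major BLAS gemv performs.

    With column-major storage the leading dimension must cover a full
    column, i.e. lda >= max(1, m); negative sizes are always invalid.
    """
    if m < 0 or n < 0:
        return False
    if lda < max(1, m):
        return False
    return True

# A 72x64 matrix stored column-major needs lda >= 72; passing the
# row-major leading dimension (64) trips the size check.
print(gemv_dims_ok(72, 64, 72))  # True
print(gemv_dims_ok(72, 64, 64))  # False: lda < m
```

This matches the log above: with m = 72 and a leading dimension of 64, the size check fails before anything runs.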
This time I got some of the parameters wrong ... which one?
The matrix layout for rocBLAS is Fortran-like. I fixed the problem above for GEMV ... now checking what I am doing wrong for GEMM.
```python
V = rocmgpu.gemm(0, LL.A.flatten('F'), LL.shape[0], RR.A.flatten(), RR.shape[1], Result.A.flatten(), Result.shape[1])
```

This will work :)
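The key difference is that `flatten('F')` emits the elements column by column (the order rocBLAS expects), while the default `flatten()` emits them row by row. A small numpy illustration:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])

row_major = A.flatten()      # C order: walk each row in turn
col_major = A.flatten('F')   # Fortran order: walk each column in turn

print(row_major.tolist())  # [1, 2, 3, 4, 5, 6]
print(col_major.tolist())  # [1, 4, 2, 5, 3, 6]
```

Passing the C-ordered buffer to a column-major BLAS silently reinterprets the matrix as its transpose, which explains results that are "off" rather than an error.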
nice @rkamd I think I have found out what I am doing wrong :)
I will share the code and my considerations tomorrow, and then I will close the issue. Paolo
@paolodalberto, sounds good.
@paolodalberto just to clarify: in rocBLAS all matrices are column-major in memory (in numpy terms this is order 'F', while 'C' denotes row-major; the letters are language-based abbreviations). See https://rocmdocs.amd.com/projects/rocBLAS/en/develop/API_Reference_Guide.html in the Note section. You can create your numpy array with this layout (https://numpy.org/doc/stable/reference/generated/numpy.array.html): the constructor and other methods like reshape accept the argument order='F' to use it. It would be good to document this in your python interface as well.
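Following that advice, here is a short sketch of the numpy options for producing column-major arrays up front, so no per-call flatten is needed (the array values are just placeholders):

```python
import numpy as np

# Build the array directly in column-major (Fortran) memory order.
A = np.array([[1.0, 2.0], [3.0, 4.0]], order='F')
print(A.flags['F_CONTIGUOUS'])  # True

# Convert an existing C-ordered array; asfortranarray copies only
# when the layout actually has to change.
B = np.asfortranarray(np.ones((3, 3)))
print(B.flags['F_CONTIGUOUS'])  # True

# reshape also accepts order='F', reading the flat buffer column
# by column.
C = np.arange(6).reshape(2, 3, order='F')
print(C[:, 0].tolist())  # [0, 1]
```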
@TorreZuk yep it took a little to pivot but I think I managed :) @rkamd I will close this as reference my basic interface looks like this https://github.com/paolodalberto/MatrixFlow/blob/main/GpuInterface/gpuinterface.cpp https://github.com/paolodalberto/MatrixFlow/blob/main/GpuInterface/procm.py
I now have a simple toy supporting sparse and dense, with GPU and non-GPU paths for both GEMV and GEMM.
cheers
please contact me directly if you have any further questions
Describe the bug
I created a simple python interface. It works on small examples but not in an application. I will provide an example and code to reproduce the problem; I think it is a sync problem and I need your help.
To Reproduce
I used a tensorflow:latest docker image with ROCm 5.6.
I installed the clients using install.sh to get the client code and learn how to write a simple interface:
https://github.com/paolodalberto/MatrixFlow/tree/main/GpuInterface
in particular a simple gemm wrapper https://github.com/paolodalberto/MatrixFlow/blob/main/GpuInterface/gpuinterface.cpp#L425
where I sample the inputs and the outputs to check consistency and I create a single handle, device and prop.
I created a simple case (temp.py) where I call csr_mv, coo_mv, and dgemm. This standalone test works just fine. But then I run an application:
python Examples/play.py
where I compare the execution of gemm every time. The first call is correct but the second (no matter how I call it) is off.
Expected behavior
Then I use the interface in an application and I compare the result to the usual CPU gemm https://github.com/paolodalberto/MatrixFlow/blob/main/Matrices/matrices.py#L63
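For the comparison itself, a hedged sketch of checking a device result against numpy's CPU GEMM; the actual GPU wrapper call is stood in for by a plain matmul, and the helper name is hypothetical:

```python
import numpy as np

def check_against_cpu(gpu_result: np.ndarray,
                      A: np.ndarray, B: np.ndarray,
                      rtol: float = 1e-5, atol: float = 1e-6) -> bool:
    """Compare a result returned from the device with the CPU reference.

    GPU and CPU floating-point GEMM can legitimately differ in the
    last bits, so compare with a tolerance, not exact equality.
    """
    cpu_result = A @ B
    return bool(np.allclose(gpu_result, cpu_result, rtol=rtol, atol=atol))

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((3, 5))

# Stand-in for the value the GPU wrapper would return.
gpu_like = A @ B
print(check_against_cpu(gpu_like, A, B))  # True
```

A transposed (wrong-layout) result fails this check even though every element is present, which is consistent with the symptom described above.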
If you execute the last operation using GPU 1, you can see the return matrix from the device is off.