radudiaconu0 opened this issue 10 months ago
One could, but it would probably require quite a bit of work by someone.
CUDA.jl WMMA support was the work of a full-time master student.
Well, it has to be done at some point. Why should only NVIDIA get all the goodies? :P
For matrix multiplication we use rocBLAS, so adding WMMA support won't affect its performance.
And at the moment matrix multiplication is not a bottleneck in DL applications for AMDGPU; timely memory freeing is.
> Well, it has to be done at some point. Why should only NVIDIA get all the goodies? :P
Are you volunteering?
I would like to try.
Could you implement rocWMMA support for use with Navi 3 GPUs? From what I understood, it uses the AI accelerators present in them for faster matrix multiplication. I guess this could make DL tasks faster.
https://github.com/ROCmSoftwarePlatform/rocWMMA
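For a sense of what such support might look like on the Julia side: CUDA.jl (mentioned above) already exposes a high-level WMMA API where a single warp multiplies 16x16 tiles through fragment load/mma/store operations. Below is a minimal sketch based on CUDA.jl's documented WMMA interface; a hypothetical AMDGPU.jl wrapper over rocWMMA would presumably mirror this shape. This is an illustration of the existing CUDA.jl API, not of any AMDGPU.jl implementation, and it requires an NVIDIA GPU with tensor cores to actually run.

```julia
# Sketch of CUDA.jl's high-level WMMA API (the kind of interface an
# AMDGPU.jl/rocWMMA port would need to provide). Assumes a tensor-core GPU.
using CUDA

# One warp computes D = A * B + C for a single 16x16x16 tile.
# A and B hold Float16 inputs; C and D accumulate in Float32.
function wmma_kernel(a, b, c, d)
    conf = WMMA.Config{16, 16, 16, Float32}
    a_frag = WMMA.load_a(pointer(a), 16, WMMA.ColMajor, conf)
    b_frag = WMMA.load_b(pointer(b), 16, WMMA.ColMajor, conf)
    c_frag = WMMA.load_c(pointer(c), 16, WMMA.ColMajor, conf)
    d_frag = WMMA.mma(a_frag, b_frag, c_frag, conf)
    WMMA.store_d(pointer(d), d_frag, 16, WMMA.ColMajor, conf)
    return nothing
end

a = CUDA.rand(Float16, 16, 16); b = CUDA.rand(Float16, 16, 16)
c = CUDA.zeros(Float32, 16, 16); d = CUDA.zeros(Float32, 16, 16)

# WMMA operations are cooperative across a warp, so launch 32 threads.
@cuda threads=32 wmma_kernel(a, b, c, d)
```

rocWMMA's C++ API (fragments, `load_matrix_sync`, `mma_sync`) is structured much like NVIDIA's WMMA intrinsics, which is why the CUDA.jl design is a reasonable template, though on RDNA3 the fragment sizes and wavefront width (32 vs 64 on CDNA) would need their own handling.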