ROCArrays matrix multiplication not working

JuliaGPU / AMDGPU.jl

AMD GPU (ROCm) programming in Julia


ROCArrays matrix multiplication not working #103

Closed 0x0f0f0f closed 3 years ago

0x0f0f0f commented 3 years ago

Running on Void Linux (glibc) with a Sapphire AMD RX 570 8 GB. I am using my fork, which uses the Yggdrasil HSA artifacts: https://github.com/0x0f0f0f/AMDGPU.jl/tree/artifacts

julia> gen(x) = rand(x,x)
julia> ROCArray(gen(10)) * ROCArray(gen(10))
Memory access fault by GPU node-1 (Agent handle: 0x556f0ed38c80) on address 0xa0000. Reason: Page not present or supervisor privilege.
signal (6): Aborted
in expression starting at REPL[9]:1
Allocations: 40373585 (Pool: 40359774; Big: 13811); GC: 45
Aborted

jpsamaroo commented 3 years ago

And on my system I get:

julia> ROCArray(rand(4,4)) * ROCArray(rand(4,4))
4×4 ROCMatrix{Float64}:
 8.0e-323  8.0e-323  8.0e-323  8.0e-323
 8.0e-323  8.0e-323  8.0e-323  8.0e-323
 8.0e-323  8.0e-323  8.0e-323  8.0e-323
 8.0e-323  8.0e-323  8.0e-323  8.0e-323

Krastanov commented 3 years ago

This seems similar to https://github.com/JuliaGPU/AMDGPU.jl/issues/92. It also seems to depend on which version of ROCm is installed. On the newest versions I get these memory access faults. On older versions of ROCm (e.g. 3.5) I simply get wrong answers.
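Since one failure mode here is a silently wrong result rather than a crash, a quick way to tell the two apart is to compare the GPU product against a CPU reference. The following is only a minimal sketch: `matmul_matches_cpu` and its `to_device` keyword are hypothetical names introduced for illustration, not part of AMDGPU.jl. The default `identity` keeps everything on the CPU so the snippet runs without a GPU.

```julia
using LinearAlgebra  # provides isapprox for arrays

# Hypothetical helper (not part of AMDGPU.jl): multiply two random n×n
# matrices on the "device" and compare against the plain CPU product.
# `to_device` is an assumed hook; pass `ROCArray` after `using AMDGPU`
# to exercise the GPU path.
function matmul_matches_cpu(n::Integer; to_device = identity, rtol = 1e-8)
    A = rand(n, n)
    B = rand(n, n)
    expected = A * B                              # CPU reference result
    actual = Array(to_device(A) * to_device(B))   # copy the result back to host memory
    return isapprox(actual, expected; rtol = rtol)
end

println(matmul_matches_cpu(4))
```

With AMDGPU.jl loaded, the call would be `matmul_matches_cpu(4; to_device = ROCArray)`. A result like the all-`8.0e-323` matrix shown earlier in this thread would fail this check, because the product of two matrices with entries drawn from [0, 1) should not be uniformly denormal.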

Krastanov commented 3 years ago

@0x0f0f0f, @jpsamaroo, could you let me know which version of ROCm you are using when running tests on the RX 500 series? I am seeing conflicting suggestions on the TensorFlow and ROCm support forums, and I am uncertain what the "best practices" are. I would like to debug this in more depth, but I feel I should be careful about which ROCm version I use for that.

Also, has this ever worked on an RX 500 card? I am a bit out of the loop and do not have a good idea whether this is a bug that makes it impossible to use the library or if this is just affecting an old GPU that was never really supported.

jpsamaroo commented 3 years ago

I would guess that it's a bug in AMDGPU, not in ROCm. I ran CI on an RX 480 very recently, which is essentially just a lower-clocked RX 500. I doubt RX 400/500 support will disappear entirely from ROCm for another few years.

0x0f0f0f commented 3 years ago

It seems to be working now

Krastanov commented 3 years ago

Seems to be working on my hardware as well (on the current master)

jpsamaroo commented 3 years ago

Well that's confusingly convenient :smile: I'm going to keep this open because right now we're not ensuring correct ordering between raw kernel and HIP-derived launches (because raw kernels use queues, but HIP-derived libraries use their own "streams"). So it's likely that results will be unreliable until that's fixed.