ROCm / HIP

HIP: C++ Heterogeneous-Compute Interface for Portability
https://rocmdocs.amd.com/projects/HIP/
MIT License

[Julia] hipMallocAsync resulted in NULL pointer #3223

Closed. pxl-th closed this issue 2 days ago.

pxl-th commented 1 year ago

Hi. I'm using ROCm 5.4.3 with Julia and have run into the following situation.

Allocating arrays in a loop without freeing them eventually yields a NULL pointer (possibly once there is not enough memory left). However, the returned HIP status is hipSuccess, when it should be hipErrorOutOfMemory. Allocations are done with hipMallocAsync.

Is this known behavior? Do callers need to check not only the return status but also whether the returned pointer is NULL?
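A minimal sketch of applying both checks, in plain Julia without AMDGPU.jl. `checked_alloc`, `AllocStatus`, and `AllocError` are hypothetical names, and the allocator is any function assumed to return a `(status, pointer)` tuple, standing in for a wrapper around hipMallocAsync.

```julia
# Hypothetical status codes standing in for hipSuccess / hipErrorOutOfMemory.
@enum AllocStatus allocSuccess allocOutOfMemory

# Hypothetical error type raised when an allocation result cannot be used.
struct AllocError <: Exception
    msg::String
end

"""
    checked_alloc(alloc, nbytes) -> Ptr{Cvoid}

Call `alloc(nbytes)`, which is assumed to return a `(status, ptr)` tuple,
and return `ptr` only if the status is success AND the pointer is non-NULL.
"""
function checked_alloc(alloc, nbytes::Integer)
    status, ptr = alloc(nbytes)
    # First check: the status code reported by the allocator.
    status == allocSuccess ||
        throw(AllocError("allocation of $nbytes bytes failed with status $status"))
    # Second check: the case reported in this issue (success status, NULL pointer).
    ptr == C_NULL &&
        throw(AllocError("allocator reported success but returned NULL for $nbytes bytes"))
    return ptr
end

# Two mock allocators for trying the function out.
good_alloc(n) = (allocSuccess, Ptr{Cvoid}(4096))   # fake non-NULL address
buggy_alloc(n) = (allocSuccess, C_NULL)            # mimics the reported behavior
```

Calling `checked_alloc(buggy_alloc, 8)` throws an `AllocError` instead of handing a NULL pointer to later code.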

julia> using AMDGPU

julia> for i in 1:10_000
           ROCArray{Float64}(undef, 1024, 1024)
       end
...
status = AMDGPU.HIP.hipSuccess
status = AMDGPU.HIP.hipSuccess
ERROR: AssertionError: hipMallocAsync resulted in C_NULL for 8.000 MiB
Stacktrace:
 [1] #HIPBuffer#25
   @ ~/.julia/dev/AMDGPU/src/runtime/memory/hip.jl:32 [inlined]
 [2] HIPBuffer
   @ ~/.julia/dev/AMDGPU/src/runtime/memory/hip.jl:8 [inlined]
 [3] ROCMatrix{Float64}(#unused#::UndefInitializer, dims::Tuple{Int64, Int64})
   @ AMDGPU ~/.julia/dev/AMDGPU/src/array.jl:132
 [4] ROCArray
   @ ~/.julia/dev/AMDGPU/src/array.jl:142 [inlined]
 [5] ROCArray
   @ ~/.julia/dev/AMDGPU/src/array.jl:143 [inlined]
 [6] top-level scope
   @ ./REPL[2]:2
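A quick check of the sizes involved, using only Base Julia: each `1024 × 1024` Float64 matrix is 8 MiB (matching the 8.000 MiB in the error), so if no array were freed the loop would request about 78 GiB in total, far more than the 11.984 GiB total reported later in this thread. Running out of memory is therefore expected unless the garbage collector frees earlier arrays; the problem is that the failure is reported as hipSuccess.

```julia
# Size of one ROCArray{Float64}(undef, 1024, 1024) buffer, in bytes.
per_array = 1024 * 1024 * sizeof(Float64)

# Total requested by the 10_000-iteration loop if no array is ever freed.
total = 10_000 * per_array

println(Base.format_bytes(per_array))   # 8.000 MiB
println(total / 2^30, " GiB")           # 78.125 GiB
```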
pxl-th commented 1 year ago

Under heavy allocator pressure, as in Nerf.jl, hipMallocAsync consistently returns a NULL pointer while the returned status is always hipSuccess.

Additionally, the AMDGPU.jl package (used as the GPU backend) internally keeps a counter of all allocations and deallocations, and it shows that plenty of memory is available. It atomically increments the counter on every allocation and atomically decrements it on every deallocation. It never waits on streams to do so, but since all allocations and deallocations are stream-ordered, that shouldn't matter.
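A minimal sketch of the bookkeeping described above, in plain Julia. `LIVE_BYTES`, `track_alloc!`, `track_free!`, and `live_bytes` are hypothetical names, not AMDGPU.jl's actual internals, and counting bytes rather than allocations is an assumption made so the total can be compared with memory readings.

```julia
using Base.Threads: Atomic, atomic_add!, atomic_sub!

# Hypothetical global counter of bytes currently allocated (not AMDGPU.jl's real code).
const LIVE_BYTES = Atomic{Int}(0)

# Called right after each successful allocation; does not wait on any stream.
function track_alloc!(nbytes::Integer)
    atomic_add!(LIVE_BYTES, Int(nbytes))
    return nothing
end

# Called when a free is enqueued; again, no stream synchronization.
function track_free!(nbytes::Integer)
    atomic_sub!(LIVE_BYTES, Int(nbytes))
    return nothing
end

# Number of bytes the program believes are currently live.
live_bytes() = LIVE_BYTES[]

# Usage: two 8 MiB buffers allocated, one freed.
track_alloc!(8 * 2^20)
track_alloc!(8 * 2^20)
track_free!(8 * 2^20)
println(Base.format_bytes(live_bytes()))   # 8.000 MiB
```

Such a counter records what the program has requested and released; it does not observe how much memory the driver's allocator can actually hand out, so it can show plenty of headroom even when hipMemGetInfo does not.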

Lastly, and I'm not sure whether it matters: the free value reported by hipMemGetInfo drops when you allocate but never goes back up when you deallocate, even after synchronizing.
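The readings in the session that follows can be checked with a little arithmetic in plain Julia. The array holds 1024 × 1024 × 200 Float64 values, exactly 1.5625 GiB; the free reading drops by about that much on allocation (11.984 to 10.422 GiB, values rounded to three decimals) and stays at 10.422 GiB after `unsafe_free!` and `synchronize()`. The readings below are copied by hand from that session.

```julia
# Size of ROCArray{Float64}(undef, 1024, 1024, 200), in bytes and GiB.
nbytes = 1024 * 1024 * 200 * sizeof(Float64)
gib = nbytes / 2^30

# Free-memory readings copied from the session (GiB, rounded to 3 decimals).
free_before = 11.984
free_after_alloc = 10.422
free_after_free = 10.422   # after unsafe_free! and synchronize()

drop = free_before - free_after_alloc
println("array size:         ", gib, " GiB")
println("drop on allocation: ", round(drop; digits = 3), " GiB")
println("recovered on free:  ", free_after_free - free_after_alloc, " GiB")
```

So the free value does go down by the array size on allocation; what never happens is the matching increase after the array is freed.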

julia> using AMDGPU

julia> Base.format_bytes(AMDGPU.Runtime.Mem.free())
"11.984 GiB"

julia> x = ROCArray{Float64}(undef, 1024, 1024, 200);

julia> Base.format_bytes(sizeof(x))
"1.562 GiB"

julia> Base.format_bytes(AMDGPU.Runtime.Mem.free())
"10.422 GiB"

julia> AMDGPU.unsafe_free!(x);

julia> Base.format_bytes(AMDGPU.Runtime.Mem.free())
"10.422 GiB"

julia> AMDGPU.synchronize()

julia> Base.format_bytes(AMDGPU.Runtime.Mem.free())
"10.422 GiB"

julia> Base.format_bytes(AMDGPU.Runtime.Mem.total())
"11.984 GiB"
ppanchad-amd commented 2 months ago

@pxl-th Apologies for the lack of response. Can you please test with the latest ROCm 6.1.0 (HIP 6.1)? If it is resolved, please close the ticket. Thanks!

pxl-th commented 2 days ago

I think this is resolved. Thanks!