pxl-th closed this issue 2 days ago.
In scenarios with heavy pressure on the allocator, such as in Nerf.jl, this consistently results in `hipMallocAsync` returning a NULL pointer while the return status is always `hipSuccess`.
Additionally, the AMDGPU.jl package (used as the GPU backend) internally keeps a counter of all allocations and deallocations, and it shows that plenty of memory is available. The counter is atomically incremented on every allocation and atomically decremented on every deallocation. It never waits on streams to do this, but since all allocations and deallocations are stream-ordered, that shouldn't matter.
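A minimal sketch of the kind of counter described above, assuming a single process-wide byte total tracked with Julia's `Threads.Atomic`. The names `ALLOCATED_BYTES`, `track_alloc!`, `track_free!` and `available_bytes` are hypothetical and are not AMDGPU.jl's actual internals.

```julia
# Hypothetical process-wide count of bytes currently allocated.
# Not AMDGPU.jl's real implementation; names are illustrative only.
const ALLOCATED_BYTES = Threads.Atomic{Int}(0)

# Called after every successful allocation of `nbytes`.
function track_alloc!(nbytes::Integer)
    Threads.atomic_add!(ALLOCATED_BYTES, Int(nbytes))
    return nothing
end

# Called on every deallocation of `nbytes`; does not wait on any stream.
function track_free!(nbytes::Integer)
    Threads.atomic_sub!(ALLOCATED_BYTES, Int(nbytes))
    return nothing
end

# Remaining budget relative to a total capacity in bytes.
available_bytes(total::Integer) = Int(total) - ALLOCATED_BYTES[]
```

A counter like this only sees the byte counts the program passes to it. It cannot observe fragmentation or memory the driver's pool still holds, so it may report headroom that the allocator cannot actually serve.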
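Lastly, I'm not sure whether it matters, but the `free` output argument of `hipMemGetInfo` never goes back up after deallocation: it drops when memory is allocated, but stays the same after the memory is freed, even after synchronizing.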
```julia
julia> using AMDGPU
julia> Base.format_bytes(AMDGPU.Runtime.Mem.free())
"11.984 GiB"
julia> x = ROCArray{Float64}(undef, 1024, 1024, 200);
julia> Base.format_bytes(sizeof(x))
"1.562 GiB"
julia> Base.format_bytes(AMDGPU.Runtime.Mem.free())
"10.422 GiB"
julia> AMDGPU.unsafe_free!(x);
julia> Base.format_bytes(AMDGPU.Runtime.Mem.free())
"10.422 GiB"
julia> AMDGPU.synchronize()
julia> Base.format_bytes(AMDGPU.Runtime.Mem.free())
"10.422 GiB"
julia> Base.format_bytes(AMDGPU.Runtime.Mem.total())
"11.984 GiB"
```
@pxl-th Apologies for the lack of response. Can you please test with the latest ROCm 6.1.0 (HIP 6.1)? If it is resolved, please close the ticket. Thanks!
I think this is resolved. Thanks!
Hi. I'm using ROCm 5.4.3 with Julia and have encountered the following situation.
Allocating arrays in a loop without freeing them leads to a NULL pointer at some point (presumably when there is not enough memory). But the returned HIP status is `hipSuccess` (when it should be `hipErrorOutOfMemory`). Allocations are done using `hipMallocAsync`.

Is this known behavior? Do you need to check not only the return status, but also whether the pointer is non-NULL?
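One defensive answer to the question above, sketched under assumptions: treat a NULL pointer as an allocation failure even when the status is `hipSuccess`. `raw_alloc` is a hypothetical stand-in for whatever binding calls `hipMallocAsync` and returns a `(status, pointer)` pair; it is not an AMDGPU.jl function, and `NullAllocationError` and `checked_alloc` are likewise illustrative. The only HIP fact relied on is that `hipSuccess` has the value 0.

```julia
# hipSuccess is 0 in HIP's hipError_t enum.
const HIP_SUCCESS = 0

# Hypothetical error type for "success status but NULL pointer".
struct NullAllocationError <: Exception
    nbytes::Int
end

Base.showerror(io::IO, e::NullAllocationError) =
    print(io, "allocator reported success but returned NULL for ", e.nbytes, " bytes")

# `raw_alloc` is a hypothetical callable: nbytes -> (status, ptr).
function checked_alloc(raw_alloc, nbytes::Integer)
    status, ptr = raw_alloc(nbytes)
    # First check the status code, as usual.
    status == HIP_SUCCESS ||
        error("allocation of $nbytes bytes failed with status $status")
    # Then also reject a NULL pointer reported alongside a success status.
    ptr == C_NULL && throw(NullAllocationError(Int(nbytes)))
    return ptr
end
```

Wrapping every allocation this way surfaces the silent NULL as an exception at the allocation site, instead of letting the invalid pointer propagate into later kernel launches.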