Closed matinraayai closed 1 year ago
@matinraayai at the time this was opened, the malloc implementation was unfortunately not ready for use and the documentation should have said so. The implementation available in ROCm 5.4 and later should work as you expect.
For anyone following this issue, there is a malloc test/example in the code base: https://github.com/ROCm-Developer-Tools/HIP/blob/78aaa848a4470eb78c5e25f615856d51462b6ed6/tests/src/deviceLib/hipDeviceMalloc.cpp
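For readers who just want the shape of the feature, here is a minimal sketch of a kernel that calls device-side `malloc`/`free`, in the spirit of the linked test. The kernel name, buffer size, and launch configuration are illustrative choices, not part of the linked example; it assumes a ROCm 5.4+ toolchain where device-side `malloc` works as documented.

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

// Each thread allocates a small scratch buffer from the device heap,
// uses it, and frees it before returning.
__global__ void deviceMallocKernel(int *out) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int *buf = static_cast<int *>(malloc(4 * sizeof(int)));
    if (buf == nullptr) {
        out[tid] = -1;  // allocation failed; report a sentinel
        return;
    }
    int sum = 0;
    for (int i = 0; i < 4; ++i) {
        buf[i] = tid + i;
        sum += buf[i];
    }
    out[tid] = sum;
    free(buf);  // always pair device malloc with device free
}

int main() {
    const int n = 64;
    int *out = nullptr;
    hipMalloc(&out, n * sizeof(int));
    hipLaunchKernelGGL(deviceMallocKernel, dim3(1), dim3(n), 0, 0, out);
    hipDeviceSynchronize();

    int host[n];
    hipMemcpy(host, out, sizeof(host), hipMemcpyDeviceToHost);
    printf("thread 0 sum = %d\n", host[0]);
    hipFree(out);
    return 0;
}
```

The same source also compiles with nvcc via HIP's CUDA back end, since device-side `malloc`/`free` is supported on both platforms.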
@b-sumner Is it required to set an allocation-pool size for on-device malloc, analogous to cudaDeviceSetLimit on NVIDIA cards?
I know that in early versions of ROCm this was a compile-time definition. I found hipDeviceSetMemPool in the documentation, but it is not clear what kind of memory pool this function influences or how to use it correctly. I cannot find an example of how to use hipDeviceSetMemPool and the other related functions.
It is not. This implementation allows the "heap" to grow as large as needed. Note that this differs from CUDA, and applications relying on this behavior may have trouble when running elsewhere.
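The portability caveat above is worth spelling out: on CUDA the device heap is a fixed size (8 MB by default), so code that allocates more from kernels must raise the limit before the first such launch. A hedged sketch of doing this portably through HIP, assuming the toolchain provides `hipDeviceSetLimit` with `hipLimitMallocHeapSize` (present in recent ROCm releases; it maps to `cudaDeviceSetLimit(cudaLimitMallocHeapSize, ...)` on NVIDIA); the 64 MB figure is an arbitrary illustration:

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
    // On AMD the device heap grows on demand, so this call is not
    // strictly needed there; on NVIDIA it is, because the default
    // heap is fixed. Setting it in both cases keeps the code portable.
    size_t heapBytes = 64 * 1024 * 1024;  // 64 MB, chosen for illustration
    hipError_t err = hipDeviceSetLimit(hipLimitMallocHeapSize, heapBytes);
    if (err != hipSuccess) {
        fprintf(stderr, "hipDeviceSetLimit failed: %s\n",
                hipGetErrorString(err));
        return 1;
    }

    size_t actual = 0;
    hipDeviceGetLimit(&actual, hipLimitMallocHeapSize);
    printf("device malloc heap limit: %zu bytes\n", actual);
    // ... launch kernels that call device-side malloc after this point
    return 0;
}
```

The limit must be set before any kernel that uses device-side `malloc` runs; on CUDA, changing it afterwards requires a device reset.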
@b-sumner thanks for the update.
Hello, after reading the HIP programming documentation, I was under the impression that calling `malloc` inside a `__global__` function is supported; however, the following code throws the exception below when compiled with hipcc. Compiling with nvcc works as intended.

AMD output:
CUDA output:
Could you please clarify the development status of this feature? We are teaching a course on HIP so it would help us get the correct information across.
Thanks, Matin