kingcrimsontianyu opened 5 years ago
I second this request.
Created an internal ticket SWDEV-180694 to track it. It'd be highly desirable to have this API implemented so that machine learning frameworks can schedule available GPU resources efficiently.
Relevant code in TensorFlow:
Without this function implemented in HIP, grid/block size selection on AMD hardware will always be sub-optimal.
I think this can be closed as of ROCm 2.7?
https://github.com/ROCm-Developer-Tools/HIP/blob/854768787ee9bbd6ed22b3e8fd0f139955a57e6a/src/hip_module.cpp#L1015
The HIP implementation is not comparable to the corresponding CUDA function, which takes a function so that the dynamic shared memory can be a function of the block size.
Cc: @nbeams
I would further clarify that we would like a HIP version of the driver API function cuOccupancyMaxPotentialBlockSize, which I believe corresponds to cudaOccupancyMaxPotentialBlockSizeVariableSMem in the runtime API.
I see this was left as a TODO in https://github.com/ROCm-Developer-Tools/HIP/pull/1943/files#diff-9ec4991aeca8528b60eaf6d00b089eecda171d49742e348561c957c5fa2000feR1328-R1342
@gargrahul Can you suggest a workaround?
Hello, I was wondering whether this is still being worked on. It's been two years since the last update here, and unless I'm making a fairly bad user error, it still doesn't work (it somehow breaks calls that occur before I even call it).
@kingcrimsontianyu @0x0015 Could you please test with the latest ROCm 6.1.0 (HIP 6.1)? Thanks!
@0x0015 Have you tried with the latest ROCm 6.1.2? Thanks!
The occupancy calculator API is an invaluable asset in CUDA. Unfortunately, hipOccupancyMaxPotentialBlockSize is only exposed on Nvidia GPUs for the time being. It would be immensely helpful if it were implemented for AMD GPUs.