hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionality beyond that of a traditional BLAS library.
Problem Description
The documentation of hipblasLtMatmulAlgoGetHeuristic does not make clear how long the call can take to complete. It has been empirically observed that if returnAlgoCount exceeds the number of pre-determined solutions, the internal function getAllSolutions is invoked, which causes non-deterministic run time in production applications: https://github.com/ROCm/hipBLASLt/blob/develop/library/src/amd_detail/rocblaslt/src/rocblaslt_auxiliary.cpp#L1364-L1369
Please consider revising the documentation or the implementation so that the run time of the API is more deterministic.
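The run-time variation described above could be measured with a small timing harness along the following lines. This is only a sketch: time_heuristic_query and spread are hypothetical helper names, not part of hipBLASLt, and the callable passed in stands for whatever wrapper the application puts around hipblasLtMatmulAlgoGetHeuristic.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical helper (not a hipBLASLt API): invokes `call(algo_count)`
// `repeats` times and returns each wall-clock duration in microseconds.
// In a real application, `call` would wrap hipblasLtMatmulAlgoGetHeuristic
// and forward `algo_count` as the number of algorithms to query.
template <typename Call>
std::vector<double> time_heuristic_query(Call&& call, int algo_count, int repeats)
{
    assert(repeats > 0);
    std::vector<double> micros;
    micros.reserve(static_cast<std::size_t>(repeats));
    for (int i = 0; i < repeats; ++i) {
        const auto start = std::chrono::steady_clock::now();
        call(algo_count);
        const auto stop = std::chrono::steady_clock::now();
        micros.push_back(
            std::chrono::duration<double, std::micro>(stop - start).count());
    }
    return micros;
}

// Hypothetical helper: difference between the slowest and fastest sample,
// used here as a crude indicator of how deterministic the run time is.
inline double spread(const std::vector<double>& samples)
{
    assert(!samples.empty());
    const auto mm = std::minmax_element(samples.begin(), samples.end());
    return *mm.second - *mm.first;
}
```

Comparing the spread for a small algorithm count against the spread for a large one (large enough to exceed the pre-determined solutions) should show whether the getAllSolutions path is being taken. The threshold itself is not documented, so the counts to compare would have to be found by experiment.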
Operating System
Ubuntu 22.04
CPU
x86
GPU
AMD Instinct MI250X
ROCm Version
ROCm 5.7.1