Open torrance opened 1 year ago
Thanks for your report @torrance. rocBLAS supports the equivalent of cublas&lt;t&gt;gemmEx() with rocblas_gemm_ex().
@TorreZuk Thank you! HIPIFY complained there was no suitable equivalent and I clearly didn't spend long enough verifying that.
If I can hijack my own issue (!), what about a hipBLAS/rocBLAS equivalent to cublasCherkEx()? My searching of the documentation (as well as HIPIFY) seems to suggest there isn't one, and it's a bit of a sticking point for the conversion of this codebase.
Sure, we can recycle this issue as a request for an equivalent to cublasCherkEx(), which is a new feature request. We can ask if @emankov has any insights into cublasCherkEx().
Hello @torrance,

cublasCherkEx() supports the CUDA_C_8I datatype for matrix A. This is a complex number with two 8-bit signed integers. CUDA_C_8I is supported in cublasCgemmEx() as well as in cublasCherkEx(). We have a rocblas_gemm_ex function; it supports real 8-bit integers but not complex 8-bit integers.

I have some questions about this datatype: can you say what application is using the CUDA_C_8I datatype? Real 8-bit integers are used in machine learning; what is the use case for complex 8-bit integers?

Thanks, Andrew
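For concreteness, an element of a CUDA_C_8I-style matrix can be sketched as two packed signed 8-bit integers that a kernel would promote to a wider type before accumulating. The struct and function names below are illustrative, not part of the rocBLAS or cuBLAS API:

```cpp
#include <cstdint>
#include <complex>

// Hypothetical layout for a CUDA_C_8I-style element: two signed 8-bit
// integers, real part first, imaginary part second. Two bytes per element,
// versus eight for a single-precision complex value.
struct complex_int8 {
    int8_t re;
    int8_t im;
};

// Promote to single-precision complex, as a mixed-precision GEMM/HERK
// kernel would do internally before accumulating at higher precision.
inline std::complex<float> promote(complex_int8 v) {
    return { static_cast<float>(v.re), static_cast<float>(v.im) };
}
```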
Hi @amcamd

> Can you say what application is using this CUDA_C_8I datatype? Real 8-bit integers are used in machine learning; what is the use case for complex 8-bit integers?
Yes, they are needed. Lots of radio astronomy correlators record observations of the sky as simple 8-bit complex integers, which can later be normalised as part of calibration. The 8-bit integer representation has the advantage of having constant deltas between values, as opposed to a floating-point representation. At the high end, we let the integer representation 'saturate' and later flag these values. They are also necessarily complex, since radio astronomy works in the Fourier domain.
We want to avoid converting these to higher precision values because these values make up the raw data of our observations and are absolutely massive in size.
Hope this helps give some context.
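The saturation behaviour described above can be sketched as a clamping quantiser: values beyond the int8 range stick at the endpoints instead of wrapping, so overflowing samples can be flagged later. This is an illustrative sketch only; `saturate_to_int8` is not a library function:

```cpp
#include <cstdint>
#include <algorithm>

// Saturating quantiser for correlator samples: clamp to the int8 range
// [-128, 127] rather than letting the value wrap around. Between the
// endpoints the mapping is the identity, so the deltas between adjacent
// representable values are constant (always 1), unlike floating point,
// where spacing grows with magnitude.
inline int8_t saturate_to_int8(int value) {
    return static_cast<int8_t>(std::clamp(value, -128, 127));
}
```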
Hi @torrance , Thank you for the context and the use case. I was guessing this is related to radio astronomy and the installations you have in Western Australia.
**Is your feature request related to a problem? Please describe.**

It's common to have large, low-precision input matrices that you'd like to multiply at full internal precision using rocblas&lt;t&gt;gemm(), possibly (but not necessarily) with output at full precision.

**Describe the solution you'd like**

Support the equivalent of cublas&lt;t&gt;gemmEx() as described here: https://docs.nvidia.com/cuda/cublas/#cublas-gemmEx

**Describe alternatives you've considered**
An alternative is to copy the input matrices to double precision first. If the output is not required at full precision, a further copy must be made and the precision truncated. This alternative doubles memory pressure on the GPU and causes extra copying of memory.
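The promote-then-multiply alternative can be sketched as follows, with a naive reference multiply standing in for the BLAS call (in practice a library GEMM such as rocblas_zgemm would replace `naive_gemm`; all names here are illustrative). Note that `promote_matrix` materialises a second, larger copy of each input, which is exactly the doubled memory pressure described above:

```cpp
#include <complex>
#include <cstdint>
#include <vector>
#include <utility>

using cd = std::complex<double>;

// Promote a matrix of complex 8-bit integers (stored as re/im pairs) to
// double-precision complex. This copy is the extra memory cost of the
// workaround: 16 bytes per element instead of 2.
std::vector<cd> promote_matrix(const std::vector<std::pair<int8_t, int8_t>>& a) {
    std::vector<cd> out;
    out.reserve(a.size());
    for (auto [re, im] : a) out.emplace_back(re, im);
    return out;
}

// Naive reference GEMM: C = A * B for row-major m x k and k x n matrices.
// A library call would be used here in practice.
std::vector<cd> naive_gemm(int m, int n, int k,
                           const std::vector<cd>& A, const std::vector<cd>& B) {
    std::vector<cd> C(m * n, cd{0, 0});
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j)
            for (int p = 0; p < k; ++p)
                C[i * n + j] += A[i * k + p] * B[p * n + j];
    return C;
}
```

A gemmEx-style API would accept the int8 complex inputs directly and do the promotion per tile inside the kernel, avoiding the persistent double-precision copies.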