Open benrichard-amd opened 7 months ago
Update: there are some concerns about introducing a macro for the CDNA version.
Checking with #if __has_builtin(...) may be a better way to determine whether a feature is available (https://clang.llvm.org/docs/LanguageExtensions.html#feature-checking-macros). It tests for the feature itself rather than for a hardware generation, so it works for all GPUs, including future generations.
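A minimal sketch of the __has_builtin approach, using FP64 MFMA as the feature. The builtin name `__builtin_amdgcn_mfma_f64_16x16x4f64` is my assumption for the relevant Clang builtin, and `HAS_FP64_MFMA`, `has_fp64_mfma` and `dgemm_path` are hypothetical names used only for illustration. Whether the check reflects the selected offload arch depends on the compiler version, so treat this as a starting point rather than a verified recipe.

```cpp
// Fallback so the check also compiles on toolchains without __has_builtin.
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

// Assumed builtin name for the FP64 MFMA instruction; verify against your Clang.
#if __has_builtin(__builtin_amdgcn_mfma_f64_16x16x4f64)
#define HAS_FP64_MFMA 1   // hypothetical helper macro
#else
#define HAS_FP64_MFMA 0   // host pass, or a target without the builtin
#endif

// Hypothetical helpers: report which code path this translation unit selected.
constexpr bool has_fp64_mfma() { return HAS_FP64_MFMA != 0; }
constexpr const char* dgemm_path() { return has_fp64_mfma() ? "mfma" : "generic"; }
```

The check names the instruction it depends on, so no arch list has to be maintained when a new generation ships.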
Suggestion Description
As new instructions/features are added with each new arch, it is useful to know the target architecture at compile time to employ separate code paths. For example: FP64 MFMA was added in CDNA2, so CDNA2 and later can use one code path while CDNA1 uses a different code path.
It gets tedious because all the archs need to be enumerated, and code needs to be updated as new archs become available:
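The original snippet is not preserved here. As a rough sketch of the kind of enumeration meant, assuming the per-target macros `__gfx908__` (CDNA1), `__gfx90a__` (CDNA2) and `__gfx940__`/`__gfx941__`/`__gfx942__` (CDNA3) that the compiler predefines for device code; `USE_FP64_MFMA_ENUM` and `use_fp64_mfma_enum` are hypothetical names:

```cpp
// Every arch with FP64 MFMA has to be listed by hand. The arch-to-generation
// mapping here is an assumption; a newer arch would need another
// defined(...) term added to this condition.
#if defined(__gfx90a__) || defined(__gfx940__) || \
    defined(__gfx941__) || defined(__gfx942__)
#define USE_FP64_MFMA_ENUM 1   // CDNA2 and later: FP64 MFMA path
#else
#define USE_FP64_MFMA_ENUM 0   // CDNA1 (gfx908), other GPUs, or the host pass
#endif

// Hypothetical helper exposing the selected path to ordinary C++ code.
constexpr bool use_fp64_mfma_enum() { return USE_FP64_MFMA_ENUM != 0; }
```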
It would be nice if we had something like:
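The proposed snippet is not preserved here, so the following is only an illustration of the idea, not the spelling requested in this issue. `EXAMPLE_CDNA_VERSION` is a made-up name, not an existing compiler macro. It is built here as a user-side shim on top of the per-target `__gfx*__` macros, and the arch-to-generation mapping is an assumption.

```cpp
// Hypothetical shim: EXAMPLE_CDNA_VERSION is NOT provided by any compiler.
#if defined(__gfx940__) || defined(__gfx941__) || defined(__gfx942__)
#define EXAMPLE_CDNA_VERSION 3
#elif defined(__gfx90a__)
#define EXAMPLE_CDNA_VERSION 2
#elif defined(__gfx908__)
#define EXAMPLE_CDNA_VERSION 1
#else
#define EXAMPLE_CDNA_VERSION 0   // not a CDNA target known to this shim, or host pass
#endif

// With a single ordered version number, feature checks become one comparison.
constexpr int cdna_version() { return EXAMPLE_CDNA_VERSION; }
constexpr bool cdna_has_fp64_mfma() { return EXAMPLE_CDNA_VERSION >= 2; }
```

A shim like this still has to be updated for each new arch. A compiler-provided macro would move that maintenance out of user code.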
This would mirror the way it is done in CUDA:
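The original CUDA snippet is not preserved here. For reference, a sketch of the usual pattern: nvcc defines `__CUDA_ARCH__` only during device compilation, as the compute capability times 100 (for example 800 for sm_80). The 800 threshold for FP64 tensor-core support and the `USE_FP64_TENSOR_CORES` name are illustrative assumptions.

```cpp
// __CUDA_ARCH__ is undefined in the host pass, so guard with defined().
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 800)
#define USE_FP64_TENSOR_CORES 1   // sm_80 and every later arch, no list needed
#else
#define USE_FP64_TENSOR_CORES 0   // older device archs, or host code
#endif

// Hypothetical helper exposing the selected path.
constexpr bool use_fp64_tensor_cores() { return USE_FP64_TENSOR_CORES != 0; }
```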
Operating System
No response
GPU
No response
ROCm Component
No response