ROCm / llvm-project

This is the AMD-maintained fork of the LLVM git repository. This repository accepts pull requests and issues related to AMD fork-specific topics (amd/*). For all other issues/PRs, please submit upstream at https://github.com/llvm/llvm-project.

[Feature]: Better preprocessor macros to detect RDNA/CDNA family at compile time #59

Open benrichard-amd opened 7 months ago

benrichard-amd commented 7 months ago

Suggestion Description

As new instructions and features are added with each new architecture, it is useful to know the target architecture at compile time so that separate code paths can be selected. For example, FP64 MFMA was added in CDNA2, so CDNA2 and later can use one code path while CDNA1 uses a different one.

It gets tedious because all the archs need to be enumerated, and code needs to be updated as new archs become available:

```cpp
#if __gfx940__ || __gfx941__ || __gfx942__
// Code path for CDNA3
#elif __gfx90a__
// Code path for CDNA2
#elif __gfx908__
// Code path for CDNA1
#endif
```

It would be nice if we had something like:

```cpp
#if CDNA_VERSION >= 3
// Code path for CDNA3 and later
#elif CDNA_VERSION >= 2
// Code path for CDNA2
#else
// Code path for CDNA1
#endif
```

This would mirror the way it is done in CUDA:

```cpp
__device__ void func()
{
#if __CUDA_ARCH__ >= 800
    // Code path for compute capability 8.x and later
#elif __CUDA_ARCH__ >= 700
    // Code path for compute capability 7.x
#else
    // Code path for compute capability < 7.0
#endif
}
```
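Until the compiler provides such a macro, the enumeration can at least be confined to a single header. The sketch below assumes a user-defined CDNA_VERSION macro (it is not provided by clang or ROCm) derived from the per-target __gfx*__ macros used above; the names cdna_version, GemmPath and select_gemm_path are hypothetical. The list still has to be extended by hand when new targets appear, and RDNA targets are not covered.

```cpp
// Hypothetical project header: the per-target enumeration lives in one place.
// CDNA_VERSION is a user-defined macro assumed for this sketch, not a
// compiler-provided one.
//
// During HIP device compilation clang defines a __gfxNNN__ macro for the
// offload target; in the host pass none of them are defined, so host code
// sees CDNA_VERSION == 0.
#if defined(__gfx940__) || defined(__gfx941__) || defined(__gfx942__)
#define CDNA_VERSION 3
#elif defined(__gfx90a__)
#define CDNA_VERSION 2
#elif defined(__gfx908__)
#define CDNA_VERSION 1
#else
#define CDNA_VERSION 0  // host pass, RDNA, or a target not listed above
#endif

// Exposes the selected generation to ordinary C++ code as well.
constexpr int cdna_version() { return CDNA_VERSION; }

// Hypothetical consumer: select a code path the way the proposal describes.
enum class GemmPath { Cdna3, Cdna2, Cdna1, Generic };

constexpr GemmPath select_gemm_path() {
#if CDNA_VERSION >= 3
    return GemmPath::Cdna3;
#elif CDNA_VERSION >= 2
    return GemmPath::Cdna2;  // FP64 MFMA is available from CDNA2 on
#elif CDNA_VERSION >= 1
    return GemmPath::Cdna1;
#else
    return GemmPath::Generic;
#endif
}
```

Writing defined(...) instead of testing the bare macro names also avoids -Wundef warnings for targets where those macros are not defined.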

Operating System

No response

GPU

No response

ROCm Component

No response

yxsamliu commented 7 months ago

There are some concerns about introducing a macro for the CDNA version.

Using #if __has_builtin(...) may be a better way to determine whether a feature is available (https://clang.llvm.org/docs/LanguageExtensions.html#feature-checking-macros). It works for all GPUs, including future generations, because it tests for the feature itself rather than for a list of architectures.
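A minimal sketch of the feature-test approach applied to the FP64 MFMA example from the issue. It assumes that __builtin_amdgcn_mfma_f64_16x16x4f64 is the clang builtin for one of the CDNA2 FP64 MFMA instructions and that, on the toolchain in use, __has_builtin reports it only for targets that can execute it; both should be checked against the installed compiler. HAVE_FP64_MFMA, Fp64Path and select_fp64_path are hypothetical names.

```cpp
// Fallback so this also compiles with preprocessors that lack __has_builtin
// (a clang extension, also supported by GCC 10 and later).
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

// Test for the feature, not the architecture. Assumption: the builtin name
// below matches the FP64 MFMA builtin of the clang version in use, and
// __has_builtin only reports it for targets that support the instruction.
#if __has_builtin(__builtin_amdgcn_mfma_f64_16x16x4f64)
#define HAVE_FP64_MFMA 1
#else
#define HAVE_FP64_MFMA 0
#endif

enum class Fp64Path { Mfma, Fallback };

// Chooses the MFMA path when the builtin is reported, else the fallback.
constexpr Fp64Path select_fp64_path() {
    return HAVE_FP64_MFMA ? Fp64Path::Mfma : Fp64Path::Fallback;
}
```

Unlike an architecture list, this check needs no update when a later target gains the same builtin. In a HIP build the host pass may report a different result than the device pass, so the MFMA path itself belongs in device code.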