ROCm / composable_kernel

Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
https://rocm.docs.amd.com/projects/composable_kernel/en/latest/

Limit specific instruction tests to proper platforms #1371

Open junliume opened 3 weeks ago

junliume commented 3 weeks ago

https://github.com/ROCm/composable_kernel/blob/a90bfa9857da5cc35a9c5dc1f068b538a1e64c9b/include/ck/utility/amd_smfmac.hpp#L19

Introduced in #1309. This instruction should only be built for gfx94 platforms.
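A minimal sketch of the kind of guard this implies, assuming the usual per-architecture macros (`__gfx940__`, `__gfx941__`, `__gfx942__`) that the AMDGPU clang backend predefines during device compilation. The names `CK_EXAMPLE_HAS_SMFMAC` and `example_has_smfmac` are hypothetical and only for illustration; they are not taken from the repository.

```cpp
// Hypothetical feature macro: 1 only when the current device compilation
// pass targets a gfx94x architecture, 0 for every other target and for
// the host pass (where none of the __gfx94x__ macros are defined).
#if defined(__gfx940__) || defined(__gfx941__) || defined(__gfx942__)
#define CK_EXAMPLE_HAS_SMFMAC 1
#else
#define CK_EXAMPLE_HAS_SMFMAC 0
#endif

// Hypothetical compile-time query that code or tests could check before
// touching the gfx94-only instruction wrappers.
constexpr bool example_has_smfmac()
{
    return CK_EXAMPLE_HAS_SMFMAC != 0;
}
```

With GPU_TARGETS="gfx1100;gfx90a;gfx942" the device code is compiled once per target, so a guard like this would be evaluated separately in each pass and would only enable the instruction in the gfx942 pass.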

junliume commented 3 weeks ago

@illsilin our CI should have one stage with GPU_TARGETS of "gfx1100;gfx90a;gfx942" :)

#1358 and #1372 both target GPU_TARGETS="gfx1100;gfx90a;gfx942".

How to reproduce:

CXX=/opt/rocm/bin/amdclang++ cmake -DCMAKE_PREFIX_PATH=/opt/rocm -DCMAKE_BUILD_TYPE=Release -DGPU_TARGETS="gfx1100;gfx90a;gfx942" ..

junliume commented 3 weeks ago

More problems are fixed in https://github.com/ROCm/composable_kernel/pull/1372/commits/4b81c7a5ae42985f3b07df6bfe3c5c9b8ddddb1a

Hence GPU_TARGETS MATCHES is very problematic: it only tells us that some target in the list matches, so excluding the other targets based on that match is very fragile.
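To make the fragility concrete, here is a small C++ analogue of the check, not the project's CMake code. It assumes that a CMake list is a single semicolon-separated string and that `if(<var> MATCHES <regex>)` succeeds when the regex matches anywhere in that string. The helpers `list_matches` and `all_targets_match` are hypothetical, and `std::regex` uses ECMAScript syntax rather than CMake's regex dialect, which makes no difference for plain literals such as `gfx9`.

```cpp
#include <cassert>
#include <regex>
#include <sstream>
#include <string>

// Analogue of if(GPU_TARGETS MATCHES "<pattern>"): true as soon as the
// pattern occurs anywhere in the semicolon-separated list string.
inline bool list_matches(const std::string& gpu_targets, const std::string& pattern)
{
    return std::regex_search(gpu_targets, std::regex(pattern));
}

// Per-target alternative: split on ';' and require every non-empty entry
// to match. Returns false for an empty list.
inline bool all_targets_match(const std::string& gpu_targets, const std::string& pattern)
{
    assert(!pattern.empty());
    const std::regex re(pattern);
    std::stringstream stream(gpu_targets);
    std::string target;
    bool saw_target = false;
    while(std::getline(stream, target, ';'))
    {
        if(target.empty())
            continue;
        saw_target = true;
        if(!std::regex_search(target, re))
            return false;
    }
    return saw_target;
}
```

For "gfx1100;gfx90a;gfx942", `list_matches` is true for both "gfx9" and "gfx11", so a branch keyed on either pattern is taken even though the list also contains targets that cannot build that code. `all_targets_match` is false for both patterns, which is the information a per-target decision actually needs.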

junliume commented 3 weeks ago

FYI: additional issues are found when building client_example with multiple targets:

cd ${composable_kernel}/client_example/build
CXX=/opt/rocm/llvm/bin/clang++ cmake -DCMAKE_PREFIX_PATH="${composable_kernel}/install/;/opt/rocm/" -DCMAKE_BUILD_TYPE=release -DGPU_TARGETS="gfx1100;gfx90a" -DCMAKE_CXX_FLAGS=" -O3 " ..
make -j$(nproc)

This fails with the following error:

/data/driver/composable_kernel/install/include/ck/tensor_operation/gpu/block/blockwise_gemm_xdlops.hpp:171:57: note: expression evaluates to '256 == 128'
  171 |         static_assert(ThisThreadBlock::GetNumOfThread() == MWaves * NWaves * WaveSize,
      |                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated when compiling for gfx1100.

The code above should not be compiled for gfx1100 targets.
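One likely reading of the '256 == 128' in the assertion, not confirmed in this thread: the block is launched with 256 threads, which equals MWaves * NWaves * WaveSize when WaveSize is 64 (gfx90a), but the product drops to 128 when WaveSize is 32 (wave32 on gfx1100). The sketch below only reproduces that arithmetic. The 2 x 2 wave layout and both helper names are assumptions for illustration.

```cpp
// Hypothetical helper: wavefront width assumed for each target family
// (64 lanes on gfx9, 32 lanes on gfx11 in wave32 mode).
constexpr int example_wave_size(bool is_gfx11)
{
    return is_gfx11 ? 32 : 64;
}

// Thread count that the static_assert compares against the block size.
constexpr int example_threads_needed(int m_waves, int n_waves, bool is_gfx11)
{
    return m_waves * n_waves * example_wave_size(is_gfx11);
}

// With an assumed 2 x 2 wave layout:
//   gfx90a:  2 * 2 * 64 == 256, matching a 256-thread block
//   gfx1100: 2 * 2 * 32 == 128, so a 256-thread block fails the check
static_assert(example_threads_needed(2, 2, false) == 256, "gfx9 case");
static_assert(example_threads_needed(2, 2, true) == 128, "gfx11 case");
```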

Meanwhile, https://github.com/ROCm/composable_kernel/blob/fix_1371/client_example/25_wrapper/CMakeLists.txt#L5-L10 is problematic because GPU_TARGETS can contain multiple targets, either by default or when set explicitly, for example GPU_TARGETS="gfx1100;gfx90a".