ROCm / hipBLASLt

hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionality beyond that of a traditional BLAS library.
https://rocm.docs.amd.com/projects/hipBLASLt/en/latest/index.html
MIT License

Tune aquavanjaram942 SGEMM NN to get peak performance for CU80 #880

Closed aferoz21 closed 2 months ago

nakajee commented 3 months ago
aferoz21 commented 3 months ago
  • "I was expecting MT128x128 (or larger) to win, but all winners are MT64x128 or 128x64. Did you try 128x128?"
    Yes, I tried 128x128; it did not win.

  • "Did you try DepthU=16? A larger MT with DU16 can be better than 128x64x32."
    I think I need to try DU16.

  • "Did you try MI32?"
    Yes, it did not win.

  • "GlobalReadVectorWidthA=8 is invalid for SGEMM. Please change to [2,4] (or [1,4])."
    I tried all of 2, 4, and 8. Does 1 help?
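
For readers following the arithmetic behind two of the points above, here is a rough sketch. It is not taken from the thread or from the tuning tooling. It rests on two assumptions: that a single global load moves at most 16 bytes per work-item (which would explain why a vector width of 8 is rejected for 4-byte SGEMM elements), and that the local-memory footprint of one unpadded, single-buffered A+B tile is (MT0 + MT1) x DepthU x element size. The helper names `valid_vector_widths` and `lds_tile_bytes` are hypothetical.

```python
# Assumption (not stated in the thread): one global-load instruction
# moves at most 16 bytes (128 bits) per work-item.
MAX_LOAD_BYTES = 16


def valid_vector_widths(bytes_per_element, candidates=(1, 2, 4, 8)):
    """Return the candidate vector widths whose load fits in MAX_LOAD_BYTES."""
    return [vw for vw in candidates if vw * bytes_per_element <= MAX_LOAD_BYTES]


def lds_tile_bytes(mt0, mt1, depth_u, bytes_per_element):
    """Bytes for one A tile (mt0 x depth_u) plus one B tile (mt1 x depth_u).

    Assumes a single buffer and no padding; real kernels may use more.
    """
    return (mt0 + mt1) * depth_u * bytes_per_element


# SGEMM uses 4-byte elements, so a width of 8 would need a 32-byte load.
print("fp32 widths:", valid_vector_widths(4))

# Compare the tile shapes discussed above (MT0, MT1, DepthU).
for mt0, mt1, du in [(128, 64, 32), (128, 128, 32), (128, 128, 16)]:
    print(f"MT{mt0}x{mt1}x{du}: {lds_tile_bytes(mt0, mt1, du, 4)} bytes")
```

Under these assumptions, MT128x128 with DepthU=16 needs less local memory than MT128x64 with DepthU=32, which is consistent with the suggestion to try the larger tile at DU16. Whether it actually wins can only be settled by benchmarking.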