ROCm / hipBLASLt

hipBLASLt is a library that provides general matrix-matrix operations with a flexible API and extends functionality beyond that of a traditional BLAS library.
https://rocm.docs.amd.com/projects/hipBLASLt/en/latest/index.html
MIT License

Tune aquavanjaram942 I8I8I TN memory bound GEMM sizes for CU80 #871

Closed: aferoz21 closed this 2 months ago

aferoz21 commented 3 months ago

1) Data type: I8I8I (i.e. a_type: i8_r, b_type: i8_r, c_type: i8_r, d_type: i8_r, compute_type: c_i32_r).
2) Coverage: this PR covers only the INT8 (TN) memory-bound tests (33 GEMM sizes).
3) Result: significant improvement, averaging 450%.
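For readers unfamiliar with the term, "memory bound" can be illustrated with a rough roofline estimate. The following is only a sketch under stated assumptions, not part of this PR: the function names are hypothetical, the model assumes each of A, B, and D is moved exactly once with one byte per element (INT8) and that C is not read, and the peak figures in the usage note are placeholders rather than specifications of any device.

```python
def gemm_arithmetic_intensity(m, n, k, in_bytes=1, out_bytes=1):
    """Operations per byte for D = A^T * B, assuming no data is re-read."""
    ops = 2 * m * n * k  # one multiply and one add per (m, n, k) triple
    # Read A (m*k elements) and B (k*n elements), write D (m*n elements).
    bytes_moved = in_bytes * (m * k + k * n) + out_bytes * m * n
    return ops / bytes_moved


def is_memory_bound(m, n, k, peak_ops_per_s, peak_bytes_per_s):
    """Roofline test: memory bound when intensity is below the machine balance."""
    machine_balance = peak_ops_per_s / peak_bytes_per_s
    return gemm_arithmetic_intensity(m, n, k) < machine_balance
```

With placeholder peaks of 1e15 operations per second and 1e12 bytes per second (a machine balance of 1000 operations per byte), `is_memory_bound(1, 4096, 4096, 1e15, 1e12)` is true because a skinny GEMM performs only about two operations per byte, while `is_memory_bound(4096, 4096, 4096, 1e15, 1e12)` is false.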

nakajee commented 3 months ago

Please run merge.py (merge same file) to remove lda, ldb, ldc, and ldd.

nakajee commented 3 months ago

Did you try LSU2,4? Did LSU1 always win?

aferoz21 commented 3 months ago

> Please run merge.py (merge same file) to remove lda, ldb, ldc, and ldd.

Yes, I have removed lda, ldb, ldc, and ldd.

aferoz21 commented 3 months ago

> Did you try LSU2,4? Did LSU1 always win?

Yes, I tried LSU2 and LSU4 in one of the config files. LSU1 wins.

nakajee commented 3 months ago

>> Did you try LSU2,4? Did LSU1 always win?
> Yes, I tried LSU2 and LSU4 in one of the config files. LSU1 wins.

OK. It looks like LSU with INT8 is currently rejected. We need to enable it, but this is fine for now.
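As background on the LSU question, and assuming LSU here refers to the LocalSplitU tuning parameter, the idea of splitting the K dimension into several partial sums that are reduced at the end can be sketched as below. This is an illustration only: `dot_split_k` is a hypothetical helper, and the interleaved assignment of K indices to partial sums is an arbitrary choice for the sketch, not a description of how the generated kernels partition work.

```python
def dot_split_k(a_col, b_col, lsu):
    """Dot product over K, accumulated as `lsu` partial sums and then reduced."""
    partials = [0] * lsu
    for k, (a, b) in enumerate(zip(a_col, b_col)):
        partials[k % lsu] += a * b  # each partial sum handles every lsu-th K index
    return sum(partials)            # final reduction across the partial sums
```

Because the accumulation is in integers, `dot_split_k(a, b, 1)`, `dot_split_k(a, b, 2)`, and `dot_split_k(a, b, 4)` return the same value for any inputs; the LSU setting changes how the work is divided, not the result.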