ROCm / triton

Development repository for the Triton language and compiler
MIT License

Split k mm fix #416

Open xiaohuguo2023 opened 9 months ago

xiaohuguo2023 commented 9 months ago

@scxiao @wenchenvincent @jayfurmanek @vgokhale

As @scxiao mentioned, we need to adapt our autotune system for SPLIT_K to work properly; Jason suggested I open a pull request for everyone to review as a starting point.

The current changes are summarized below:

./script/amd/gemm/matmul.py

Fixing SPLIT_K matmul

- add pytest coverage for the SPLIT_K path.

- add a maxloc function to print the location of the maximum value discrepancy.

- fix an output matrix initialization issue.

- change the leaky_relu function to match torch.nn.functional.leaky_relu, and add pytest coverage for the leaky_relu activation.

./python/tutorials/03-matrix-multiplication.py
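Two of the helpers listed above can be sketched as follows. This is a hedged illustration written in NumPy rather than Triton: the names `maxloc` and `leaky_relu` come from the change list, but these exact signatures and bodies are assumptions, not the PR's actual diff.

```python
import numpy as np

# Sketch of a maxloc helper as described above: report where the largest
# discrepancy between a computed result and a reference lies. The PR's
# actual signature may differ.
def maxloc(res: np.ndarray, ref: np.ndarray):
    diff = np.abs(res - ref)
    idx = np.unravel_index(np.argmax(diff), diff.shape)
    print(f"max diff {diff[idx]:.6f} at {idx}: "
          f"got {res[idx]}, expected {ref[idx]}")
    return idx

# Sketch of leaky_relu aligned with torch.nn.functional.leaky_relu,
# whose default negative_slope is 0.01:
#   f(x) = x            if x >= 0
#   f(x) = 0.01 * x     otherwise
def leaky_relu(x, negative_slope: float = 0.01):
    x = np.asarray(x, dtype=np.float32)
    return np.where(x >= 0, x, negative_slope * x)
```

Inside a Triton kernel the same activation semantics would be expressed with `tl.where(x >= 0, x, 0.01 * x)`.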

zhanglx13 commented 5 months ago

@xiaohuguo2023 What is the status of this PR? Do we have any conclusions?

xiaohuguo2023 commented 5 months ago

@zhanglx13 Can we wait for the fp16 atomic-add ISA instruction fix? We may need to redo the benchmarking for split-K.

cc. @wenchenvincent
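To make the concern concrete: in a split-K GEMM, each of the SPLIT_K work-groups atomically adds its partial product into the output tile, so if those adds accumulate in fp16 the rounding error compounds with the number of splits. The following NumPy sketch is not from the PR; it only emulates that accumulation pattern on a single dot product to illustrate why the fix affects both accuracy and benchmarking.

```python
import numpy as np

# Emulate split-K accumulation: SPLIT_K partial dot products are each
# computed accurately, then added into an fp16 accumulator, mimicking an
# fp16 atomic_add into the output buffer. (Illustrative only.)
rng = np.random.default_rng(0)
K, SPLIT_K = 4096, 8
a = rng.standard_normal(K).astype(np.float16)
b = rng.standard_normal(K).astype(np.float16)

# fp32 reference: accumulate the whole reduction in fp32.
ref = float(np.dot(a.astype(np.float32), b.astype(np.float32)))

# fp16 accumulator: round to fp16 after every partial add.
acc16 = np.float16(0.0)
step = K // SPLIT_K
for s in range(SPLIT_K):
    chunk = slice(s * step, (s + 1) * step)
    partial = np.dot(a[chunk].astype(np.float32),
                     b[chunk].astype(np.float32))
    acc16 = np.float16(acc16 + partial)

err = abs(float(acc16) - ref)
print(f"fp16 accumulation error vs fp32 reference: {err:.6f}")
```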

zhanglx13 commented 5 months ago

Do we have a timeline for the isa fix?

xiaohuguo2023 commented 5 months ago

> Do we have a timeline for the isa fix?

We have a ticket for this and are working on it; hopefully it will be done by next week.

https://github.com/ROCm/frameworks-internal/issues/7465