Closed: binarman closed this PR 1 year ago.
LGTM. Does anyone else have other comments? Thanks
@zhanglx13
Is this PR supposed to fix the different i8 mfma instructions on MI300 and non-MI300 GPUs?
No, that is fixed by @scxiao in a separate PR.
About the mfma granularity check: I had a different idea in mind.
I was thinking about doing full instruction selection during the AccelerateMatmul pass, simplifying the Python side (removing the checks and casts there), and simplifying the code generation part (do not select the instruction; simply take its type from the mfma encoding).
Approved. @binarman's suggestions make sense to me:
- Simplify the Python semantic checks by removing the checks for mfma instructions.
- Do mfma instruction selection in the AccelerateAMDMatmul pass and do all checks there.

All these changes can be done in a future PR. @scxiao Do you think we should merge this one (#355) first before merging #368? Or should we merge #368 into this one (#355) and review again?
Yes, this PR should be merged first, then #368. PR #357 has a PyTorch dependency, so it cannot be merged for now.
@binarman, could you please take a look at the CI build error, so we can get it merged?
@scxiao Yes, this is a problem with our CI infrastructure. I've restarted the tests; it should work now.
This PR adds support for fp8 and bf8 instructions.