ROCm / Tensile

Stretching GPU performance for GEMMs and tensor contractions.
MIT License

Update WMMA capability command for ISA 10+ #1986

Closed ellosel closed 3 months ago

ellosel commented 3 months ago

When using a wavefront size of 64, we should pass the following instruction to tryAssembler:

"v_wmma_f32_16x16x16_f16 v[0:3], v[8:15], v[16:23], v[0:3]"

When using a wavefront size of 32, we should use:

"v_wmma_f32_16x16x16_f16 v[0:7], v[8:15], v[16:23], v[0:7]"

Edit: We have opted to support only wave64 WMMA instructions for ISA 10+.
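
The choice between the two probe strings above can be sketched in Python. This is an illustration only: `wmma_probe_instruction` is a hypothetical helper, not a function in Tensile. The way the accumulator range is derived from the wavefront size is an assumption, chosen because it reproduces the two strings quoted in this PR. The sketch does not call tryAssembler itself.

```python
def wmma_probe_instruction(wavefront_size: int) -> str:
    """Build the WMMA capability probe string (hypothetical helper)."""
    if wavefront_size not in (32, 64):
        raise ValueError(f"unsupported wavefront size: {wavefront_size}")
    # Assumption: the 16x16 f32 accumulator tile (256 values) is spread evenly
    # across the lanes, giving 4 VGPRs for wave64 and 8 VGPRs for wave32.
    acc_vgprs = (16 * 16) // wavefront_size
    acc = f"v[0:{acc_vgprs - 1}]"
    # The A and B operand ranges are the same in both strings from the PR.
    return f"v_wmma_f32_16x16x16_f16 {acc}, v[8:15], v[16:23], {acc}"


if __name__ == "__main__":
    print(wmma_probe_instruction(64))
    print(wmma_probe_instruction(32))
```

Since the edit above restricts support to wave64, only the `wavefront_size=64` form would be passed to tryAssembler for ISA 10+. The wave32 branch is kept here for comparison.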

Testing

Compiler version: amdclang++ --version

AMD clang version 19.0.0git (ssh://gerritgit/lightning/ec/llvm-project amd-staging-npi 24284 cb5089623b3268b2628b009afa71fbf00ea5d27a)

Operating system images: RHEL8, Ubuntu 20, Ubuntu 22

babakpst commented 3 months ago

@yoichiyoshida Can you please review this PR again? Thanks.

babakpst commented 3 months ago

Good, can you please remove the change request, so that I can merge it?

nakajee commented 3 months ago

@yoichiyoshida I think you need to approve this PR to remove the change request.