ROCm / Tensile

Stretching GPU performance for GEMMs and tensor contractions.
MIT License

Remove WGM related kern args if they are not needed #1844

Closed: AlexBrownAMD closed this pull request 10 months ago

AlexBrownAMD commented 10 months ago

WGM mapping arguments are not used when WGM is -1, 0, or 1. This change updates the kernel signature so that these arguments don't need to be passed when they are not used.

Built rocBLAS with this change, and all GEMM-related tests passed locally on gfx90a.
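The signature change in the description above can be illustrated with a minimal Python sketch of conditional kernel-argument assembly. It is not Tensile's code: the functions `wgm_args_needed` and `build_kernel_args`, and the argument names used in the usage block, are hypothetical. The only rule taken from the PR is that WGM mapping arguments are unused when WGM is -1, 0, or 1.

```python
from typing import List, Sequence

# WorkGroupMapping (WGM) values for which, per the PR description,
# the WGM mapping kernel arguments are never read by the kernel.
UNUSED_WGM_VALUES = (-1, 0, 1)


def wgm_args_needed(wgm: int) -> bool:
    """Return True if the kernel signature must include WGM mapping arguments."""
    return wgm not in UNUSED_WGM_VALUES


def build_kernel_args(wgm: int,
                      base_args: Sequence[object],
                      wgm_args: Sequence[object]) -> List[object]:
    """Assemble a hypothetical kernel argument list.

    base_args: arguments every kernel takes (sizes, pointers, strides, ...).
    wgm_args:  hypothetical WGM-related arguments, appended only when used.
    """
    args = list(base_args)
    if wgm_args_needed(wgm):
        args.extend(wgm_args)
    return args


if __name__ == "__main__":
    # Illustrative (hypothetical) argument names, not Tensile's real ones.
    base = ["sizeI", "sizeJ", "sizeK"]
    wgm_extra = ["wgmRemainder", "magicNumberWgmRemainder"]
    for wgm in (-1, 0, 1, 4, 8):
        print(wgm, build_kernel_args(wgm, base, wgm_extra))
```

The design point is that the host-side argument list and the kernel signature must agree. Dropping the arguments on one side only works if the kernel is generated without them for the same WGM values.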

nakajee commented 10 months ago

The code change looks good. The gfx11 precheckin test failed due to build errors. Could you check whether this is related to your change?

nakajee commented 10 months ago

It looks like you have not merged your LLVM crash fix yet. Please merge the change.

nakajee commented 10 months ago

The gfx908 precheckin test failed. The failing case is mOclSolutionAdapterTest.TimingMulti. Could you check whether this is caused by your change?

AlexBrownAMD commented 10 months ago

The CI errors seem to be unrelated to this change: I couldn't reproduce the error locally, and on a CI rerun that test passes (the rerun now shows the gfx11 errors that have also appeared on other runs).

nakajee commented 10 months ago


The gfx11 failure looks like the earlier issue that your LLVM crash fix addressed. I'm not sure whether it is related to this change. Do you have any idea why it is happening again with the LLVM crash fix in place?

AlexBrownAMD commented 10 months ago

Checked with the CI team, and we suspect the gfx11 error is caused by the test node running out of memory (OOM). I'm going to post a separate PR to test a thread reduction on the gfx11 tests only.