aferoz21 closed this 2 weeks ago.
Please run hipblaslt-test on your local node (with the change) and paste the result here if you add the noCI label.
I think 128x256 (or 256x128) would be better... The current result may be due to a bug in the generator. Could you try again with the latest version? Alternatively, manually add 128x256 and 256x128 (or, for tiles other than 128x128, double the M or N side).
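The reviewer's manual fallback (add 128x256 and 256x128 next to 128x128, or in general double either the M or the N side of an existing tile) amounts to a small candidate-list expansion. The sketch below is a hypothetical Python helper: `doubled_tile_candidates` and `fmt` are illustrative names, not part of hipBLASLt or Tensile, and the code only builds the list of tile shapes. How those shapes are written into the actual tuning configuration is not shown and is left as an assumption.

```python
def doubled_tile_candidates(base_tiles):
    """Hypothetical helper: for each base (M, N) tile, also emit the
    (2M, N) and (M, 2N) variants, skipping duplicates and keeping order."""
    out = []
    for m, n in base_tiles:
        for tile in ((m, n), (2 * m, n), (m, 2 * n)):
            if tile not in out:  # a doubled tile may already be a base tile
                out.append(tile)
    return out


def fmt(tiles):
    """Render (M, N) pairs as 'MxN' strings."""
    return [f"{m}x{n}" for m, n in tiles]


if __name__ == "__main__":
    # 128x128 expands to itself plus 256x128 and 128x256.
    print(fmt(doubled_tile_candidates([(128, 128)])))
```

Deduplication matters when the base list already contains a doubled shape, for example when both 128x128 and 256x128 are present.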
If this PR is needed for 6.3, please remember to file a PR against release-staging/rocm-rel-6.3 as well.
> I think 128x256 (or 256x128) would be better... The current result may be due to a bug in the generator. Could you try again with the latest version? Alternatively, manually add 128x256 and 256x128 (or, for tiles other than 128x128, double the M or N side).

Yes. 128x256 or 256x128 was picked for most sizes, and there is a small improvement.
> Please run hipblaslt-test on your local node (with the change) and paste the result here if you add the noCI label.
[----------] 1 test from ExtOpTest/ExtOpAMaxWithScaleUnsupportedDatatypeTest
[ RUN ] ExtOpTest/ExtOpAMaxWithScaleUnsupportedDatatypeTest.amaxWithScaleFailureUnsupportedDatatype/0
[ OK ] ExtOpTest/ExtOpAMaxWithScaleUnsupportedDatatypeTest.amaxWithScaleFailureUnsupportedDatatype/0 (0 ms)
[----------] 1 test from ExtOpTest/ExtOpAMaxWithScaleUnsupportedDatatypeTest (0 ms total)

[----------] Global test environment tear-down
[==========] 48206 tests from 13 test suites ran. (2096949 ms total)
[ PASSED ] 48206 tests.
hipBLASLt version: 1000
command line: ./hipblaslt-test
1) Overall, a ~4% improvement in weighted time across the ~10 tuned sizes.
2) No new kernels were added; instead, the kernels for the existing grid points were updated.
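One common way to arrive at a weighted-time figure like the ~4% reported here: multiply each tuned size's kernel time by a per-size weight, sum the products, and compare the totals before and after the change. The weighting scheme actually used in this PR is not stated, so the weights below are an assumption, and `weighted_time` and `weighted_improvement` are hypothetical helpers. The numbers are toy values chosen for illustration, not measurements from this PR.

```python
from typing import Sequence


def weighted_time(times_us: Sequence[float], weights: Sequence[float]) -> float:
    """Sum of per-size kernel times, each scaled by that size's (assumed) weight."""
    if len(times_us) != len(weights):
        raise ValueError("times and weights must have the same length")
    return sum(t * w for t, w in zip(times_us, weights))


def weighted_improvement(old_us: Sequence[float],
                         new_us: Sequence[float],
                         weights: Sequence[float]) -> float:
    """Fractional reduction in weighted time: (T_old - T_new) / T_old."""
    t_old = weighted_time(old_us, weights)
    t_new = weighted_time(new_us, weights)
    return (t_old - t_new) / t_old


if __name__ == "__main__":
    # Toy values for illustration only (not results from this PR).
    old = [100.0, 50.0, 20.0]
    new = [95.0, 49.0, 19.0]
    w = [1.0, 2.0, 5.0]
    print(f"{weighted_improvement(old, new, w):.1%}")  # prints 4.0%
```

With these toy inputs the weighted totals are 300 and 288, so the reduction is 12 / 300 = 4%. A per-size speedup can differ noticeably from the weighted figure when a few heavily weighted sizes dominate the sum.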