codeplaysoftware / portBLAS

An implementation of BLAS using the SYCL open standard.
Apache License 2.0

Update configuration for gemm on AMD GPUs #494

Closed s-Nick closed 7 months ago

s-Nick commented 7 months ago

Following a preliminary investigation and tuning with the auto-tuner, these are the new gemm configurations that provide the best performance. Configuration selection is now based on arithmetic intensity rather than on the _M and _N dimensions alone.
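To illustrate the idea, here is a minimal sketch of selecting a configuration from arithmetic intensity. It is not the code in this PR: the `GemmConfig` labels, the function names, and the thresholds (8 and 64 FLOPs per byte) are hypothetical, and the intensity estimate assumes each of A, B and C is moved through memory exactly once.

```cpp
#include <cstddef>

// Hypothetical configuration labels, for illustration only;
// these are not the configuration names used in portBLAS.
enum class GemmConfig { SmallTile, MediumTile, LargeTile };

// Estimated FLOPs per byte for C = A * B, with A (m x k), B (k x n), C (m x n).
// Assumes every matrix element is read or written exactly once.
constexpr double arithmetic_intensity(std::size_t m, std::size_t n,
                                      std::size_t k, std::size_t elem_size) {
  const double dm = static_cast<double>(m);
  const double dn = static_cast<double>(n);
  const double dk = static_cast<double>(k);
  const double flops = 2.0 * dm * dn * dk;  // one multiply and one add per term
  const double bytes =
      static_cast<double>(elem_size) * (dm * dk + dk * dn + dm * dn);
  return flops / bytes;
}

// Picks a configuration from the intensity estimate.
// The thresholds are placeholders, not tuned values from the auto-tuner.
constexpr GemmConfig select_config(std::size_t m, std::size_t n,
                                   std::size_t k, std::size_t elem_size) {
  const double ai = arithmetic_intensity(m, n, k, elem_size);
  if (ai < 8.0) return GemmConfig::SmallTile;
  if (ai < 64.0) return GemmConfig::MediumTile;
  return GemmConfig::LargeTile;
}
```

With this kind of rule, two problems with the same _M and _N can map to different configurations: under the placeholder thresholds above, a 4096 x 4096 output with K = 4 in single precision falls in the lowest bucket, while a 1024 x 1024 x 1024 problem falls in the highest.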