codeplaysoftware / portBLAS

An implementation of BLAS using the SYCL open standard.
Apache License 2.0

fix for DEFAULT TUNING_TARGET on AMD and NVIDIA GPUs #517

Closed s-Nick closed 5 months ago

s-Nick commented 5 months ago

This PR fixes most of the tests that fail on AMD and NVIDIA GPUs when using the DEFAULT configuration. It fixes all of them for AMD and leaves only the trsm operator to be fixed for NVIDIA.

In particular it fixes:

iamax/iamin: The sycl::shift_group_left API requires every work-item in the group (sub-group) to take part in the operation. Removing the if-condition that guarded the call solves the problem.
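To illustrate the constraint, here is a minimal host-side sketch in plain C++ (not SYCL, and not the portBLAS kernel). It models a sub-group as a vector of lanes and assumes one possible way to drop the guard: lanes with no input hold a neutral element and still take part in every shift. The names `IndexValue`, `shift_left`, `pick_max`, and `subgroup_iamax` are hypothetical, and the sub-group size is assumed to be a power of two.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical value/index pair, as an iamax-style reduction might carry.
struct IndexValue {
  float value;
  std::size_t index;
};

// Host-side stand-in for a sub-group shift: lane i reads the value of lane i + delta.
// Lanes that would read past the end keep their own value in this model.
std::vector<IndexValue> shift_left(const std::vector<IndexValue>& lanes,
                                   std::size_t delta) {
  std::vector<IndexValue> out(lanes);
  for (std::size_t i = 0; i + delta < lanes.size(); ++i) out[i] = lanes[i + delta];
  return out;
}

// Keeps the larger value; ties go to the smaller index.
IndexValue pick_max(IndexValue a, IndexValue b) {
  if (b.value > a.value || (b.value == a.value && b.index < a.index)) return b;
  return a;
}

// Every lane takes part in every shift. Lanes without input hold a neutral
// element (value -1, below any absolute value) instead of skipping the call.
IndexValue subgroup_iamax(const std::vector<float>& data, std::size_t subgroup_size) {
  const IndexValue neutral{-1.0f, std::numeric_limits<std::size_t>::max()};
  std::vector<IndexValue> lanes(subgroup_size, neutral);
  for (std::size_t i = 0; i < data.size() && i < subgroup_size; ++i) {
    lanes[i] = IndexValue{std::fabs(data[i]), i};
  }
  // Tree reduction: halve the shift distance until lane 0 holds the result.
  for (std::size_t delta = subgroup_size / 2; delta > 0; delta /= 2) {
    const std::vector<IndexValue> shifted = shift_left(lanes, delta);
    for (std::size_t i = 0; i < subgroup_size; ++i) {
      lanes[i] = pick_max(lanes[i], shifted[i]);
    }
  }
  return lanes[0];
}
```

In this model, calling `subgroup_iamax` on three elements with a sub-group of eight still runs all eight lanes through each shift step; the five idle lanes simply never win the comparison.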

txsv operators: broadcast operations inside the kernel require a specific group and sub-group size, so calling the kernel implementation from the DEFAULT configuration is not enough because of hardware differences. This solution uses runtime checks to select the correct template parameters. As a result, more kernels are compiled than before, but in my tests this does not significantly affect compilation time.
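The runtime selection of template parameters can be sketched in plain C++ as follows. This is an illustrative pattern only, not the code from this PR: `launch_txsv` and `dispatch_txsv` are hypothetical names, and the list of supported sizes is an assumption.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in for a kernel launcher whose sub-group size is a
// compile-time parameter. Real code would submit a kernel here; this sketch
// only returns the size it was instantiated for.
template <std::size_t SubgroupSize>
std::size_t launch_txsv() {
  return SubgroupSize;
}

// Runtime check: map the size reported by the device onto one of the
// precompiled instantiations. Each case adds one more kernel to compile.
std::size_t dispatch_txsv(std::size_t device_subgroup_size) {
  switch (device_subgroup_size) {
    case 8:
      return launch_txsv<8>();
    case 16:
      return launch_txsv<16>();
    case 32:
      return launch_txsv<32>();
    case 64:
      return launch_txsv<64>();
    default:
      throw std::invalid_argument("unsupported sub-group size");
  }
}
```

In a SYCL program the argument would typically come from a device query such as `sycl::info::device::sub_group_sizes`. The switch also shows where the extra compilation cost comes from: every supported size is a separate instantiation.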