ROCm / MIOpen

AMD's Machine Intelligence Library
https://rocm.docs.amd.com/projects/MIOpen/en/latest/

Op2dTensorGeneric kernel upgrade #3305

Closed novakovicdj closed 1 month ago

novakovicdj commented 1 month ago

This is the new, upgraded Op2dTensorGeneric kernel. It is part of the switch from OpenCL to HIP kernels.

[Images: new_kernel_nxc, new_kernel_nx1, new_kernel_1xc, new_kernel_1x1]

The biggest performance improvements are for larger tensors.
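
For readers unfamiliar with what a generic 2D tensor op computes, here is a minimal host-side reference sketch in plain C++. It is not the kernel from this PR. The `Tensor2dView` struct and the function name `op2d_add_reference` are hypothetical, the op is assumed to be addition in the form `C = alpha0 * A + alpha1 * B + beta * C`, and reading the `nxc` / `nx1` / `1xc` / `1x1` labels as the shape of the broadcast operand B is also an assumption.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical strided 2D view; not a MIOpen type.
struct Tensor2dView {
    std::size_t len0, len1;       // logical lengths (n, c)
    std::size_t stride0, stride1; // element strides for each dimension
};

// Assumed semantics: C = alpha0 * A + alpha1 * B + beta * C, where any
// dimension of B with length 1 is broadcast across the matching C dimension.
inline void op2d_add_reference(const std::vector<float>& a, Tensor2dView av,
                               const std::vector<float>& b, Tensor2dView bv,
                               std::vector<float>& c, Tensor2dView cv,
                               float alpha0, float alpha1, float beta)
{
    for (std::size_t i = 0; i < cv.len0; ++i) {
        for (std::size_t j = 0; j < cv.len1; ++j) {
            // Collapse the index to 0 along broadcast dimensions of B.
            const std::size_t bi = (bv.len0 == 1) ? 0 : i;
            const std::size_t bj = (bv.len1 == 1) ? 0 : j;

            const std::size_t a_off = i * av.stride0 + j * av.stride1;
            const std::size_t b_off = bi * bv.stride0 + bj * bv.stride1;
            const std::size_t c_off = i * cv.stride0 + j * cv.stride1;

            c[c_off] = alpha0 * a[a_off] + alpha1 * b[b_off] + beta * c[c_off];
        }
    }
}
```

Under these assumptions, a B view with lengths `{1, c}` would correspond to the `1xc` case and `{1, 1}` to the `1x1` case. A GPU version would typically map the two loops onto the thread grid instead of iterating serially.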

CAHEK7 commented 1 month ago

It seems some of the new pipelines are not stable right now. The rest of the tests have passed.