This PR refactors the implementation of the asum operator.
It replaces the existing kernel with one written specifically for this operator. This removes the use of AssignReduction, and the new kernel performs all the required operations in a single launch.
Each architecture has its own backend file that manages the kernel sizes; the values were set empirically on the available hardware.
For development purposes, it also re-enables the test for this operator on all compilers.
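
To illustrate the single-launch idea, here is a minimal CPU emulation in plain C++. It is a sketch and not the kernel from this PR. The names `AsumConfig` and `asum_single_pass` are hypothetical, as are the parameters `work_group_size` and `num_groups`, which stand in for the sizes a per-architecture backend file might provide. The sketch assumes that `work_group_size` is a power of two, and that on a real device the final per-group accumulation would be an atomic add.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical tuning parameters; in a real backend these would be
// chosen per architecture.
struct AsumConfig {
  std::size_t work_group_size;  // work-items per group (power of two assumed)
  std::size_t num_groups;       // number of groups in the single launch
};

// Sequential emulation of a single-launch asum (sum of absolute values).
float asum_single_pass(const std::vector<float>& x, AsumConfig cfg) {
  float result = 0.0f;  // stands in for the device-side output scalar
  const std::size_t stride = cfg.work_group_size * cfg.num_groups;

  for (std::size_t g = 0; g < cfg.num_groups; ++g) {
    // Stands in for the group's local (shared) memory.
    std::vector<float> local(cfg.work_group_size, 0.0f);

    // Phase 1: each work-item accumulates |x[i]| over a grid-stride loop.
    for (std::size_t l = 0; l < cfg.work_group_size; ++l) {
      for (std::size_t i = g * cfg.work_group_size + l; i < x.size(); i += stride) {
        local[l] += std::fabs(x[i]);
      }
    }

    // Phase 2: tree reduction within the group.
    for (std::size_t off = cfg.work_group_size / 2; off > 0; off /= 2) {
      for (std::size_t l = 0; l < off; ++l) {
        local[l] += local[l + off];
      }
    }

    // Phase 3: one accumulation per group (an atomic add on a device),
    // so no second launch is needed to combine partial results.
    result += local[0];
  }
  return result;
}
```

For example, calling `asum_single_pass` on the ten values 1, -2, 3, ..., 9, -10 with `AsumConfig{4, 2}` adds up the absolute values 1 through 10.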