pavanky closed this 8 years ago
@kknox @TimmyLiu This PR fixes all the bugs we are seeing in release mode with ArrayFire. There is one more bug that only occurs in debug mode; we are investigating it at a lower priority.
@pavanky Thanks! Did you have a chance to run any performance benchmarks? I am nervous about adding if statements inside the kernel, since they may add cycles. But I am not sure they will, since all threads (whether beta == 0 or not) should execute the same path.
@TimmyLiu We haven't explicitly benchmarked the code, but I don't think the if conditions add overhead because all threads take the same execution path. There is no thread divergence.
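The no-divergence argument can be sketched on the CPU. This is a minimal illustration, not the kernel from this PR: `store_result` and `launch` are hypothetical names, and the GEMM-style write-back `C = alpha * acc + beta * C` is an assumption about what the branch guards. The point it shows is that `beta` is a scalar argument shared by the whole launch, so every work-item evaluates `beta == 0` to the same result.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical per-work-item write-back, modelled on C = alpha * acc + beta * C.
// Illustrative only; this is not code taken from the PR.
inline float store_result(float alpha, float acc, float beta, float c_old) {
    // beta is the same scalar for every work-item, so the whole launch
    // takes the same side of this branch.
    if (beta == 0.0f) {
        return alpha * acc;            // old value of C is never used
    }
    return alpha * acc + beta * c_old;
}

// CPU stand-in for a kernel launch: one loop iteration per work-item.
// Returns how many work-items took the beta == 0 path.
std::size_t launch(float alpha, float beta,
                   const std::vector<float>& acc, std::vector<float>& c) {
    std::size_t zero_path = 0;
    for (std::size_t i = 0; i < c.size(); ++i) {
        if (beta == 0.0f) {
            ++zero_path;
        }
        c[i] = store_result(alpha, acc[i], beta, c[i]);
    }
    return zero_path;
}
```

For any single launch, the count returned by `launch` is either zero or the full number of work-items, never a mix, which is the property that rules out divergence. One plausible reason for such a branch (again an assumption, not something stated in this thread) is the BLAS convention that C need not be initialized when beta is zero: in IEEE arithmetic `0 * NaN` is still NaN, so multiplying by beta does not clear garbage in C.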
@pavanky I agree. Let me double check that it doesn't add more registers either. The performance can be sensitive to register count as well.