Open benvanwerkhoven opened 7 years ago
Wow, a factor of 3 is impressive. 2^(-21.19) ≈ 4.2e-7, so that should be fine. This is a maximum absolute error; since sines and cosines are bounded by 1 in magnitude, it is also the maximum error relative to the peak amplitude, although the relative error of individual values close to zero can be larger.
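As a quick sanity check of these numbers, here is a small Python sketch. It is not from the original thread, and the helper name `relative_bound` is made up for illustration. It evaluates the quoted bound and shows how the same absolute error translates into a relative error for results of different magnitude.

```python
# Quoted maximum absolute error of __sincosf for x in [-pi, pi].
ABS_BOUND = 2.0 ** -21.19


def relative_bound(value, abs_bound=ABS_BOUND):
    """Worst-case relative error for a result of the given magnitude.

    Hypothetical helper, only meant to illustrate the arithmetic.
    """
    return abs_bound / abs(value)


print(f"absolute bound: {ABS_BOUND:.2e}")
for value in (1.0, 0.1, 1e-3):
    print(f"|result| = {value:g}: relative error up to {relative_bound(value):.2e}")
```

For a result of magnitude 1 the two bounds coincide; for smaller results the relative bound grows in proportion.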
The calculation of the array beam can use lower precision in sincos(). We also use sincos() for calculations with source positions, and there the precision requirements are higher; I am not sure float is good enough for that.
I just experimented with using the math function sincosf versus the math intrinsic __sincosf. The overall performance of kernel_array_beam improves by a factor of 3 when using the intrinsic __sincosf instead of the function sincosf (tested only with the use_kernel=0 use_shared_mem=1 kernel configuration). The trade-off is of course precision; whether or not this is a problem depends on the application.
The CUDA programming guide states the following about the precision of the __sincosf intrinsic: for x in [-π,π], the maximum absolute error is 2^(-21.19), and larger otherwise.
I guess we need to test with real data and then judge depending on the results whether or not the error is problematic for the application.
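A rough sketch of what such a check could look like, written in plain Python so it runs without a GPU. Everything here is an assumption for illustration: the helper names are made up, the input angles are placeholders for real data, and the "low precision" result is only a double rounded to single precision, not the output of the __sincosf intrinsic. In practice `approx` would hold the values copied back from the kernel.

```python
import math
import struct

ABS_BOUND = 2.0 ** -21.19  # bound quoted for __sincosf on [-pi, pi]


def to_float32(x):
    """Round a Python float to single precision (stand-in for a low-precision result)."""
    return struct.unpack("f", struct.pack("f", x))[0]


def max_abs_error(reference, approx):
    """Largest absolute difference between two equally long sequences."""
    return max(abs(r - a) for r, a in zip(reference, approx))


# Placeholder inputs: evenly spaced angles in [-pi, pi]; real data would go here.
angles = [-math.pi + 2.0 * math.pi * i / 1000 for i in range(1001)]
reference = [math.sin(x) for x in angles]      # double-precision reference
approx = [to_float32(v) for v in reference]    # NOT the intrinsic, only rounding

err = max_abs_error(reference, approx)
print(f"max abs error: {err:.2e} (bound {ABS_BOUND:.2e})")
```

Whether the measured error is acceptable still has to be judged against the requirements of the application, for example the source position calculations.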