math functions versus intrinsics

benvanwerkhoven commented 7 years ago

I just experimented with using the math function sincosf versus the math intrinsic __sincosf. The overall performance of kernel_array_beam improves by a factor of 3 when using the intrinsic __sincosf instead of the function sincosf (tested only with the use_kernel=0 use_shared_mem=1 kernel configuration). The trade-off is of course precision; whether this is a problem depends on the application.

The CUDA programming guide states the following about the precision of the __sincosf intrinsic: for x in [-π,π], the maximum absolute error is 2^(-21.19), and larger otherwise.

I guess we need to test with real data and then judge depending on the results whether or not the error is problematic for the application.

HannoSpreeuw commented 7 years ago

Wow, a factor of 3 is impressive. 2^(-21.19) ≈ 4.2e-7, so that should be fine. This is a maximum absolute error; since sines and cosines are bounded by 1 in magnitude, it is also the maximum error relative to the full amplitude, although the relative error of an individual value close to a zero crossing can be larger.
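
A quick way to check the arithmetic, and to see how a fixed absolute error maps to relative error for different values of sin(x), is the Python sketch below. It does not measure `__sincosf` itself: it simply assumes the result is off by the full quoted bound, and the helper name `relative_error_for` is made up for this illustration.

```python
import math

# Maximum absolute error quoted for __sincosf on [-pi, pi].
ABS_ERR_BOUND = 2.0 ** -21.19

def relative_error_for(x, abs_err=ABS_ERR_BOUND):
    """Relative error of sin(x) if the result is off by abs_err (assumed worst case)."""
    return abs_err / abs(math.sin(x))

print(f"absolute bound: {ABS_ERR_BOUND:.2e}")
for x in (math.pi / 2, 0.1, 1e-3):
    # The same absolute error weighs more heavily as sin(x) approaches zero.
    print(f"x = {x:<8.4g} relative error <= {relative_error_for(x):.2e}")
```

At the peak of the sine the absolute and relative errors coincide; for arguments close to a zero crossing the relative error grows, which may or may not matter depending on how the values are used downstream.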

SarodYatawatta commented 6 years ago

The calculation of the array beam can use lower precision in sincos(). We also use sincos() for calculations with source positions, and there the precision requirements are higher; I am not sure float is good enough for this.
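
To get a feel for why float may be too coarse when the argument of sincos() is large, here is a rough Python sketch that emulates single precision with the `struct` module. It assumes, purely for illustration, that the phase has the form 2π·u·l, with u a baseline length in wavelengths and l a direction cosine of the source; the function names and the example values (u = 1e5, l = 0.3) are hypothetical and not taken from the actual kernels.

```python
import math
import struct

def to_float32(x):
    """Round a Python float (double) to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def phase_error(u_wavelengths, l):
    """Absolute error (radians) of the phase 2*pi*u*l when every input and
    intermediate result is rounded to float32 instead of kept in double."""
    exact = 2.0 * math.pi * u_wavelengths * l
    u32, l32 = to_float32(u_wavelengths), to_float32(l)
    two_pi32 = to_float32(2.0 * math.pi)
    # Rounding after each multiplication mimics float32 arithmetic.
    approx = to_float32(two_pi32 * to_float32(u32 * l32))
    return abs(approx - exact)

for u in (1.0, 1e3, 1e5):
    print(f"u = {u:<8g} phase error = {phase_error(u, 0.3):.2e} rad")
```

Under these assumptions the rounding error in the phase grows with the size of the argument, so for long baselines it can exceed the 2^(-21.19) error of the intrinsic by several orders of magnitude before sincos() is even evaluated. Whether that is acceptable would still have to be judged on real data.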

HannoSpreeuw / Kernel-tuning-for-Sagecal
