New features in this PR:
Support for spin-weighted spherical harmonic transforms. The harmonics are orthonormalized in L^2, use the complex Fourier series as the longitudinal basis, and have complex expansion coefficients.
Three new functions (each in float and double versions): Horner's rule, Clenshaw's algorithm for Chebyshev series, and Clenshaw's algorithm for orthogonal polynomial series.
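As an illustration of the two algorithms named above, here is a minimal sketch in C. The function names `horner_sketch` and `clenshaw_chebyshev_sketch` are hypothetical and are not the library's API; the sketch assumes a monomial series for Horner's rule and a first-kind Chebyshev series for Clenshaw's algorithm. A general orthogonal polynomial version would additionally need the three-term recurrence coefficients as inputs.

```c
#include <assert.h>
#include <stddef.h>

/* Horner's rule: evaluate p(x) = c[0] + c[1] x + ... + c[n-1] x^(n-1). */
double horner_sketch(size_t n, const double *c, double x) {
    assert(n == 0 || c != NULL);
    double p = 0.0;
    for (size_t k = n; k-- > 0; )
        p = p * x + c[k];
    return p;
}

/* Clenshaw's algorithm for f(x) = sum_{k=0}^{n-1} c[k] T_k(x), based on the
   recurrence T_{k+1}(x) = 2 x T_k(x) - T_{k-1}(x). */
double clenshaw_chebyshev_sketch(size_t n, const double *c, double x) {
    assert(n == 0 || c != NULL);
    double b1 = 0.0, b2 = 0.0;          /* hold b_{k+1} and b_{k+2} */
    for (size_t k = n; k-- > 1; ) {     /* k runs from n-1 down to 1 */
        double b0 = c[k] + 2.0 * x * b1 - b2;
        b2 = b1;
        b1 = b0;
    }
    return (n > 0 ? c[0] : 0.0) + x * b1 - b2;
}
```

Both routines take the number of coefficients, the coefficient array, and the evaluation point, and both cost O(n) operations without forming powers of x or the polynomials T_k explicitly.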
Improvements in this PR:
The code is now designed with a cross-compiler in mind. For performance-critical tasks, SIMD is hidden from the user interface and is instead dispatched at runtime based on the CPU ID. This allows a cross-compiler to include functions that use more advanced SIMD than the host computer supports, while a runtime check ensures that only the best SIMD level available on the running CPU is dispatched (closes #12 and #41).
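The runtime dispatch described above can be sketched roughly as follows. This is not the library's implementation: `simd_level`, `detect_simd_level`, `select_scale_kernel`, and `scale` are hypothetical names, the kernel table contains only a scalar stand-in, and the CPU query assumes GCC or Clang on x86 (through the `__builtin_cpu_init` and `__builtin_cpu_supports` builtins), falling back to scalar code elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature levels, ordered from least to most capable. */
typedef enum { SIMD_NONE = 0, SIMD_SSE2, SIMD_AVX, SIMD_AVX2, SIMD_AVX512F } simd_level;

/* Query the running CPU; returns SIMD_NONE off x86 or off GCC/Clang. */
simd_level detect_simd_level(void) {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx512f")) return SIMD_AVX512F;
    if (__builtin_cpu_supports("avx2"))    return SIMD_AVX2;
    if (__builtin_cpu_supports("avx"))     return SIMD_AVX;
    if (__builtin_cpu_supports("sse2"))    return SIMD_SSE2;
#endif
    return SIMD_NONE;
}

typedef void (*scale_kernel)(size_t n, double a, double *x);

/* Portable fallback kernel: x[i] *= a. */
static void scale_scalar(size_t n, double a, double *x) {
    for (size_t i = 0; i < n; i++) x[i] *= a;
}

/* Table indexed by simd_level. In a real build, the higher entries would be
   kernels compiled in separate files with flags such as -mavx2; here they
   are NULL to mark "no kernel compiled in for this level". */
static const scale_kernel scale_table[] = { scale_scalar, NULL, NULL, NULL, NULL };

/* Pick the most capable kernel that does not exceed the detected level. */
scale_kernel select_scale_kernel(simd_level level) {
    for (int l = (int)level; l > 0; l--)
        if (scale_table[l] != NULL) return scale_table[l];
    return scale_table[0];
}

/* User-facing entry point: no SIMD types or flags appear in the signature. */
void scale(size_t n, double a, double *x) {
    static scale_kernel kernel = NULL;
    if (kernel == NULL) kernel = select_scale_kernel(detect_simd_level());
    assert(kernel != NULL);
    kernel(n, a, x);
}
```

The point of the design is that the binary may contain kernels the build machine cannot execute, since a kernel is only ever called after the running CPU has reported the matching feature.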
The computational kernels for the spherical/triangular/disk harmonics are refactored not only to use the correct types of registers, but also to help the compiler maximize throughput. This relies on a property of Givens rotations: two adjacent rotations commute if they do not act on the same rows. This property allows one to re-order the Givens rotations to increase the ratio of computation to memory loads/stores. The computational kernels and execute drivers are largely generated by a macro, which means the code may already be prepared for AVX-1024 when the instruction sets are available in GCC. Part of this is the introduction of the `ft_simd` struct to store a bit-field of a variety of SIMD extensions.
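The commutation property can be checked numerically with a small sketch. The names `givens_apply` and `order_defect` are hypothetical, and the rotation angles are arbitrary values chosen for illustration; none of this is taken from the library's kernels.

```c
#include <assert.h>
#include <stddef.h>

/* Apply the rotation [c s; -s c] to entries i and j of the vector x. */
void givens_apply(double *x, size_t i, size_t j, double c, double s) {
    assert(i != j);
    double xi = x[i], xj = x[j];
    x[i] =  c * xi + s * xj;
    x[j] = -s * xi + c * xj;
}

/* Apply rotation 1 on rows (i1, j1) and rotation 2 on rows (i2, j2) to a
   copy of x0 in both orders, and return the largest entrywise difference.
   A result of zero means the two rotations commute on this input. */
double order_defect(const double x0[4], size_t i1, size_t j1,
                    size_t i2, size_t j2) {
    const double c1 = 0.6, s1 = 0.8;    /* illustrative rotation 1 */
    const double c2 = 0.8, s2 = -0.6;   /* illustrative rotation 2 */
    double a[4], b[4], d = 0.0;
    for (size_t k = 0; k < 4; k++) a[k] = b[k] = x0[k];

    givens_apply(a, i1, j1, c1, s1);    /* order: 1 then 2 */
    givens_apply(a, i2, j2, c2, s2);
    givens_apply(b, i2, j2, c2, s2);    /* order: 2 then 1 */
    givens_apply(b, i1, j1, c1, s1);

    for (size_t k = 0; k < 4; k++) {
        double e = a[k] - b[k];
        if (e < 0.0) e = -e;
        if (e > d) d = e;
    }
    return d;
}
```

With rotations on rows (0, 1) and (2, 3) the defect is exactly zero, because each rotation reads and writes entries the other never touches. With rows (0, 1) and (1, 2), which share row 1, the two orders give different results, so those rotations cannot be swapped.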
The real-to-real FFTW routines now use `fftw_execute_dft_r2c` and `fftw_execute_dft_c2r` instead of `FFTW_R2HC` and `FFTW_HC2R`-type real-to-real transforms to avoid a global transpose of the data.
The performance benchmark timings were not scaling as O(n^3) because one needs to call a function a few times, typically at least twice, before peak performance is realized. The benchmarks are now updated, and the macro FT_TIME helps to apply this warm-up practice system-wide.
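The warm-up idea can be sketched with a small timing macro. This is not the definition of FT_TIME: the macro `TIME_AFTER_WARMUP`, the function `toy_workload`, and the counter `workload_calls` are hypothetical, and the sketch uses the standard `clock()` function, which measures CPU time rather than wall-clock time.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: run CALL `warmup` times untimed, then `reps` times
   timed, and store the mean CPU seconds per timed call in SECONDS. */
#define TIME_AFTER_WARMUP(CALL, warmup, reps, SECONDS) do {          \
    assert((reps) > 0);                                              \
    for (int i_ = 0; i_ < (warmup); i_++) { CALL; }                  \
    clock_t t0_ = clock();                                           \
    for (int i_ = 0; i_ < (reps); i_++) { CALL; }                    \
    (SECONDS) = (double)(clock() - t0_) / CLOCKS_PER_SEC / (reps);   \
} while (0)

int workload_calls = 0;          /* counts how often the workload ran */
volatile double workload_sink;   /* keeps the loop from being optimized out */

/* Toy O(n^2) workload standing in for a transform under test. */
void toy_workload(int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            s += (double)i * (double)j;
    workload_sink = s;
    workload_calls++;
}
```

A call such as `TIME_AFTER_WARMUP(toy_workload(50), 2, 5, t)` runs the workload seven times in total and reports the mean of the last five, so that one-off costs in the first calls do not distort the measured scaling.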
New Examples in this PR:
`spinweighted.c` is a basic tutorial on how to use spin-weighted spherical harmonic transforms.
Releases no longer trigger the attachment of binaries, as compilation with `-march=native` may fail on a host computer.