This adds an OpenMP parallelization of the NFSOFT routines. It also includes some improvements in the memory usage of FPT and NFSOFT.
Because memory allocation inside threaded code tends to be very slow in Matlab versions >= 2018a, we have reduced the number of allocations in the FPT code. We also split the function `fpt_precompute` internally into two parts, where `fpt_precompute_1` contains most of the memory allocations.
Furthermore, we no longer set the flag `FFT_OUT_OF_PLACE` by default in NFSFT and NFSOFT, because the performance is nearly the same and the memory footprint is smaller without that flag.
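
To illustrate the idea behind splitting the precomputation, here is a minimal sketch in C. It is not the actual FPT code: the type `demo_plan` and the functions `demo_precompute_1`, `demo_precompute_2`, and `demo_finalize` are hypothetical names made up for this example. It assumes that the allocating part is called serially and that only the second part runs inside an OpenMP parallel region.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical illustration only; this is not the NFFT fpt API. */
typedef struct {
    int n;          /* number of independent precomputation steps */
    int len;        /* length of each per-step buffer */
    double **buf;   /* buf[k] holds the data of step k */
} demo_plan;

/* Part 1: serial. Performs all allocations up front.
 * Returns 0 on success and -1 if an allocation fails. */
static int demo_precompute_1(demo_plan *p, int n, int len)
{
    assert(n > 0 && len > 0);
    p->n = n;
    p->len = len;
    p->buf = malloc((size_t)n * sizeof *p->buf);
    if (p->buf == NULL)
        return -1;
    for (int k = 0; k < n; k++) {
        p->buf[k] = malloc((size_t)len * sizeof **p->buf);
        if (p->buf[k] == NULL) {
            /* Roll back the buffers allocated so far. */
            while (k-- > 0)
                free(p->buf[k]);
            free(p->buf);
            p->buf = NULL;
            return -1;
        }
    }
    return 0;
}

/* Part 2: may run threaded. Only writes into the preallocated buffers,
 * so no allocator calls happen inside the parallel region. */
static void demo_precompute_2(demo_plan *p)
{
    #pragma omp parallel for
    for (int k = 0; k < p->n; k++)
        for (int j = 0; j < p->len; j++)
            p->buf[k][j] = (double)k + 0.5 * (double)j; /* placeholder values */
}

/* Releases everything allocated in part 1. */
static void demo_finalize(demo_plan *p)
{
    for (int k = 0; k < p->n; k++)
        free(p->buf[k]);
    free(p->buf);
    p->buf = NULL;
}
```

A caller would invoke `demo_precompute_1` once, then `demo_precompute_2`, and finally `demo_finalize`. Without OpenMP support the pragma is ignored and the code runs serially with the same result.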