DTolm / VkFFT

Vulkan/CUDA/HIP/OpenCL/Level Zero/Metal Fast Fourier Transform library
MIT License
1.52k stars 91 forks

VkFFT v1.3.2 #141

Closed DTolm closed 11 months ago

DTolm commented 11 months ago

- Added double-double support in VkFFT. Requires CPU initialization in full quad precision, so it only supports gcc with the quadmath dependency for now. It may be possible to add full FP128 support or some other FP128 library (like mpir) in the future.
- Data has to be stored in double-double format before VkFFT kernel calls (no fp128<->double-double conversion on the GPU yet).
- Full 1e-32 precision, but the same range as FP64. See "Library for Double-Double and Quad-Double Arithmetic" by Y. Hida for more information on double-double.
- Double-double requires FMA contraction to be disabled (due to a rounding mismatch when ab-cd is contracted). Doesn't work on Vulkan, as I haven't found how to do that yet.
- Added DST I-IV support.
- Fixed warnings (https://github.com/DTolm/VkFFT/issues/138)
- Added a proper check that app is zeroed before the initializeVkFFT call, and zeroing on deletion (https://github.com/DTolm/VkFFT/issues/134)
- Added an option to provide a staging buffer in the application and VkGPU handle (https://github.com/DTolm/VkFFT/issues/129)
- Added guards for build type (https://github.com/DTolm/VkFFT/issues/128)
- Changed the default innermost stride for real buffers in out-of-place R2C from size[0]+2 to size[0] (https://github.com/DTolm/VkFFT/issues/139)
- Allow specifying the glslang version (https://github.com/DTolm/VkFFT/pull/135)
- Improved instruction count and accuracy for radix-7.
- Fixed missing deallocation calls for the inverse Bluestein axes.
- Fixed the buffer layout size in Vulkan in some cases.
- Refactored the code generator and container struct layout for better handling of complex numbers (-5k loc).
- Added more precision tests and benchmarks.
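For readers unfamiliar with the double-double format mentioned above, here is a minimal sketch in C of how a value is held as an unevaluated sum of two doubles and how two such values can be added. The names `dd_t`, `two_sum` and `dd_add` are hypothetical and chosen for illustration only; this is not VkFFT's internal representation or code. It uses Knuth's error-free two-sum and the simple ("sloppy") addition variant. It contains no multiplications, so FMA contraction does not affect it; the contraction issue noted above concerns product terms.

```c
/* Hypothetical type for illustration: value = hi + lo, with |lo| much smaller than |hi|. */
typedef struct {
    double hi;
    double lo;
} dd_t;

/* Knuth's two-sum: returns s = fl(a + b) and stores the exact rounding
 * error in *err, so that a + b == s + *err holds exactly. */
double two_sum(double a, double b, double *err) {
    double s = a + b;
    double bb = s - a;
    *err = (a - (s - bb)) + (b - bb);
    return s;
}

/* Simple ("sloppy") double-double addition: add the high parts with an
 * exact error term, fold in the low parts, then renormalize. */
dd_t dd_add(dd_t x, dd_t y) {
    double e;
    double s = two_sum(x.hi, y.hi, &e);
    dd_t r;
    e += x.lo + y.lo;
    r.hi = s + e;
    r.lo = e - (r.hi - s); /* fast two-sum step, assumes |s| >= |e| */
    return r;
}
```

For example, adding `{1.0, 0.0}` and `{1e-20, 0.0}` with `dd_add` keeps the small term in `lo`, whereas the plain FP64 sum `1.0 + 1e-20` rounds to `1.0`. The sketch assumes strict IEEE 754 double arithmetic (no `-ffast-math`).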