This PR implements a native FFT using the Cooley-Tukey method. Currently it only supports radix-2.
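For reference, here is a minimal pure-Python sketch of the iterative radix-2 Cooley-Tukey algorithm: a bit-reversal permutation followed by log2(n) stages of butterflies. It is not the Triton kernel from this PR. `fft_radix2` and `bit_reversed_indices` are hypothetical names, and Python's built-in complex type is used here only for readability.

```python
import cmath


def bit_reversed_indices(n):
    """Permutation that reverses the low log2(n) bits of each index."""
    bits = n.bit_length() - 1
    out = []
    for i in range(n):
        r = 0
        for b in range(bits):
            r |= ((i >> b) & 1) << (bits - 1 - b)
        out.append(r)
    return out


def fft_radix2(x):
    """Iterative radix-2 Cooley-Tukey DFT of a power-of-two-length sequence."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("radix-2 FFT needs a power-of-two length")
    # Reorder inputs so each butterfly stage can work on contiguous blocks.
    a = [complex(x[p]) for p in bit_reversed_indices(n)]
    size = 2
    while size <= n:  # log2(n) stages
        half = size // 2
        for start in range(0, n, size):  # independent blocks within a stage
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / size)  # twiddle factor
                u, t = a[start + k], w * a[start + k + half]
                a[start + k] = u + t
                a[start + k + half] = u - t
        size *= 2
    return a


# An impulse transforms to a flat spectrum.
print([round(abs(v), 6) for v in fft_radix2([1, 0, 0, 0])])  # [1.0, 1.0, 1.0, 1.0]
```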
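Its performance is currently far worse than PyTorch's ATen implementation (which actually calls the cuFFT library), for two reasons:
triton-lang does not support a complex dtype, so the real and imaginary parts have to be stored and computed separately.
triton-lang does not support __syncthreads, so synchronization between FFT stages can only happen at kernel boundaries, which means launching many more kernels.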
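A hedged sketch of how these two constraints shape the computation, again in plain Python rather than Triton: real and imaginary parts live in separate lists, the complex twiddle multiply is expanded by hand, and each butterfly stage is a separate call standing in for one kernel launch, so a length-n transform needs log2(n) launches. All names (`butterfly_stage`, `fft_split`, `bit_reversed_indices`) are hypothetical, and the one-launch-per-stage structure is an assumption; the PR does not state its exact launch schedule.

```python
import math


def bit_reversed_indices(n):
    """Permutation that reverses the low log2(n) bits of each index."""
    bits = n.bit_length() - 1
    out = []
    for i in range(n):
        r = 0
        for b in range(bits):
            r |= ((i >> b) & 1) << (bits - 1 - b)
        out.append(r)
    return out


def butterfly_stage(re, im, size):
    """One radix-2 butterfly stage; stands in for a single (hypothetical) kernel launch.

    Every output depends only on the previous stage's buffers, so no barrier
    is needed inside the stage. Real and imaginary parts are separate lists
    because there is no complex dtype to rely on.
    """
    n = len(re)
    half = size // 2
    out_re, out_im = list(re), list(im)
    for start in range(0, n, size):
        for k in range(half):
            # Twiddle factor w = exp(-2*pi*i*k/size), split into cos/sin parts.
            angle = -2.0 * math.pi * k / size
            w_re, w_im = math.cos(angle), math.sin(angle)
            i, j = start + k, start + k + half
            # t = w * x[j], with the complex multiply expanded by hand.
            t_re = w_re * re[j] - w_im * im[j]
            t_im = w_re * im[j] + w_im * re[j]
            out_re[i], out_im[i] = re[i] + t_re, im[i] + t_im
            out_re[j], out_im[j] = re[i] - t_re, im[i] - t_im
    return out_re, out_im


def fft_split(re, im):
    """Radix-2 FFT on split real/imag lists; returns (re, im, launches)."""
    n = len(re)
    if n == 0 or n & (n - 1) or len(im) != n:
        raise ValueError("need equal power-of-two lengths")
    order = bit_reversed_indices(n)
    re = [float(re[p]) for p in order]
    im = [float(im[p]) for p in order]
    launches = 0
    size = 2
    while size <= n:
        # Host-side loop: the only synchronization point between stages.
        re, im = butterfly_stage(re, im, size)
        launches += 1
        size *= 2
    return re, im, launches
```

For example, `fft_split([1.0, 0.0, 0.0, 0.0], [0.0] * 4)` reports 2 launches. With an intra-block barrier, stages of a transform small enough to fit in one block could in principle be fused into a single kernel instead.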
PR Category
Operator
Type of Change
New Feature
Description
Issue
Progress
Performance