Open calebzulawski opened 3 months ago
Hi @calebzulawski,
I think I utilized real to complex in an example, here. I'm going to assume a specialized version for real to complex would offer better performance since we'd avoid doing anything with the zeros.
If that's the case, we can have two separate PR's for closing out this issue -- I'll start a new PR now to implement real-to-complex. I think we can call it rfft_64
and rfft_32
.
A future PR can address the complex-to-real version. Thanks!
Best, Saveliy
I wonder if it might make sense possibly to use a trait? I could see the various supported types ballooning, so you could have something like Fft<In, Out> for Planner
?
@calebzulawski
I'm not well-versed in using traits in Rust. I'm happy to try out any of your suggestions. If you don't mind, please feel free to expand on this idea, and I will try it out
I wonder if it's possible for this library to do what realfft
does. Packing real-valued data into shorter complex-valued sequences seems to be significantly faster than just setting all imaginary parts to 0. Does the power-of-two input length requirement make this unviable?
@astral4
I may be way off base here, so please correct me if I'm wrong. From what I gather, we can create a rfft_64
(and rfft_32
) that simply takes a single input slice of &mut [f64]
or &mut [f32]
. This would be the real components we care about. The advantage I see here is we'd spend, potentially, significantly less CPU cycles. Now, I'm not sure if this would be as good as what realfft
does.
For example, take a look at the implementation of fft_butterfly_n_simd
(which can be found here)
@calebzulawski I'm curious to here your thoughts here as well.
Thank you!
Best, Saveliy
Edit: @astral4 with respect to the viability of what realfft
does and the power of 2 requirement, I am not sure. I need to look into what realfft
does.
Edit 2: A lot of what I wrote above was nonsensical 😅. Please disregard.
@astral4
I wonder if it's possible for this library to do what
realfft
does. Packing real-valued data into shorter complex-valued sequences seems to be significantly faster than just setting all imaginary parts to 0. Does the power-of-two input length requirement make this unviable?
I finally read up on the approached you described, here. I don't see why the power-of-two requirement would make this nonviable. It's definitely the best way forward given that it allows us to reduce memory usage and run-time by 2x (in comparison to using a c2c fft and just setting the imags vec to all 0s). From what I gather, the steps would be as follows:
fft_64
or fft_32
on the complex-valued signalWork on this implementation will continue in #26. Thank you!
Best, Saveliy
In these special cases, the imaginary reads could be replaced with zeros, and the writes elided.