QuState / PhastFT

A high-performance, "quantum-inspired" Fast Fourier Transform (FFT) library written in pure and safe Rust.
Apache License 2.0
193 stars 8 forks source link

Comparison to RustFFT is misleading #16

Closed HEnquist closed 2 months ago

HEnquist commented 4 months ago

The readme contains a comparison with RustFFT. This is helpful, but it leaves out a very important aspect. PhastFT only supports power of two lengths, while RustFFT supports arbitrary length. For my own uses of FFT I absolutely need arbitrary length so this difference is critical.

Also, the readme mentions a "single, general-purpose FFT algorithm". Can an algorithm that only supports power of two lengths truly be called general purpose?

logicalguess commented 4 months ago

The readme contains a comparison with RustFFT. This is helpful, but it leaves out a very important aspect. PhastFT only supports power of two lengths, while RustFFT supports arbitrary length. For my own uses of FFT I absolutely need arbitrary length so this difference is critical.

Also, the readme mentions a "single, general-purpose FFT algorithm". Can an algorithm that only supports power of two lengths truly be called general purpose?

Is padding with zeros not working for you? What kind of functionality do you need for non-powers of 2?

HEnquist commented 4 months ago

I can't use padding with zeros because it does not give an equivalent result. It works for some use cases, but it's really not a general solution.

HEnquist commented 4 months ago

Somewhat related, but should perhaps be a separate issue. The documentation for the planner does not mention the power of two requirement, for example here: https://github.com/QuState/PhastFT/blob/01aed517cb6ebf64930f81574035ebd312b38e93/src/planner.rs#L36

HEnquist commented 4 months ago

I just realised that the benchmarks create new planner and FFT instances for every iteration. This is incredibly inefficient and does not reflect how FFTs are typically used. With FFTW and RustFFT, planning and creating an FFT is expensive, and they will both look very slow if you do not reuse the same FFT for more than one iteration. The benchmarks should run all iterations with the same FFT instance. That is IMO the only meaningful way of comparing FFT performance.

smu160 commented 4 months ago

Can an algorithm that only supports power of two lengths truly be called general purpose?

@HEnquist You seem upset. From what I gather, your disappointment stems from a fundamental misunderstanding between the FFT and the DFT. The FFT is nothing more than an algorithm to compute the DFT in special cases. You should consider taking a glance at this writeup by NI. Of note:

The DFT calculation for N samples requires approximately NN complex calculation operations and is a time intensive and a memory intensive calculation. If the length of the sequence is a power of 2, the DFT can be calculated with approximately Nlog(N) operations. The algorithms for this special case are called fast Fourier transform (FFT). The advantages of the FFT include speed and memory efficiency. The DFT can process sequences of any size efficiently but is slower than the FFT and requires more memory . . .


PhastFT only supports power of two lengths, while RustFFT supports arbitrary length.

Yes, RustFFT supports the ability to compute the DFT on arbitrary length sequences. I assume that is why RustFFT implemented Bluestein's algorithm.


For my own uses of FFT I absolutely need arbitrary length so this difference is critical.

In accordance with the original paper by Cooley & Tukey, you can't use the FFT for your use case. What you need is either Bluestein's algorithm, the basic DFT, or something else entirely.

I just realised that the benchmarks create new planner and FFT instances for every iteration. This is incredibly inefficient and does not reflect how FFTs are typically used. With FFTW and RustFFT, planning and creating an FFT is expensive, and they will both look very slow if you do not reuse the same FFT for more than one iteration. The benchmarks should run all iterations with the same FFT instance. That is IMO the only meaningful way of comparing FFT performance.

A third party has already taken this into account in their independently developed benchmarks. I'll let the results speak for themselves. Remarkably, PhastFT's planner creation was included in the measurement, whereas RustFFT's planner was not.

Thank you for your insight.

Best, Saveliy

HEnquist commented 4 months ago

No not upset, sorry if it came across that way.

From what I gather, your disappointment stems from a fundamental misunderstanding between the FFT and the DFT.

There is no misunderstanding. But now I see that the problem with the documentation comes from how FFT is defined. More on that below.

Yes, RustFFT supports the ability to compute the DFT on arbitrary length sequences. I assume that is why RustFFT implemented Bluestein's algorithm.

Yes, Bluesteins, Raders, Mixed Radix and Good-Thomas are all there to support non-power-of-two lengths, and they are all described as FFT algorithms in literature. Most of the logic in the planner is for selecting between these.

In accordance with the original paper by Cooley & Tukey, you can't use the FFT for your use case. What you need is either Bluestein's algorithm, the basic DFT, or something else entirely.

That definition of what FFT is was only relevant in 1965! Sure you can still find texts where only power-of-two algorithms are counted as FFTs (like in that note from NI) but those are a tiny minority. There are many later algorithms that support other lengths, that are referred to as FFTs everywhere except in those few bad examples. Some of these are almost as old as Cooley-Tukey. There is of course no international standard definition of FFT, but in nearly all literature it is used broadly to describe the different algorithms that use various mathematical tricks to reduce the work needed to calculate the DFT.

A third party has already taken this into account in their independently developed benchmarks. I'll let the results speak for themselves. Remarkably, PhastFT's planner creation was included in the measurement, whereas RustFFT's planner was not.

Those results look good and they are much more relevant than the set included in the PhastFT repo. There is still some unnecessary overhead from the planners, but it's probably not affecting the results very much (except for at very short lengths).

smu160 commented 4 months ago

@HEnquist Thank you for getting back to me.

Since Bluestein's Algorithm would be able to handle your use case of arbitrary length sequences (without the need for padding), we'll open up an issue, and we'll add it into the planned features section of the readme.

I found a simple implementation of the Bluestein's Algo (e.g., the Chirp Z-transform) here. Judging by the following snippet (taken from the previous link), we already have fft and ifft, so it shouldn't be too onerous to implement the rest. Implementing the hadamard product using SIMD will be interesting.

from scipy import *

def czt(x, m=None, w=None, a=None):
    # Translated from GNU Octave's czt.m

    n = len(x)
    if m is None: m = n
    if w is None: w = exp(-2j * pi / m)
    if a is None: a = 1

    chirp = w ** (arange(1 - n, max(m, n)) ** 2 / 2.0)
    N2 = int(2 ** ceil(log2(m + n - 1)))  # next power of 2
    xp = append(x * a ** -arange(n) * chirp[n - 1 : n + n - 1], zeros(N2 - n))
    ichirpp = append(1 / chirp[: m + n - 1], zeros(N2 - (m + n - 1)))
    r = ifft(fft(xp) * fft(ichirpp))
    return r[n - 1 : m + n - 1] * chirp[n - 1 : m + n - 1]

@vectorize  # Rounds complex numbers
def cround(z, d=None): return round(z.real, d) + 1j * round(z.imag, d)

arr = [1.0, 2.0, 3.0]

print cround(czt(arr), 4)       # [ 6.0+0.j    -1.5+0.866j -1.5-0.866j]
print cround(fft(arr), 4)       # [ 6.0+0.j    -1.5+0.866j -1.5-0.866j]

Those results look good and they are much more relevant than the set included in the PhastFT repo. There is still some unnecessary overhead from the planners, but it's probably not affecting the results very much (except for at very short lengths).

With the author's permission, the benchmarks I linked will supplant the current benchmarking "framework". Unfortunately, the file output for divan is still a WIP. Once that PR gets merged, we'll no longer be hindered.

Somewhat related, but should perhaps be a separate issue. The documentation for the planner does not mention the power of two requirement . . .

Yes, thank you. We're not at v1.0.0 yet, so there were bound to be deficiencies such as the one you came across. If you don't mind, please open up a separate issue to point out anything else that needs fixing in the readme, docs, etc. I will open up a PR now, and I'll link it to the issue you open.

Thank you for your help.

Best, Saveliy

HEnquist commented 4 months ago

Adding Bluesteins is a very good idea. That will remove the limitation for what lengths can be processed, and is pretty simple to implement. There is a nicely readable one here, already in the right language: https://github.com/calebzulawski/fourier/blob/master/fourier-algorithms/src/bluesteins.rs

Implementing the hadamard product using SIMD will be interesting.

I wouldn't spend too much time on optimising stuff here! In practice, nearly all the time is spent doing the inner FFT and iFFT steps anyway.

smu160 commented 3 months ago

Hi @HEnquist,

We released v0.2.0 with the fixes that you suggested here and in PR #20. The implementation of Bluestein's algorithm will have its own PR opened, shortly.

Please let me know if there is anything else you'd like to see amended prior to this issue being closed. I will add a changelog shortly that will acknowledge contributions, suggestions, etc. Thank you!

Best, Saveliy

HEnquist commented 2 months ago

Thanks, yes I think this can be closed.