ejmahler / rust_dct

Rust library to compute the main four discrete cosine transforms
Apache License 2.0

Advice on how to substitute manual loops with this library #13

Closed torokati44 closed 1 year ago

torokati44 commented 1 year ago

I apologize if this sounds too much like a "do my homework" kind of question, but I can explain. I'm a semi-regular contributor to Ruffle (https://ruffle.rs), a Flash Player emulator.

When implementing the flash.media.SoundMixer.computeSpectrum AS3 method, we were able to reverse-engineer the exact algorithm used to perform the "FFT": https://github.com/ruffle-rs/ruffle/blob/7c46fb207b2ec5e2a3e0d49f7050ef49575a1880/core/src/avm2/globals/flash/media/soundmixer.rs#L151-L179

However, for some content, cosf takes up a considerable amount of runtime, so I thought of speeding up the implementation by switching to a "fast" algorithm.

I have extracted the relevant logic into a little stand-alone program for easier experimentation:

pub fn fft2(mut hist : [f32; 512]) -> [f32; 512]
{
    use rustdct::DctPlanner;

    let mut planner = DctPlanner::new();
    // Placeholder attempt: a type-1 DCT of length 512 (does not match fft1 yet).
    let dct1 = planner.plan_dct1(512);

    dct1.process_dct1(&mut hist);

    hist
}

pub fn fft1(mut hist : [f32; 512]) -> [f32; 512]
{
    // Need to make a copy of the samples used by the FFT, so they aren't
    // modified in place.
    let mut inp = [0.0; 512];
    inp.copy_from_slice(&hist);

    for (freq, h) in hist.iter_mut().enumerate() {
        let mut sum = 0.0;

        for (i, sample) in inp.iter().enumerate() {
            let freq = freq as f32;
            let i = i as f32;
            let coeff = (std::f32::consts::PI * freq * i / 1024.0).cos();

            sum += sample * coeff;
        }

        *h = sum;
    }

    hist
}

pub fn main() {
    use rand::Rng;

    let mut hist = [0.0; 512];
    let mut rng = rand::thread_rng();

    for h in hist.iter_mut() {
        *h = rng.gen_range(-1.0..=1.0);
    }

    // [f32; 512] is Copy, so each call receives its own copy of the input.
    let r1 = fft1(hist);
    let r2 = fft2(hist);

    let mut sum_diff = 0.0;
    for (a, b) in r1.iter().zip(r2.iter()) {
        sum_diff += (a - b).abs();
    }

    println!("sum_diff: {}", sum_diff);
}

Basically I'd like to have fft2 output the same thing as fft1. As developers of related algorithms, would you please be able to provide some insight as to what kind of transform is being implemented manually here, and how to replace it with something in rustdct, if this is reasonable at all? Thanks in advance!

ejmahler commented 1 year ago

> (std::f32::consts::PI * freq * i / 1024.0).cos()

This looks like pi*j*k/(2*n), where n is 512. There is no discrete cosine transform or discrete sine transform that maps to this core formula. It also doesn't map directly to the FFT, because the FFT uses 2*pi*j*k/n, i.e. the 2 is in the numerator instead of the denominator.

If you plugged your data into an FFT of size 4n (using rustfft or some equivalent library), that would get the 2 onto the correct side of the fraction. My intuition says that wouldn't quite do what you wanted though, because the twiddle factors of the full FFT will end up leaking in. But it's worth a shot: create a complex buffer of size 4n, populate the real components of the first n elements with your data, run the FFT, and pull the output data out of the first n real elements of the output.
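A short derivation of why the zero-padded approach can be expected to work. The notation is an assumption made here, not taken from the thread: x_j are the n real input samples, \tilde{x}_j is the same sequence zero-padded to length 4n, and X_k is the forward DFT of \tilde{x}.

```latex
\begin{aligned}
X_k &= \sum_{j=0}^{4n-1} \tilde{x}_j \, e^{-2\pi i\, jk/(4n)}
     = \sum_{j=0}^{n-1} x_j \, e^{-i\pi\, jk/(2n)}
     && \text{(since } \tilde{x}_j = 0 \text{ for } j \ge n\text{)} \\
\operatorname{Re} X_k &= \sum_{j=0}^{n-1} x_j \cos\!\left(\frac{\pi j k}{2n}\right),
\qquad
\operatorname{Im} X_k = -\sum_{j=0}^{n-1} x_j \sin\!\left(\frac{\pi j k}{2n}\right)
\end{aligned}
```

With n = 512 the denominator 2n is 1024, so the real part of the first n bins has the same form as the manual cosine sum. Because the inputs are real, the complex exponential contributes only the sine sum to the imaginary part, which is discarded when reading the real components.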

I'm very curious where they got this algorithm from. My intuition says that it's custom, but it would be very easy to hand-derive a matrix factorization - easier, in fact, than the DCT.

torokati44 commented 1 year ago

Oh, wow, you were exactly right! :open_mouth:

I quickly checked this code:

pub fn fft2(mut hist: [f32; 512]) -> [f32; 512] {
    use rustfft::num_complex::Complex32;
    use rustfft::FftPlanner;

    let mut planner = FftPlanner::new();
    let fft = planner.plan_fft_forward(2048);

    let mut h2 = [Complex32::from(0.0); 2048];
    for i in 0..512 {
        h2[i] = Complex32::from(hist[i]);
    }

    fft.process(&mut h2);

    for i in 0..512 {
        hist[i] = h2[i].re;
    }
    hist
}

And it matches down to 4 fractional digits! Hats off to you! Thank you for the explanation!
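The agreement can also be checked without any FFT crate. The following is a minimal sketch, not the code used in Ruffle: it compares the direct cosine sum against the real part of a naive O(N^2) DFT over a buffer zero-padded to 4n points. The function names `cosine_sum`, `padded_dft_real`, and `max_abs_diff` are hypothetical helpers introduced for illustration, and the naive DFT merely stands in for rustfft.

```rust
use std::f32::consts::PI;

/// Direct cosine sum from the thread: out[k] = sum_j x[j] * cos(pi * j * k / (2 * n)).
fn cosine_sum(x: &[f32]) -> Vec<f32> {
    let n = x.len();
    (0..n)
        .map(|k| {
            x.iter()
                .enumerate()
                .map(|(j, &s)| s * (PI * (j * k) as f32 / (2 * n) as f32).cos())
                .sum::<f32>()
        })
        .collect()
}

/// Real part of the first n bins of a naive forward DFT of `x`
/// zero-padded to 4 * n points (a stand-in for an FFT crate).
fn padded_dft_real(x: &[f32]) -> Vec<f32> {
    let n = x.len();
    let len = 4 * n;

    // Zero-padded buffer: the n samples followed by 3n zeros.
    let mut padded = vec![0.0f32; len];
    padded[..n].copy_from_slice(x);

    (0..n)
        .map(|k| {
            let mut re = 0.0f32;
            for (j, &s) in padded.iter().enumerate() {
                // Forward DFT kernel e^{-2*pi*i*j*k/len}; only its real part is kept.
                let angle = -2.0 * PI * (j * k) as f32 / len as f32;
                re += s * angle.cos();
            }
            re
        })
        .collect()
}

/// Largest element-wise absolute difference between two slices.
fn max_abs_diff(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(p, q)| (p - q).abs()).fold(0.0, f32::max)
}
```

Reporting the maximum per-element difference rather than the summed difference keeps the figure independent of the buffer length, which makes a statement like "matches to 4 fractional digits" easier to interpret.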

Truth be told, I already sidestepped this problem by making cosf itself "faster": https://github.com/ruffle-rs/ruffle/pull/9657

But armed with this knowledge, I might make it even better sometime in the future!

ejmahler commented 1 year ago

Great! I'm glad it didn't end up needing a manual factorization.

So yeah, if you end up actually switching to RustFFT for this, I recommend using realfft instead, which wraps RustFFT in an algorithm that cuts the work in half for workloads where all the imaginary components are zero.
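The usual reason such a saving is possible is the conjugate symmetry of the DFT of a real signal. The block below states that property in assumed notation (x_j for the real input of length N, X_k for its DFT). It does not describe realfft's internals.

```latex
x_j \in \mathbb{R}
\;\Longrightarrow\;
X_{N-k} = \overline{X_k}, \qquad k = 1, \dots, N-1
```

Bins 0 through N/2 therefore determine the whole spectrum. For N = 2048 that is 1025 bins, and the first 512 bins needed in this thread all fall within that half.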

torokati44 commented 1 year ago

> I recommend using realfft instead

Alright, thank you!

> I'm very curious where they got this algorithm from. My intuition says that it's custom

Well, it very well might be. The entire computeSpectrum method does not look that well thought out, consistent, or tested. :no_mouth: And many think, for good reason, that this is true for the entirety of Flash Player's code.