Add a "good-thomas double butterfly" algorithm impl

ejmahler commented 6 years ago

When I added the good-thomas implementation a while back, I tried it with very large sizes, and saw that it didn't really stand up to mixed-radix.

Something I didn't try is very small sizes. It turns out that if both child FFTs are small enough to be butterflies, Dgood-Thomas pretty greatly outperforms mixed-radix. So the planner now does Good-Thomas double butterfly instead of Mixed-Radix Double Butterfly whenever possible.

I also tried using the main Good-Thomas Algorithm instead of Mixed Radix when sizes are less than a few thousand, but I got mixed results. Some benchmarks were improved, while others were worse. If we did something like FFTW where we measured performance as a part of the planning process, it would be worth it to test mixed radix vs good-thomas performance for given sizes, but it seems to be too unreliable to do it all the time.

I also tried another stab at computing the reordering indexes on the fly, and I got a version that's faster than the original, but it's still slower than precomputing them, both at small and large sizes.

ejmahler commented 6 years ago

Comparing benchmarks of before and after:

PLANNED -- BEFORE
test complex_composite_20736       ... bench:     470,106 ns/iter (+/- 24,886)
test complex_composite_24028       ... bench:   2,118,454 ns/iter (+/- 96,817)
test complex_composite_24576       ... bench:     361,290 ns/iter (+/- 30,788)
test complex_composite_30270       ... bench:   1,164,039 ns/iter (+/- 151,710)
test complex_composite_32192       ... bench:   2,957,133 ns/iter (+/- 287,858)
test complex_prime_0019            ... bench:         262 ns/iter (+/- 58)
test complex_prime_0151            ... bench:       4,154 ns/iter (+/- 302)
test complex_prime_1009            ... bench:      29,750 ns/iter (+/- 2,597)
test complex_prime_2017            ... bench:      75,501 ns/iter (+/- 6,779)
test complex_primepower_160801     ... bench:  10,726,836 ns/iter (+/- 821,831)
test complex_primepower_44521      ... bench:   2,604,360 ns/iter (+/- 89,333)

PLANNED -- AFTER
test complex_composite_20736       ... bench:     454,925 ns/iter (+/- 131,175)
test complex_composite_24028       ... bench:   1,946,914 ns/iter (+/- 115,707)
test complex_composite_24576       ... bench:     353,485 ns/iter (+/- 16,025)
test complex_composite_30270       ... bench:   1,079,107 ns/iter (+/- 52,330)
test complex_composite_32192       ... bench:   2,719,065 ns/iter (+/- 114,287)
test complex_prime_0019            ... bench:         254 ns/iter (+/- 11)
test complex_prime_0151            ... bench:       3,924 ns/iter (+/- 186)
test complex_prime_1009            ... bench:      27,943 ns/iter (+/- 1,231)
test complex_prime_2017            ... bench:      74,977 ns/iter (+/- 3,572)
test complex_primepower_160801     ... bench:   9,487,384 ns/iter (+/- 238,325)
test complex_primepower_44521      ... bench:   2,468,669 ns/iter (+/- 106,513)

ejmahler commented 6 years ago

And comparing good thomas vs mixed radix at various sizes, showign that good-thomas is much better than mixed radix at small sizes

GOOD-THOMAS:
test good_thomas_0002_3            ... bench:          51 ns/iter (+/- 14)
test good_thomas_0003_4            ... bench:          72 ns/iter (+/- 6)
test good_thomas_0004_5            ... bench:         148 ns/iter (+/- 7)
test good_thomas_0007_32           ... bench:       1,666 ns/iter (+/- 708)
test good_thomas_0032_27           ... bench:      13,751 ns/iter (+/- 743)
test good_thomas_0256_243          ... bench:   1,915,772 ns/iter (+/- 121,952)
test good_thomas_2048_2187         ... bench: 264,304,653 ns/iter (+/- 2,765,125)
test good_thomas_2048_3            ... bench:      71,788 ns/iter (+/- 5,187)
test good_thomas_butterfly_0002_3  ... bench:          32 ns/iter (+/- 7)
test good_thomas_butterfly_0003_4  ... bench:          50 ns/iter (+/- 3)
test good_thomas_butterfly_0004_5  ... bench:         109 ns/iter (+/- 45)
test good_thomas_butterfly_0007_32 ... bench:       1,544 ns/iter (+/- 431)

MIXED RADIX:
test mixed_radix_0002_3            ... bench:          81 ns/iter (+/- 25)
test mixed_radix_0003_4            ... bench:         103 ns/iter (+/- 7)
test mixed_radix_0004_5            ... bench:         180 ns/iter (+/- 51)
test mixed_radix_0007_32           ... bench:       1,861 ns/iter (+/- 115)
test mixed_radix_0032_27           ... bench:      14,030 ns/iter (+/- 768)
test mixed_radix_0256_243          ... bench:   1,735,521 ns/iter (+/- 106,055)
test mixed_radix_2048_2187         ... bench: 193,542,181 ns/iter (+/- 3,073,271)
test mixed_radix_2048_3            ... bench:      75,990 ns/iter (+/- 5,625)
test mixed_radix_butterfly_0002_3  ... bench:          43 ns/iter (+/- 13)
test mixed_radix_butterfly_0003_4  ... bench:          64 ns/iter (+/- 16)
test mixed_radix_butterfly_0004_5  ... bench:         119 ns/iter (+/- 8)
test mixed_radix_butterfly_0007_32 ... bench:       1,684 ns/iter (+/- 126)

awelkie commented 6 years ago

Nice work. Thanks!

As an aside, you may have noticed I haven't been putting much time into this project lately. At this point, I think you have a more complete understanding of the code and you seem to have a good vision for pushing this project forward. Would you be willing to assume ownership of this project? You're already a collaborator, so what this means is I would just defer to you for PR decisions and you would manage the version hosted on crates.io.

ejmahler commented 6 years ago

Works for me!

On Tue, Jul 24, 2018 at 9:51 AM Allen Welkie notifications@github.com wrote:

Nice work. Thanks!

As an aside, you may have noticed I haven't been putting much time into this project lately. At this point, I think you have a more complete understanding of the code and you seem to have a good vision for pushing this project forward. Would you be willing to assume ownership of this project? You're already a collaborator, so what this means is I would just defer to you for PR decisions and you would manage the version hosted on crates.io.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/awelkie/RustFFT/pull/33#issuecomment-407475712, or mute the thread https://github.com/notifications/unsubscribe-auth/ABGmeg5hr7v9q8wxStTdkFHd2Sibusndks5uJ1CJgaJpZM4VaRwL .

awelkie commented 6 years ago

Great! Let me know if you need me to do anything administratively.

ejmahler commented 6 years ago

Could you add me ad an owner on the crate? Username is ejmahler

ejmahler commented 6 years ago

@awelkie Have you had a chance to look at this?

awelkie commented 6 years ago

At the PR or adding you as an owner? I sent the crates.io invitation 5 days ago when you asked, let me know if you didn't get it. I skimmed the PR and it looks fine, but as I said I'll defer to you. You should have permissions to merge pull requests, right?

ejmahler commented 6 years ago

The crates.io account. I didn't notice that you had sent an invite. I just accepted it! thanks

awelkie / RustFFT

Add a "good-thomas double butterfly" algorithm impl #33