awelkie / RustFFT

A mixed-radix FFT library written in pure Rust
Apache License 2.0
189 stars 18 forks source link

Reduce rader's algorithm memory usage #34

Closed ejmahler closed 6 years ago

ejmahler commented 6 years ago

Previously, we were using 2 arrays in order to store the indexes to reorder the input and output.

But the output indexes can be trivially computed from the input indexes, by just reversing them. So this deletes one of the arrays, reducing memory usage by N usizes.

Benchmarks show no change in performance:

BEFORE:
test complex_prime_00005           ... bench:          17 ns/iter (+/- 0)
test complex_prime_00017           ... bench:         125 ns/iter (+/- 5)
test complex_prime_00151           ... bench:       3,979 ns/iter (+/- 194)
test complex_prime_00257           ... bench:       3,546 ns/iter (+/- 154)
test complex_prime_01009           ... bench:      28,497 ns/iter (+/- 1,071)
test complex_prime_02017           ... bench:      75,792 ns/iter (+/- 3,353)
test complex_prime_65537           ... bench:   2,113,894 ns/iter (+/- 120,269)
test complex_prime_746497          ... bench:  66,716,805 ns/iter (+/- 1,669,850)
test complex_primepower_160801     ... bench:   9,650,167 ns/iter (+/- 231,222)
test complex_primepower_44521      ... bench:   2,514,551 ns/iter (+/- 101,281)

AFTER:
test complex_prime_00005           ... bench:          17 ns/iter (+/- 3)
test complex_prime_00017           ... bench:         124 ns/iter (+/- 5)
test complex_prime_00151           ... bench:       3,990 ns/iter (+/- 192)
test complex_prime_00257           ... bench:       3,600 ns/iter (+/- 177)
test complex_prime_01009           ... bench:      28,408 ns/iter (+/- 1,668)
test complex_prime_02017           ... bench:      75,849 ns/iter (+/- 3,962)
test complex_prime_65537           ... bench:   1,969,446 ns/iter (+/- 109,075)
test complex_prime_746497          ... bench:  66,869,075 ns/iter (+/- 1,332,993)
test complex_primepower_160801     ... bench:   9,722,685 ns/iter (+/- 522,745)
test complex_primepower_44521      ... bench:   2,516,777 ns/iter (+/- 122,022)