Open stevengj opened 9 years ago
I should also mention that your FFT implementation pays a big price for:
x[1:2:end]
etcN==1
(which means that you see a lot of function-call overhead).exp
rather than precomputing a table (or using a recurrence, although that can sacrifice some accuracy). Note that you are calling exp
twice when you only need to call it once, and only need to multiply it by z_odd[k]
once.Here is a simple "upgrade" that is about 4x faster than my_fft(a)
by working in-place on a pre-allocated output array y
:
function myfft!(n, y, y0, ys, x, x0, xs)
if n == 1
y[y0] = x[x0]
else
n2 = n >> 1
myfft!(n2, y, y0, ys, x, x0, 2*xs)
y1 = y0 + n2*ys
myfft!(n2, y, y1, ys, x, x0 + xs, 2*xs)
θ = -2π/n
for k = 0:n2-1
a, b = y[y0+k*ys], y[y1+k*ys] * cis(θ*k)
y[y0+k*ys] = a + b
y[y1+k*ys] = a - b
end
end
return y
end
myfft(x) = myfft!(length(x), Array(promote_type(Complex128, eltype(x)), length(x)), 1,1, x, 1,1)
Very, very helpful, thanks!
Funny coincidence - for the tutorial, I originally had in mind adding a chapter devoted to "in-place methods", and I started writing it based on improving this very same my_fft
routine. But the whole thing was getting too long, and I left it out. Your in-place suggestion is faster than the one I had - unsurprisingly. I also tried an in-place variant using views, rather than using explicit offsets and strides, and that code is more readable but so far I could not manage to accomplish zero memory allocation in this way.
As for the tutorial, since the my_fft
function is potentially one of the first examples of Julia code the reader ever sees, I don't want to intimidate anyone with bells and whistles. I wanted it to look like a naive Matlab implementation. But I certainly don't want to leave the impression that this is the best you can do in Julia, so I'll adapt the example to make this clear and time permitting I'll add that chapter - the message being that you can heavily optimize functions within Julia, without having to resort to a different language. If I get around to doing that, I hope I can reach out for some more comments, that would be helpful.
I understand—it's certainly sensible to do the simplest possible implementation for pedagogical purposes. In any case, you should time the plan_fft(a)
version when comparing to the built-in routines, however, in addition to fft(a)
and my_fft(a)
.
For now I've added timings using plan_fft
, I had no idea the difference could be so substantial. Although the difference is a lot less on my desktop and laptop than it is on Juliabox: on Juliabox I get timings similar to the ones quoted above, at home I don't. I must be using suboptimal libraries.
In any case, with a larger performance gap, the example looses some of its appeal and it would become more compelling again if I add more optimizations in Julia, so I still have that in mind.
You wrote: However, for such a small transform, you're seeing a lot of overhead in
fft(a)
from setting up the FFT plan, setting up the trig tables, etcetera. To really get the full benefit of FFTW, you need to create a precomputed plan:On my machine,
my_fft(a)
is about 6 times lower thanfft(a)
, but about 33 times slower thanp*a
.