JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.92k stars 5.49k forks source link

64-bit FFTW #1617

Closed ViralBShah closed 11 years ago

ViralBShah commented 12 years ago

FFTW also has a 32-bit interface. As per the documentation, it seems that we have to use the guru interface to get 64-bit.

http://www.fftw.org/doc/64_002dbit-Guru-Interface.html#64_002dbit-Guru-Interface

ViralBShah commented 11 years ago

@stevengj Does FFTW have a 64-bit API? I found the link above in the documentation, which suggests that one has to use the Guru interface for 64-bit.

This is in the context of #1527 and #1532.

StefanKarpinski commented 11 years ago

You don't really need a 64-bit FFTW interface since you can just use the 32-bit interface in a loop.

ViralBShah commented 11 years ago

How? It would need non-trivial work. I'd just rather use the 64-bit interface if such a thing was available.

StefanKarpinski commented 11 years ago

Sorry, I was thinking of DSFMT rather than FFTW. You're right, of course, for FFTW you need to have a 64-bit interface.

ViralBShah commented 11 years ago

Ok. DSFMT is already patched up - that was easy.

stevengj commented 11 years ago

Yes, the 64-bit API in FFTW is the guru64 interface (http://www.fftw.org/fftw3_doc/64_002dbit-Guru-Interface.html). So, you can't use the basic interface if you want 64-bit support, but since you are creating the plans internally the messiness of calling the guru interface for simple things is hidden anyway.

PS. Would be nice to have a simple way to plan with FFTW_PATIENT. Also, there is the issue of data alignment (you need 16-byte aligned arrays if you want to get the benefit of SSE2, AVX, etc., and 32-byte aligned is better).

ViralBShah commented 11 years ago

All arrays in julia are all 16-byte aligned. Is 32-byte alignment required by AVX?

Could you say a bit more about planning with FFTW_PATIENT? Do you mean exposing some of the FFTW plan APIs in julia to provide more flexibility? Nobody had asked for it, until now.

stevengj commented 11 years ago

32-byte alignment is not required by AVX, it is just a bit faster. The good news is that 16-byte alignment is all that is required by FFTW to re-use a plan from one array to another of the same size.

I'm not surprised that no one has asked for it, since users rarely dig into performance issues or do careful benchmark comparisons in my experience. Even forgetting about FFTW_PATIENT, you are sacrificing a lot of performance by creating and destroying the plan for every FFT (which not only re-creates the plan but also recomputes the trigonometric tables etcetera each time).

For example, for a complex double-precision FFT of size 1024, @time fft(x) in Julia reports a running time of about 60us, whereas FFTW's tests/bench -oestimate 1024 reports a running time of 16us. Using FFTW_PATIENT (tests/bench -opatient 1024) results in a running time of 9us. For larger sizes, the overhead of re-creating and destroying the plan each time is less, but the advantage of FFTW_PATIENT is greater. e.g. for size 2^18, Julia takes 29ms while tests/bench -oestimate gives 25ms, but tests/bench -opatient gives a runtime of only 8ms (albeit after 5min to create the plan).

It would be nice to have something like:

  julia> p = plan_fft(x)
  julia> y = p(x);  # computes fft(x) efficiently
  julia> z = p(x.^2); # re-use plan for a different array of same dimensions/strides

That is, plan_fft would just return an optimized fft function (technically, a wrapper around a plan and fftw_execute_dft). (Re-using the plan in this way, at least for arrays of the same strides, is safe since Julia guarantees 16-byte alignment.) plan_fft(x) could default to FFTW_ESTIMATE, but you could also support plan_fft(x, FFTW_PATIENT) etcetera, and even plan_fft(x, FFTW_PATIENT, 10sec) and similar to use the fftw_timelimit mechanism to specify a time limit on the plan creation. And of course plan_fftn, plan_bfft, and similar variants.

Seems like this should be fairly easy to implement (I guess you can use finalizer to make the gc call fftw_destroy_plan when p is garbage-collected).

You'd also want to export the import/export_wisdom functions so that people could save plans.

ViralBShah commented 11 years ago

Fixed in #1856