JuliaMath / FFTW.jl

Julia bindings to the FFTW library for fast Fourier transforms
https://juliamath.github.io/FFTW.jl/stable
MIT License

PATIENT and EXHAUSTIVE plans very slow compared to MEASURE #239

Open AshtonSBradley opened 2 years ago

AshtonSBradley commented 2 years ago

These timings are surprising to me.

MEASURE appears to select a fast plan, but PATIENT and EXHAUSTIVE produce plans that run several hundred times slower. Judging by CPU usage, those plans do not seem to use threads.

Is this expected, or am I not using this correctly?

Also, with PATIENT and EXHAUSTIVE the in-place plans do not save on allocations: they allocate more than the out-of-place ones.
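
On the allocation question: the roughly 4 MiB reported for the out-of-place MEASURE case matches one 512×512 ComplexF64 output array (512 · 512 · 16 bytes). The sketch below is only an illustration, assuming that output array is the dominant allocation; it compares `P * A` with `mul!` into a preallocated buffer. The helper name `alloc_compare` is hypothetical and not part of FFTW.jl.

```julia
using FFTW, LinearAlgebra

# Hypothetical helper (not part of FFTW.jl): bytes allocated by an
# out-of-place transform with and without a preallocated output buffer.
function alloc_compare(P, A, Y)
    P * A            # warm-up calls so compilation is not counted
    mul!(Y, P, A)
    bytes_new = @allocated P * A          # allocates a fresh output array
    bytes_pre = @allocated mul!(Y, P, A)  # writes the result into Y
    return bytes_new, bytes_pre
end

N = 512
A = randn(ComplexF64, N, N)
Y = similar(A)                            # preallocated output buffer

# Plan on a copy as a precaution: planning with MEASURE or higher can
# overwrite the array handed to the planner.
P = plan_fft(copy(A); flags = FFTW.MEASURE)

bytes_alloc, bytes_mul = alloc_compare(P, A, Y)
println("P * A:         ", bytes_alloc, " bytes")
println("mul!(Y, P, A): ", bytes_mul, " bytes")
```

Any allocations that remain in the `mul!` case come from somewhere other than the output array, which is one way to narrow down where the extra allocations of the PATIENT and EXHAUSTIVE plans originate.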

using FFTW, BenchmarkTools
N = 512
A = randn(ComplexF64,N,N)
B = copy(A)

## measure
FFTW.forget_wisdom()
FFTW.set_num_threads(8)

P = plan_fft(A,flags=FFTW.MEASURE);
@btime $P*$A;
  1.409 ms (138 allocations: 4.01 MiB)

P! = plan_fft!(A,flags=FFTW.MEASURE);
@btime $P!*$B setup=(B .= A);
  1.401 ms (137 allocations: 9.56 KiB)

## patient
FFTW.forget_wisdom()
FFTW.set_num_threads(8)

P = plan_fft(A,flags=FFTW.PATIENT);
@btime $P*$A;
  458.462 ms (113592 allocations: 11.24 MiB)

P! = plan_fft!(A,flags=FFTW.PATIENT);
@btime $P!*$B setup=(B .= A);
  914.090 ms (226997 allocations: 14.46 MiB)

## exhaustive
FFTW.forget_wisdom()
FFTW.set_num_threads(8)

P = plan_fft(A,flags=FFTW.EXHAUSTIVE);
@btime $P*$A;
  500.417 ms (124745 allocations: 11.94 MiB)

P! = plan_fft!(A,flags=FFTW.EXHAUSTIVE);
@btime $P!*$B setup=(B .= A);
  919.095 ms (227010 allocations: 14.46 MiB)
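
To separate how long the planner takes from how fast the resulting plan runs, the comparison above can be sketched as a loop over flags. This is a rough sketch rather than a careful benchmark: `time_plan` and `compare_flags` are hypothetical helpers, not part of FFTW.jl, and the timing uses a plain minimum over repeated runs instead of BenchmarkTools so that every timed call starts from fresh input.

```julia
using FFTW

# Hypothetical helper (not part of FFTW.jl): create an in-place plan with the
# given planner flag and return planning time and best execution time in seconds.
function time_plan(A, flags; nthreads::Integer = 1, reps::Integer = 20)
    FFTW.forget_wisdom()               # discard plans remembered from earlier runs
    FFTW.set_num_threads(nthreads)     # set before the plan is created
    scratch = copy(A)                  # planning may overwrite the array it is given
    t0 = time_ns()
    P = plan_fft!(scratch; flags = flags)
    t_plan = (time_ns() - t0) / 1e9

    B = similar(A)
    t_exec = Inf
    for _ in 1:reps
        B .= A                         # fresh input, refilled outside the timed region
        t0 = time_ns()
        P * B                          # in-place transform of B
        t_exec = min(t_exec, (time_ns() - t0) / 1e9)
    end
    return (plan = t_plan, exec = t_exec)
end

# Hypothetical driver: run time_plan for several flags on an N×N array.
function compare_flags(N::Integer; nthreads::Integer = 1)
    A = randn(ComplexF64, N, N)
    flags = ("ESTIMATE" => FFTW.ESTIMATE,
             "MEASURE"  => FFTW.MEASURE,
             "PATIENT"  => FFTW.PATIENT)
    results = [name => time_plan(A, flag; nthreads = nthreads) for (name, flag) in flags]
    for (name, t) in results
        println(rpad(name, 10), "plan: ", round(t.plan; digits = 3), " s   exec: ",
                round(1e3 * t.exec; digits = 3), " ms")
    end
    return results
end

# Small demo size so this finishes quickly.
compare_flags(128; nthreads = 2)
```

Calling `compare_flags(512; nthreads = 8)` would match the sizes used in this post; planning with PATIENT (or EXHAUSTIVE, if added to the list) may then take noticeably longer.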

This is on

julia> versioninfo()
Julia Version 1.8.0-beta3
Commit 3e092a2521 (2022-03-29 15:42 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.3.0)
  CPU: 10 × Apple M1 Max
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
  Threads: 8 on 8 virtual cores
Environment:
  JULIA_PKG_DEVDIR = /Users/abradley/Dropbox/Julia/Dev
  JULIA_NUM_THREADS = 8

hsgg commented 1 year ago

Yep, I see the same thing, but only on Apple M1. Also worth noting: FFTW.ESTIMATE results in a ~30% faster transform than FFTW.MEASURE. I see this for in-place transforms of 512^3 boxes.