JuliaSeismo / SeisNoise.jl

Ambient Noise Cross-Correlation in Julia
MIT License
55 stars 18 forks source link

FFTW: Pre-calculation of plans #102

Closed miili closed 1 year ago

miili commented 1 year ago

Hi Tim,

thanks for your work and this library! When I was working on https://github.com/pyrocko/lightguide, I wanted to run many (>1000) rffts in near real-time. A critical key to performance was the reuse of FFTWs plans (https://www.fftw.org/fftw3_doc/Using-Plans.html). As far as I understand FFTW.jl supports planing (https://juliamath.github.io/FFTW.jl/latest/fft.html#FFTW.plan_r2r).

That increased performance like 10x for me. The matrix size from RawData does not change, maybe you can leverage the same strategy here?

Best wishes,

Marius

tclements commented 1 year ago

Thanks for suggestion! SeisNoise uses the rfft function from AbstractFFTs.jl, which generates a 1D rfft plan for the 2D RawData.x matrix. For a typical size problem, generating the plan is pretty limited compared to doing the rfft. Here is an example using 48 30-minute (a single day of noise) at 100 Hz.

julia> using BenchmarkTools, FFTW, Statistics 

julia> A = rand(Float32, 30 * 60 * 100, 48);

julia> b1 = @benchmark plan_rfft($A, 1)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  32.681 μs …   8.915 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     40.694 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   49.731 μs ± 121.497 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▅█▇▃▃▁
  ▃▅███████▆▅▅▅▄▅▅▅▅▅▅▅▅▅▄▄▄▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▂▂▁▂▁▁▂ ▃
  32.7 μs         Histogram: frequency by time         94.9 μs <

 Memory estimate: 1.91 KiB, allocs estimate: 25.

julia> b2 = @benchmark rfft(A, 1)
BenchmarkTools.Trial: 55 samples with 1 evaluation.
 Range (min … max):  75.384 ms … 100.358 ms  ┊ GC (min … max): 0.00% … 9.76%
 Time  (median):     88.627 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   91.382 ms ±   5.766 ms  ┊ GC (mean ± σ):  3.49% ± 4.64%

                              ▁▁ █▂
  ▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▃▃▃█████▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▃▃▃▅██▆ ▁
  75.4 ms         Histogram: frequency by time          100 ms <

 Memory estimate: 65.92 MiB, allocs estimate: 27.

julia> ratio(median(b2), median(b1))
BenchmarkTools.TrialRatio:
  time:             2177.899272620042
  gctime:           1.0
  memory:           35411.25409836065
  allocs:           1.08

So generating the plan is ~0.05% of the total computation for this problem. In general, pre-allocating plans would benefit when working on smaller problem sizes.

miili commented 1 year ago

Makes sense. Many small ffts will benefit from this. Not the huge chunks of 1800 * 50 Hz.

Another thing: Have you considered using auto formatting on the code base? My pattern matching brain :heart: consistency :smiley:

Closing this.