Closed bmorris3 closed 7 years ago
Here are some more updates on profiling.
pyfftw
is only faster than scipy
given certain caveats, which are related to how the input array is stored in memory. I was using one of the pyfftw
APIs that didn't make these caveats obvious, so the advertised speedups over numpy and scipy weren't happening.
In the plot below, I show the runtime for reconstructing various numbers of holograms using different functions within pyfftw
, and scipy
's FFT for comparison.
The default in shampoo thus far has been FFTW.interface
, which you can see is the slowest of the options I tested, which is why I noted above that we could get speedups by switching immediately to scipy. However, you can go faster by using a different module within FFTW
(builders). This module also has an option that allows you to use multithreading to speed up your reconstructions.
Question for @jkentwallace and @LaurentRDC: is there any reason not to use one of these multithreading options by default? If not, I'll work on migrating shampoo's fft
methods to use the fast multithreaded pyfftw
module.
Since FFTW is written in C, multithreading should yield a performance boost (unlike Python). I don't see why not :)
In the GUI, I'm reconstructing on a separate core right now, and it would be easy to extend that to many cores. Since each core has more than one thread, it would be a good idea to implement multithreading now.
Gents,
These are great suggestions. I would say full steam ahead.
Also, I've mentioned this in passing before, but I think there is a discreet Fourier transform method that can also improve speed.
I'll futz with it, and try to provide a good example.
Just wanna say it feels like we're getting there guys - thanks to your hard work.
-Kent
On Thu, Nov 24, 2016 at 1:52 PM Laurent P. René de Cotret < notifications@github.com> wrote:
Since FFTW is written in C, multithreading should yield a performance boost (unlike Python). I don't see why not :)
In the GUI, I'm reconstructing on a separate core right now, and it would be easy to extend that to many cores. Since each core has more than one thread, it would be a good idea to implement multithreading now.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/bmorris3/shampoo/issues/27#issuecomment-262848678, or mute the thread https://github.com/notifications/unsubscribe-auth/AWCJ2e2I9oLCHr2Gu1h5ZRqudz5wmWqoks5rBgc0gaJpZM4K76jw .
I'm thinking I'll make pyfftw
a mandatory dependency at this point, since we've quantified its performance now, and its the best option. I was concerned at first that the fftw
library installation could be challenging on some platforms, but it seems like no one has had a problem with it yet, so I'm ready to lock in. Any objections?
Anyone installing Anaconda is equipped with pyFFTW, so there's at least one easy way to get it. Through conda there's no need for a C compiler, so I think making it a dependency is reasonable.
Laurent and Brett, I've spent some time yesterday refreshing my memory on some discrete Fourier Transform calculations, and I've got a few results to show. This method allows you to reconstruct only that part of the fourier plane you're interested in. It also has benefits for reducing the effect of aliasing - which we really haven't talked about. The current bandaid we use is apodization, but this results in reduced resolution. I'll provide an example of this for Monday's meeting - and we can discuss...
@jkentwallace @LaurentRDC – I've begun doing some profiling, and I'm looking to see where the bottlenecks are. In this test I'm reconstructing the USAF test target example file at one propagation distance:
49% of the runtime is spent within the
pyfftw
module's Fourier transform, which confirms that the secondary optimizations like doing arithmetic on masked arrays are probably secondary in importance to figuring out how to do FFTs more efficiently. Reading up onpyfftw
, it looks like there's a very specific way to uses it that allows for the speedups that I expected.In simple comparison with
scipy
's FFT, my current implementation withpyfftw
is almost an order of magnitude slower for reconstructing a single hologram – I'm going to need to spend some time understanding this issue. If you'd like to see how much quicker shampoo could be going simply by switching to scipy's FFT temporarily, comment out this block of imports and replace it with only thescipy
import line.The next slowest step is the phase unwrapping, which is being done in C wrapped in Python with this algorithm: reference, source code.