bmorris3 / shampoo

An open source digital holographic microscopy toolkit in Python.
http://shampoo.readthedocs.org/en/latest/
20 stars 11 forks source link

Inefficient pyfftw implementation: order of magnitude speedups possible! #27

Closed bmorris3 closed 7 years ago

bmorris3 commented 7 years ago

@jkentwallace @LaurentRDC – I've begun doing some profiling, and I'm looking to see where the bottlenecks are. In this test I'm reconstructing the USAF test target example file at one propagation distance:

screen shot 2016-11-24 at 10 35 35 am

49% of the runtime is spent within the pyfftw module's Fourier transform, which confirms that the secondary optimizations like doing arithmetic on masked arrays are probably secondary in importance to figuring out how to do FFTs more efficiently. Reading up on pyfftw, it looks like there's a very specific way to uses it that allows for the speedups that I expected.

In simple comparison with scipy's FFT, my current implementation with pyfftw is almost an order of magnitude slower for reconstructing a single hologram – I'm going to need to spend some time understanding this issue. If you'd like to see how much quicker shampoo could be going simply by switching to scipy's FFT temporarily, comment out this block of imports and replace it with only the scipy import line.

The next slowest step is the phase unwrapping, which is being done in C wrapped in Python with this algorithm: reference, source code.

bmorris3 commented 7 years ago

Here are some more updates on profiling.

pyfftw is only faster than scipy given certain caveats, which are related to how the input array is stored in memory. I was using one of the pyfftw APIs that didn't make these caveats obvious, so the advertised speedups over numpy and scipy weren't happening.

In the plot below, I show the runtime for reconstructing various numbers of holograms using different functions within pyfftw, and scipy's FFT for comparison.

profiling

The default in shampoo thus far has been FFTW.interface, which you can see is the slowest of the options I tested, which is why I noted above that we could get speedups by switching immediately to scipy. However, you can go faster by using a different module within FFTW (builders). This module also has an option that allows you to use multithreading to speed up your reconstructions.

Question for @jkentwallace and @LaurentRDC: is there any reason not to use one of these multithreading options by default? If not, I'll work on migrating shampoo's fft methods to use the fast multithreaded pyfftw module.

LaurentRDC commented 7 years ago

Since FFTW is written in C, multithreading should yield a performance boost (unlike Python). I don't see why not :)

In the GUI, I'm reconstructing on a separate core right now, and it would be easy to extend that to many cores. Since each core has more than one thread, it would be a good idea to implement multithreading now.

jkentwallace commented 7 years ago

Gents,

These are great suggestions. I would say full steam ahead.

Also, I've mentioned this in passing before, but I think there is a discreet Fourier transform method that can also improve speed.

I'll futz with it, and try to provide a good example.

Just wanna say it feels like we're getting there guys - thanks to your hard work.

-Kent

On Thu, Nov 24, 2016 at 1:52 PM Laurent P. René de Cotret < notifications@github.com> wrote:

Since FFTW is written in C, multithreading should yield a performance boost (unlike Python). I don't see why not :)

In the GUI, I'm reconstructing on a separate core right now, and it would be easy to extend that to many cores. Since each core has more than one thread, it would be a good idea to implement multithreading now.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/bmorris3/shampoo/issues/27#issuecomment-262848678, or mute the thread https://github.com/notifications/unsubscribe-auth/AWCJ2e2I9oLCHr2Gu1h5ZRqudz5wmWqoks5rBgc0gaJpZM4K76jw .

bmorris3 commented 7 years ago

I'm thinking I'll make pyfftw a mandatory dependency at this point, since we've quantified its performance now, and its the best option. I was concerned at first that the fftw library installation could be challenging on some platforms, but it seems like no one has had a problem with it yet, so I'm ready to lock in. Any objections?

LaurentRDC commented 7 years ago

Anyone installing Anaconda is equipped with pyFFTW, so there's at least one easy way to get it. Through conda there's no need for a C compiler, so I think making it a dependency is reasonable.

jkentwallace commented 7 years ago

Laurent and Brett, I've spent some time yesterday refreshing my memory on some discrete Fourier Transform calculations, and I've got a few results to show. This method allows you to reconstruct only that part of the fourier plane you're interested in. It also has benefits for reducing the effect of aliasing - which we really haven't talked about. The current bandaid we use is apodization, but this results in reduced resolution. I'll provide an example of this for Monday's meeting - and we can discuss...