Awesome, thanks @lgarrison!! I'll take a closer look, but the changes to the interface and data model seem minor and acceptable, though I don't know what making `ImageGaussian` a jitclass might affect. The factor of ~5 speedup in `get_gaussian_gradients` is very nice indeed. I'm also pleasantly surprised that `compute_gaussian` could be sped up so much.
My experience with SciPy interpolation methods is that with care they can be sped up quite a bit, but I haven't looked at `BivariateSpline` in particular.
I'll add a requirements.txt for the numba version info.
This PR implements a number of exploratory optimizations for `convert_to_gaussians()`, `get_gaussian_gradients()`, and `compute_gaussian()` using Numba.

First, there are a lot of NumPy linear-algebra operations (inverse, trace, determinant, etc.) on 2x2 matrices; these can be optimized pretty effectively by writing out all the terms that need to be computed and compiling that with Numba. Those are implemented as the `fast_*_2x2` functions in `gaussmodel.py`. The fast matrix inverse makes the most impact.
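For reference, here's a minimal sketch of the "write out the terms" idea for a 2x2 inverse; the name and exact signature are illustrative only, and the real versions are the `fast_*_2x2` functions in `gaussmodel.py`:

```python
import numpy as np
from numba import njit

@njit
def fast_inv_2x2(A):
    # Write out the 2x2 cofactor formula instead of calling the generic
    # LAPACK-backed np.linalg.inv.
    a, b = A[0, 0], A[0, 1]
    c, d = A[1, 0], A[1, 1]
    det = a * d - b * c
    inv = np.empty_like(A)
    inv[0, 0] = d / det
    inv[0, 1] = -b / det
    inv[1, 0] = -c / det
    inv[1, 1] = a / det
    return inv

A = np.array([[2.0, 1.0], [1.0, 3.0]])
assert np.allclose(fast_inv_2x2(A), np.linalg.inv(A))
```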
In `convert_to_gaussians()` and `get_gaussian_gradients()`, there's some preamble that heavily uses the object model of forcepho, and then some inner loops. The object-oriented code is difficult to deal with in Numba, so in an effort to at least compile the loops, I've separated those inner parts into their own functions (called `_convert_to_gaussians()` and `_get_gaussian_gradients()`). It does require passing an annoying number of parameters, though. The performance is good: in both routines, the "loop functions" take less than 10% of the total runtime now. The majority of the remaining runtime in both functions comes from SciPy's `SmoothBivariateSpline` interpolation. We could think about trying to re-implement that in Numba, although I think SciPy is already calling a pre-compiled routine internally.
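Roughly, the refactor follows this pattern (a toy sketch under assumed names, not the actual forcepho routines): the object-model preamble stays in plain Python, and the inner loop becomes a standalone function that Numba can compile.

```python
import numpy as np
from numba import njit

@njit
def _inner_loop(amps, fluxes, out):
    # Numba-compiled inner loop over plain arrays only.
    for i in range(amps.shape[0]):
        out[i] = amps[i] * fluxes[i]
    return out

def outer(source):
    # Plain-Python preamble: pull arrays out of the object model, then hand
    # them (plus a preallocated output) to the compiled kernel.
    amps = np.asarray(source["amplitudes"], dtype=np.float64)
    fluxes = np.asarray(source["fluxes"], dtype=np.float64)
    out = np.empty_like(amps)
    return _inner_loop(amps, fluxes, out)

print(outer({"amplitudes": [1.0, 2.0], "fluxes": [3.0, 4.0]}))
```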
To avoid having to pass back lots of return arrays from the "loop functions", I've changed the `ImageGaussian` container class to a `jitclass` that we can manipulate with Numba (it works like a C struct). This doesn't seem to break the non-Numba parts of the code I've tested, but @bd-j will have to say whether this is an allowable change to the data model.

Another benefit of using a `jitclass` is that `compute_gaussian()` can be jitted without too much trouble. I've turned off oversampling for now, since that doesn't seem to be a complete feature and it complicates the jitting.

You'll definitely want to use the latest version of Numba when testing this! Every Numba release brings loads of bug fixes and performance improvements; I'm using version 0.43 from pip.
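For illustration, a stripped-down `jitclass` along these lines (the field names are just examples, not necessarily the exact `ImageGaussian` data model; note that older Numba releases such as 0.43 import `jitclass` directly from `numba` rather than `numba.experimental`):

```python
from numba import njit, float64
from numba.experimental import jitclass  # on Numba 0.43: from numba import jitclass

spec = [
    ("amp", float64),
    ("xcen", float64),
    ("ycen", float64),
]

@jitclass(spec)
class GaussianStruct:
    # Behaves like a C struct: typed fields that compiled code can read and write.
    def __init__(self, amp, xcen, ycen):
        self.amp = amp
        self.xcen = xcen
        self.ycen = ycen

@njit
def shift(g, dx, dy):
    # Mutate the struct-like object inside compiled code instead of passing
    # back many separate return arrays.
    g.xcen += dx
    g.ycen += dy
    return g

g = shift(GaussianStruct(1.0, 10.0, 20.0), 0.5, -0.5)
print(g.amp, g.xcen, g.ycen)
```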
Some performance numbers, using some of the testing harness from @bd-j's `timing.py` script (but with the `%timeit` IPython magic timer instead of `time.time()`):

Old:
New:
There are definitely more potential gains, e.g. from parallelization, but I thought I'd stop here and see if the interface/data model changes are acceptable at this point! It would also be good to know more about typical values of `ngauss`, `npix`, `ngal`, `nstamp`, etc., to know which of the un-compiled loops might need compiling.
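As a rough illustration of the parallelization idea (a toy kernel with made-up names, not an actual forcepho loop), an outer pixel loop could be parallelized with `numba.prange` like this:

```python
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def sum_of_gaussians(x, mus, sigmas, amps):
    out = np.zeros_like(x)
    for p in prange(x.shape[0]):         # parallel over pixels
        acc = 0.0
        for g in range(mus.shape[0]):    # serial over gaussian components
            d = (x[p] - mus[g]) / sigmas[g]
            acc += amps[g] * np.exp(-0.5 * d * d)
        out[p] = acc
    return out

x = np.linspace(-5.0, 5.0, 1000)
print(sum_of_gaussians(x, np.array([0.0, 1.0]),
                       np.array([1.0, 0.5]), np.array([1.0, 2.0])).sum())
```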