GalSim-developers / GalSim

The modular galaxy image simulation toolkit. Documentation:
http://galsim-developers.github.io/GalSim/

#935 Speed up draw functions for InterpolatedImage #1126

Closed rmjarvis closed 3 years ago

rmjarvis commented 3 years ago

This has been an open issue for a while, ever since I realized that the fillXImage and fillKImage functions in SBInterpolatedImage were not particularly efficient. They both relied on caching within the XTable and KTable interp methods, which was effective, but it encouraged looping over the images against the natural stride direction, so they missed opportunities for vectorization.
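To make the stride point concrete, here is a small Python sketch. It is not GalSim code, and the function names are invented for illustration. It records the flat memory offsets visited when a row-major image is traversed with x in the inner loop versus y in the inner loop.

```python
def flat_offsets(nrow, ncol, x_inner=True):
    """Flat offsets visited when looping over a row-major nrow x ncol image.

    Hypothetical helper for illustration only; pixel (j, i) is assumed to
    live at offset j * ncol + i, as in a contiguous C-ordered array.
    """
    offsets = []
    if x_inner:
        # Inner loop runs along a row: consecutive accesses are adjacent.
        for j in range(nrow):
            for i in range(ncol):
                offsets.append(j * ncol + i)
    else:
        # Inner loop runs down a column: each access jumps a full row.
        for i in range(ncol):
            for j in range(nrow):
                offsets.append(j * ncol + i)
    return offsets


def step_sizes(offsets):
    """Differences between consecutive offsets."""
    return [b - a for a, b in zip(offsets, offsets[1:])]


along_rows = step_sizes(flat_offsets(3, 4, x_inner=True))
against_rows = step_sizes(flat_offsets(3, 4, x_inner=False))
```

With x in the inner loop every step is 1, which is the access pattern a compiler can vectorize. With y in the inner loop most steps are a full row length (4 here), with a large backward jump at the end of each column.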

The main thing that this PR does is to implement all 4 fill functions directly and be smarter about the caching. The caching that used to be there is now done for whole rows at a time, rather than just a few values, which lets more loops have unit stride, and thus vectorize nicely. Moreover, for the rectilinear versions (i.e. not sheared or rotated), there is an additional caching option of saving the whole row results for a given y-interpolant value, which wasn't really possible with the old code structure.
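As a rough illustration of the row-wise caching idea for the rectilinear case, the following Python sketch compares a naive per-pixel evaluation with a row-cached one. It is a toy model, not the C++ code in this PR: it assumes a simple linear interpolant, and all names (`linear_weights`, `interp_naive`, `interp_row_cached`) are hypothetical. The cache here stores each input row after interpolation in x, which is one possible reading of "saving the whole row results".

```python
import math


def linear_weights(pos, n):
    """(index, weight) pairs for linear interpolation at pos on a grid 0..n-1."""
    i0 = min(max(int(math.floor(pos)), 0), n - 2)
    frac = pos - i0
    return [(i0, 1.0 - frac), (i0 + 1, frac)]


def interp_naive(img, xs, ys):
    """Evaluate every output pixel independently (no caching)."""
    ny, nx = len(img), len(img[0])
    out = []
    for y in ys:
        row = []
        for x in xs:
            val = 0.0
            for m, wy in linear_weights(y, ny):
                for n, wx in linear_weights(x, nx):
                    val += wy * wx * img[m][n]
            row.append(val)
        out.append(row)
    return out


def interp_row_cached(img, xs, ys):
    """Same result, but with whole-row caching (rectilinear grids only)."""
    ny, nx = len(img), len(img[0])
    # The x weights depend only on the output column, so compute them once.
    xw = [linear_weights(x, nx) for x in xs]
    # Input rows already interpolated in x, keyed by input row index.
    row_cache = {}

    def x_interp_row(m):
        if m not in row_cache:
            src = img[m]
            row_cache[m] = [sum(w * src[n] for n, w in ws) for ws in xw]
        return row_cache[m]

    out = []
    for y in ys:
        row = [0.0] * len(xs)
        for m, wy in linear_weights(y, ny):
            cached = x_interp_row(m)
            # Unit-stride loop over a whole output row.
            for i in range(len(xs)):
                row[i] += wy * cached[i]
        out.append(row)
    return out


# Small made-up test image and output grid.
img = [[float(3 * m + n * n) for n in range(5)] for m in range(4)]
xs = [0.25 + 0.5 * i for i in range(8)]
ys = [0.1 + 0.7 * j for j in range(4)]
naive = interp_naive(img, xs, ys)
cached = interp_row_cached(img, xs, ys)
```

The two functions agree to rounding error. The cached version does the x interpolation of each input row at most once, and its innermost loop walks along the output row.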

The rectilinear versions got a lot faster as a result of this refactoring. Unfortunately, most use cases do have shear/rotation (typically from the WCS), so the typical speedup is smaller. But even the non-rectilinear functions are tens of percent faster for common use cases.
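The reason the whole-row cache only applies to the rectilinear case can be seen by looking at where an output row lands in the input image. The sketch below is an illustration under assumptions, not GalSim's API: the Jacobian is represented as a plain tuple `(dudx, dudy, dvdx, dvdy)` and the helper names are made up.

```python
def row_positions(jac, j, ncol):
    """Input-plane (u, v) for each pixel i of output row j.

    jac = (dudx, dudy, dvdx, dvdy) is a hypothetical tuple representation
    of the linear transform; offsets are ignored for simplicity.
    """
    dudx, dudy, dvdx, dvdy = jac
    return [(dudx * i + dudy * j, dvdx * i + dvdy * j) for i in range(ncol)]


def is_rectilinear(jac):
    """True when there is no shear or rotation (off-diagonal terms are zero)."""
    return jac[1] == 0.0 and jac[2] == 0.0


rect = (0.5, 0.0, 0.0, 0.5)       # pure scaling
sheared = (0.5, 0.1, 0.05, 0.5)   # made-up shear/rotation terms

rect_row0 = row_positions(rect, 0, 3)
rect_row2 = row_positions(rect, 2, 3)
shear_row0 = row_positions(sheared, 0, 3)
shear_row2 = row_positions(sheared, 2, 3)
```

In the rectilinear case every pixel in a row shares one v value and the u values are the same for every row, so x weights and whole interpolated rows can be reused. With shear or rotation, v changes along the row and u shifts from row to row, so those particular caches no longer apply.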

The use case in particular that I have in mind is a Piff refactoring that I'm working on. I have an idea about how to implement the composite PSF, which I'm still working through. But I found that a version of my new approach was dominated by the calls to _drawReal. This PR speeds up that step by a factor of 2.

Another timing data point is that the total run time of nosetests on my laptop dropped by about 10% with this PR. This is pretty significant, presumably because we use InterpolatedImage in a lot of contexts. The test suite is of course not representative of real-world usage, but it does suggest that these optimizations help in a lot of different contexts.

Finally, since this was almost the last place we still used XTable and KTable, I went ahead and replaced all the other uses with the direct Image FFTs, which were implemented a while back. The main other place I had to edit was in the HSM code, where I think the new version in terms of image FFTs is significantly more legible. I removed the FFT.h and FFT.cpp files, since they are no longer used anywhere.

I'd like to include this update in version 2.3 if someone has time to review this. Thanks!