The motivation for this update was to speed up the SED.calculateFlux method in the typical case that both the SED and the Bandpass are LookupTables. It had been doing a bunch of calculations to get the SED and Bandpass onto the same wavelength grid, making new arrays for each, and then calling np.trapz.
I realized that we could avoid the temporary numpy arrays, as well as some other O(N) calculations along the way, by treating this as a kind of merge sort coupled with trapz, which we could do at the same time in C++. The idea was to keep track of where we were in the two lookup tables and accumulate the integration steps using whichever table had the next abscissa value.
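In case it helps to see the shape of it, here's a minimal pure-Python sketch of that merge-style trapz accumulation. The function name and the use of np.interp for the table lookups are just illustrative; the real implementation is in C++ and avoids even these per-point Python calls.

```python
import numpy as np

def integrate_product_merge(x1, f1, x2, f2):
    """Trapezoidal integral of the product of two piecewise-linear lookup
    tables, walking both abscissa lists in a single merge-style pass.
    (Illustrative sketch only; the actual GalSim version is in C++.)
    """
    # Only the overlap of the two tables contributes.
    lo = max(x1[0], x2[0])
    hi = min(x1[-1], x2[-1])
    if lo >= hi:
        return 0.0

    i = j = 0
    x_prev = lo
    y_prev = np.interp(lo, x1, f1) * np.interp(lo, x2, f2)
    total = 0.0
    while True:
        # Skip abscissae we've already passed.
        while i < len(x1) and x1[i] <= x_prev:
            i += 1
        while j < len(x2) and x2[j] <= x_prev:
            j += 1
        # Step to whichever table has the next abscissa value.
        if i < len(x1) and (j >= len(x2) or x1[i] < x2[j]):
            x_next = x1[i]
        elif j < len(x2):
            x_next = x2[j]
        else:
            x_next = hi
        x_next = min(x_next, hi)
        y_next = np.interp(x_next, x1, f1) * np.interp(x_next, x2, f2)
        # One trapezoid per sub-interval between consecutive abscissae.
        total += 0.5 * (y_prev + y_next) * (x_next - x_prev)
        if x_next >= hi:
            return total
        x_prev, y_prev = x_next, y_next
```

This gives the same answer as np.trapz evaluated on the union of the two wavelength grids, but without building the merged arrays up front.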
This works well and is indeed faster. However, along the way I decided to make the additional improvement to do the quadratic term for the product of two linear segments, rather than treat the product as linear. This isn't much more work, and in C++ the overhead of doing this is pretty small, so it seemed worth doing to make the integral a bit more accurate.
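Concretely, on a sub-interval of width h where one function runs linearly from a0 to a1 and the other from b0 to b1, the product is quadratic, and its exact integral works out to a simple weighted sum of the endpoint products. A standalone sketch of that formula (not the actual C++ code):

```python
def segment_product_integral(h, a0, a1, b0, b1):
    """Exact integral over a sub-interval of width h of the product of two
    linear segments with endpoint values (a0, a1) and (b0, b1).
    Expanding the product and integrating the quadratic term gives this
    Simpson-like weighted sum."""
    return h * (2*a0*b0 + a0*b1 + a1*b0 + 2*a1*b1) / 6.0
```

Treating the product as linear instead gives the trapezoid 0.5\*h\*(a0\*b0 + a1\*b1); e.g. for a0=b0=0, a1=b1=1, h=1 the exact answer is 1/3, while trapz would give 1/2.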
However, this ended up opening a big can of worms for the unit tests. Lots of tests in test_sed and test_chromatic were pretty much hard-coded to the trapz approximation of the integrals, so making the integrals more accurate in SED.calculateFlux broke a bunch of them. To fix them, I had to make the unit-test integrals more accurate by using finer wavelength arrays, and I also used the integrate_product function in some other places so that things like the DCR and seeing-moments calculations had the improved accuracy in both numerator and denominator.
I also added a quadRule integration method implementing this for the chromatic drawImage integration, so that those images' fluxes match up with what the SED thinks the net flux should be.
All in all, I think it's probably an improvement. The real-life accuracy isn't actually that much better with quad than trapz. They are both O(dx^3) algorithms per segment; the prefactor on quad is 2x smaller, I think. So it's really just a more exact match to the approximation that the SED and bandpass are both linear between the tabulated wavelengths; in real terms it's not all that much more accurate.
Anyway, after all this, the calculateFlux time in the imsim run I'm using for testing went from 13% of the total down to 5%, so it made a significant difference in efficiency, even if the accuracy gains are probably minimal. (That wasn't really the point, after all.)
Oh, and there was one surprising side effect of this effort. I started out by adding a LookupTable.integrate method to do just the straight integral, not the product. For linear LookupTables, this is equivalent to np.trapz, but it also works for our other interpolation options. The surprise was that my implementation is about 30-40% faster than np.trapz for that simple linear case. I didn't even try to do anything fancy; it's just the straightforward implementation of the trapezoidal rule. I'm not sure what overhead numpy has, but it's probably related to the extra options my code doesn't support (like axis). Anyway, I made galsim.trapz as a drop-in replacement for np.trapz: if you don't need those extra options, it may gain you a little speed. It probably doesn't matter much for most of our use cases, but I switched over the places where we use trapz, FWIW.
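For reference, the straightforward rule looks something like this in numpy terms. This is just a sketch of what the linear case boils down to; the speed gain mentioned above comes from the C++ implementation, not from a Python version like this one.

```python
import numpy as np

def trapz(y, x):
    """Plain trapezoidal rule over sample points x with values y.
    No axis/dx options, unlike np.trapz: just sum one trapezoid
    per interval between consecutive sample points."""
    return float(np.sum(0.5 * (y[:-1] + y[1:]) * np.diff(x)))
```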