sbailey closed this 6 years ago
Forgot to mention: this is pulling in an improvement from the languishing memory branch. It has been there quite a while but got bogged down in final bookkeeping cleanup for other unrelated features, so I pulled this out separately.
Interesting.
In [8]: a = zeros(1000)
In [9]: %timeit outer(a, a)
1.15 ms ± 1.83 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [10]: %timeit einsum('i,j->ij', a, a)
898 µs ± 3.32 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [15]: %timeit outer_numba(a, a, empty((len(a), len(a))))
581 µs ± 115 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
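The outer_numba used in that last timing isn't shown in the thread; a minimal sketch of what such a kernel could look like, assuming a numba @njit loop writing into a preallocated output array, is:

```python
import numpy as np
from numba import njit

@njit(cache=True)
def outer_numba(x, y, out):
    # Fill a preallocated (len(x), len(y)) array with the outer product,
    # skipping numpy.outer's allocation and dtype/dimensionality handling.
    for i in range(x.shape[0]):
        for j in range(y.shape[0]):
            out[i, j] = x[i] * y[j]
    return out

a = np.zeros(1000)
out = np.empty((a.size, a.size))
outer_numba(a, a, out)
```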
Merging this dangling PR.
@rainwoodman FYI the specter uses of outer are for much smaller arrays (e.g. length 16), so the numpy.outer overhead of checking dtype and dimensionality is even larger than in the case you tested here. I hadn't thought about using einsum, but that is also a neat trick.
This PR provides a faster version of numpy.outer, which was taking a non-trivial amount of extraction processing time. It either uses numba (~3x faster) or, if numba isn't installed, bypasses the numpy type and dimensionality checks to be ~1.5x faster. The overall impact is modest: 6% faster runtimes for a full frame, but that still corresponds to O(5M) NERSC MPP hours saved over the lifetime of DESI.

A knock-on effect was that the psfbias and psfabsbias functions moved from specter.util to specter.extract to avoid a circular dependency.

I tested that this code is faster and produces bitwise identical results (except for the timestamps in the FITS headers).