sbailey closed this 6 years ago
Forgot to mention: this is pulling in an improvement from the languishing memory branch. It has been there quite a while but got bogged down in final bookkeeping cleanup for other unrelated features, so I pulled this out separately.
Interesting.
In [8]: a = zeros(1000)
In [9]: %timeit outer(a, a)
1.15 ms ± 1.83 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [10]: %timeit einsum('i,j->ij', a, a)
898 µs ± 3.32 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [15]: %timeit outer_numba(a, a, empty((len(a), len(a))))
581 µs ± 115 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
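The outer_numba used in that last timing isn't shown in the thread; a minimal sketch of what such a kernel could look like, assuming a numba @njit loop writing into a preallocated output array, is:

```python
import numpy as np
from numba import njit

@njit(cache=True)
def outer_numba(x, y, out):
    # Fill a preallocated (len(x), len(y)) array with the outer product,
    # skipping numpy.outer's allocation and dtype/dimensionality handling.
    for i in range(x.shape[0]):
        for j in range(y.shape[0]):
            out[i, j] = x[i] * y[j]
    return out

a = np.zeros(1000)
out = np.empty((a.size, a.size))
outer_numba(a, a, out)
```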
Merging this dangling PR.
@rainwoodman FYI the specter uses of outer are for much smaller arrays (e.g. length 16), so the numpy.outer overhead of checking dtype and dimensionality is even larger than in the case you tested here. I hadn't thought about using einsum, but that is also a neat trick.
This PR provides a faster version of numpy.outer, which was taking a non-trivial amount of extraction processing time. It either uses numba (~3x faster) or, if numba isn't installed, bypasses the numpy type and dimensionality checks to be ~1.5x faster. The overall impact is modest: 6% faster runtimes for a full frame, but that still corresponds to O(5M) NERSC MPP hours saved over the lifetime of DESI.

A knock-on effect was that the psfbias and psfabsbias functions moved from specter.util to specter.extract to avoid a circular dependency.

I tested that this code is faster and produces bitwise identical results (except for the timestamps in the FITS headers).