Closed shantirao closed 4 years ago
I would expect that since the new dft2 out parameter mirrors both the name and intended usage of Numpy's out parameter, it should behave the same way:
>>> import numpy as np
>>> a = np.random.uniform((10,10))
>>> b = np.multiply(a, 5, out=a)
>>> b is a
True
The Lentil version doesn't get quite so far (unit test in 28acaf2):
>>> import numpy as np
>>> import lentil as le
>>> f = np.random.uniform((10,10))
>>> F = le.fourier.dft2(f, 1/10, out=f)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-6-486269a1b018> in <module>
----> 1 le.fourier.dft2(f, 1/10, out=f)
~/Dev/lentil/lentil/fourier.py in dft2(f, alpha, npix, shift, unitary, out)
161
162 np.dot(E1,f,out = Mbyn)
--> 163 F = np.dot(Mbyn,E2,out = out)
164
165 # now calculate the answer, without allocating memory
<__array_function__ internals> in dot(*args, **kwargs)
ValueError: output array is not acceptable (must have the right datatype, number of dimensions, and be a C-Array)
The preallocated output has to be a complex64
I reworked the caching approach a bit to make it slightly safer and 3x more performant. It's now a two stage deal where the coordinate vectors X
, Y
, U
, and V
are cached (this catches most of the cases) and then E1
and E2
are cached (this is useful for when the f
input varies but shift
is the same, as is the case with segmented models).
I also included a check to ensure the result can be cast into the out
array if one is provided.
In which we go to extraordinary lengths to avoid allocating memory
The intermediate working matrices are allocated and cached, as repeated calls tend to have the same size of the input and output matrices. Some of these can even be partially precomputed, such as the X, Y, U, V arrays, except for the phase shifts sx and sy). These are small enough that allocating them dynamically doesn't seem to save much time.
An out parameter is added. If this is None, then the dft2() function allocates a new matrix as needed. But if it's a slice into a 3D NumPy matrix (of type complex128) or a 2D NumPy matrix, that destination will be passed to the matrix operations, eliminating allocations of memory for intermediate operations.
This will have the most effect on speed when calling fourier.dft2() over and over with the same size inputs and outputs.