HeloiseS / hoki

Bridging the gap between observation and theory
https://heloises.github.io/hoki/intro.html
BSD 3-Clause "New" or "Revised" License
47 stars 8 forks source link

[DOCS] Using 'numpy.reshape' to create flat arrays for voronoi binning will affect performance #102

Open Knusper opened 3 months ago

Knusper commented 3 months ago

In the Voronoi binning example new views of the 2D image and noise arrays are created by using numpy.reshape. This can (and does) cause overhead in the numpy.ravel operations, as the reshaped arrays are just different "views" of the same data. The vorbin package by M. Cappellari makes use of numpy.ravel and therefore the execution of the algorithm will require significant more ram and will take significant more time. See, e.g., this stackoverflow thread for a description of the problem: https://stackoverflow.com/questions/66296162/numpy-ravel-takes-too-long-after-a-slight-change-to-a-ndarray

https://heloises.github.io/hoki/Voronoi_binning_of_SED_data.html#Calculating-SNR-in-a-line-free-region

After getting bitten by this and banging my head against the wall for a day, I found two solutions. The first one is using a mask. Since we can't pass 'NaNs' to vorbin anyway, we could proceed the following way (im is the image and var_im are the associated variances):

mask = ~np.isnan(im)
Y, X = np.indices(im.shape)
x = X[mask]
y = Y[mask]

# input arays for vorbin
signal_array = im[mask]
noise_array = np.sqrt(var_im[mask])

# summoning vorbin
bin_num, x_node, y_node, x_bar, y_bar, sn, n_pix, scale = \
    voronoi_2d_binning(x, y, signal_array, noise_array, target_snr, plot=plot,
                       pixelsize=1., wvt=False)

One can also alter the shape of an array in-place. The downside for a lot of applications is, that the order in the array is not as well defined as it is the case with reshape . However, in our case this does not matter, as the order will be consistent for each array. This solution, which appears slightly less elegant, reads:

# book-keeping of pixel coordinates
Y, X = np.indices(im.shape)
x = X.copy()
y = Y.copy()
x.shape = (x.shape[0] * x.shape[1])
y.shape = x.shape

# input arays for vorbin
signal_array = im.copy()
noise_array = np.sqrt(var_im)
signal_array.shape = x.shape
noise_array.shape = x.shape

# summoning vorbin
bin_num, x_node, y_node, x_bar, y_bar, sn, n_pix, scale = \
    voronoi_2d_binning(x, y, signal_array, noise_array, target_snr, plot=False,
                       pixelsize=1., wvt=False)