XENONnT / straxen

Streaming analysis for XENON
BSD 3-Clause "New" or "Revised" License
21 stars 32 forks source link

Improve InterpolateAndExtrapolate performance for array valued maps #1347

Closed l-althueser closed 4 months ago

l-althueser commented 5 months ago

This is a first try to improve the performance of InterpolateAndExtrapolate. The current main issue is that we do a lot of unnecessary array operations to be more flexible. However, for special use cases we can improve by at least a factor 2 in small tests if we do not store the intermediate results and use Einstein summation convention.

This makes the code more complex but we can save a lot of time for any pattern map operation. I kept the behavior identical to previous implementation for all cases except the array valued maps used for S1 and S2 patterns.

I tested locally with several use cases.

coveralls commented 5 months ago

Coverage Status

coverage: 91.341% (+0.02%) from 91.317% when pulling bc59ce2573291e712cb7987e517246848c77f8f7 on itp_map_speedup into d6d96da0fb581ecff2c2be3a248785d295883d27 on master.

l-althueser commented 5 months ago

Ready for review!

yuema137 commented 4 months ago

Hi @l-althueser , thanks for this PR! I tested this function and for single-dimension values the result looks consistenct between the old and new method: image image

However for multiple dimension values, there is a problem of the dimension of weights. Minimal example to reproduce it:

def generate_multidimensional_dataset(n_points, value_functions):
    """
    Generate a 3D dataset where each point has multiple values (e.g., temperature, pressure, humidity).

    :param n_points: Number of points along each axis.
    :param value_functions: A list of functions to generate the values for each dimension.
    :return: points, multidimensional values
    """
    axes = np.linspace(-5, 5, n_points)
    xx, yy, zz = np.meshgrid(axes, axes, axes, indexing='ij')
    points = np.vstack([xx.ravel(), yy.ravel(), zz.ravel()]).T

    values = np.stack([func(xx, yy, zz) for func in value_functions], axis=-1).reshape(-1, len(value_functions))

    return points, values

# Example value functions for each dimension
def d1_func(x, y, z):
    return x**2 + y**2 + z**2

def d2_func(x, y, z):
    return np.sin(x) + np.cos(y) + np.tanh(z)

def d3_func(x, y, z):
    return np.exp(-(x**2 + y**2 + z**2))

# Generate the dataset
n_points = 10  # Adjust based on your computational capacity
value_functions = [d1_func, d2_func, d3_func]
points, multidimensional_values = generate_multidimensional_dataset(n_points, value_functions)
new_points = points
interpolator = InterpolateAndExtrapolate(points, multidimensional_values)
interpolated_df = interpolator(new_points)

In this example I got an error:

[540](https://vscode-remote+ssh-002dremote-002bmidway2-002dlogin2-002ercc-002euchicago-002eedu.vscode-resource.vscode-cdn.net/home/yuem/package_testers/~/.local/lib/python3.9/site-packages/numpy/lib/function_base.py:540) if wgt.shape[0] != a.shape[axis]:
    [541](https://vscode-remote+ssh-002dremote-002bmidway2-002dlogin2-002ercc-002euchicago-002eedu.vscode-resource.vscode-cdn.net/home/yuem/package_testers/~/.local/lib/python3.9/site-packages/numpy/lib/function_base.py:541)     raise ValueError(
    [542](https://vscode-remote+ssh-002dremote-002bmidway2-002dlogin2-002ercc-002euchicago-002eedu.vscode-resource.vscode-cdn.net/home/yuem/package_testers/~/.local/lib/python3.9/site-packages/numpy/lib/function_base.py:542)         "Length of weights not compatible with specified axis.")

TypeError: 1D weights expected when shapes of a and weights differ.

Also I'm not sure what does array_valued account for. Could you kindly add the docstring for it? Thanks a lot!

dachengx commented 4 months ago

Also I'm not sure what does array_valued account for. Could you kindly add the docstring for it? Thanks a lot!

Hey @yuema137 . array_valued means that the result for a position is not a single value but an array. So in your example, you should set array_valued=True.