aeon-toolkit / aeon

A toolkit for machine learning from time series
https://aeon-toolkit.org/
BSD 3-Clause "New" or "Revised" License

[ENH] Make minirocket capable of taking unequal length collections #1746

Open TonyBagnall opened 3 months ago

TonyBagnall commented 3 months ago

Part of #1699. This makes MiniRocket capable of handling unequal length collections and deprecates the MiniRocketMultivariateVariable class. This will be rolled out to the other convolution based transformers, which will also give the associated estimators the capability:unequal_length: True tag.
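To make the two input formats concrete, here is a minimal numpy-only sketch. The names X_equal, X_unequal and series_lengths are illustrative and not aeon code. An equal length collection is a single 3D array of shape (n_cases, n_channels, n_timepoints), while an unequal length np-list is a Python list of 2D arrays whose last dimension varies. Plain Python iteration yields one 2D case at a time from either format:

```python
import numpy as np

rng = np.random.default_rng(0)

# Equal length: one 3D array of shape (n_cases, n_channels, n_timepoints)
X_equal = rng.random((3, 2, 100))

# Unequal length ("np-list"): a list of 2D arrays, each (n_channels, n_timepoints_i)
X_unequal = [rng.random((2, n)) for n in (90, 100, 110)]


def series_lengths(X):
    """Return the length of each case; works for both formats because
    iterating over either yields one 2D (n_channels, n_timepoints) array."""
    return [x.shape[-1] for x in X]


print(series_lengths(X_equal))    # [100, 100, 100]
print(series_lengths(X_unequal))  # [90, 100, 110]
```

A numba function with an explicit signature has to commit to one of these two container types for its argument, which is the difficulty described next.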

The main issue is that you cannot pass both a 3D numpy array (equal length) and a list of numpy arrays (np-list, for unequal length) to the same numba parameter, because its type is fixed by the signature given in the decorator. There are two locations that use numba functions that have to be changed:

  1. _fit_biases: this uses the series length internally, here:
            _X = X[np.random.randint(n_cases)][channels_this_combination]
            A = -_X  # A = alpha * X = -X
            G = _X + _X + _X  # G = gamma * X = 3X
            C_alpha = np.zeros(
                (n_channels_this_combination, n_timepoints), dtype=np.float32
            )

    So my solution is to split it into two functions, _fit_biases_numpy and _fit_biases_list. Currently the second is not compiled with numba, since I don't think you can easily pass a list of numpy arrays to a numba function (I could very well be wrong). It is not computationally intensive.

  2. static _transform: this loops through each instance, transforming it. My solution is to take this loop out of numba and add a new function, _single_case_transform, to which we pass the single case along with the other arguments:
    _X,
    features,
    n_channels,
    n_timepoints,
    n_dilations,
    n_features_per_dilation,
    dilations,
    n_channels_per_combination,
    channel_indices,
    biases,
    n_kernels,
    indices,
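For item 1, the change amounts to reading the series length from the sampled case rather than from the collection. The sketch below is numpy-only and assumes channels_this_combination is an integer index array, as in the excerpt. sample_case_buffers is a hypothetical name, not the actual _fit_biases_list, and it stops before the bias quantiles are computed:

```python
import numpy as np


def sample_case_buffers(X, channels_this_combination, rng):
    """Hypothetical helper (not aeon's _fit_biases_list): mirrors the
    _fit_biases excerpt but reads n_timepoints from the sampled case,
    so it works for a 3D numpy array or a list of 2D arrays."""
    # Pick one random case and keep only the channels in this combination
    _X = X[rng.integers(len(X))][channels_this_combination]
    # Per-case shape: n_timepoints can differ between cases in an np-list
    n_channels_this_combination, n_timepoints = _X.shape
    A = -_X  # A = alpha * X = -X
    G = _X + _X + _X  # G = gamma * X = 3X
    C_alpha = np.zeros((n_channels_this_combination, n_timepoints), dtype=np.float32)
    return A, G, C_alpha


rng = np.random.default_rng(1)
channels = np.array([0, 2])
X_unequal = [rng.random((3, n)).astype(np.float32) for n in (50, 60, 70)]
A, G, C_alpha = sample_case_buffers(X_unequal, channels, rng)
print(C_alpha.shape)  # (2, n_timepoints of whichever case was sampled)
```

Only the source of n_timepoints changes, so the same body could in principle serve both formats if it is called from outside numba.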

An alternative would be to remove the explicit typing from the decorator (not sure if that works) or to have two separate private functions. I'll benchmark the timings, but at the moment it looks like this slows things down too much. I'll post graphs below.
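The loop-out-of-numba approach from item 2 can be sketched as follows. In the real code the per-case function would presumably stay numba-compiled with a 2D signature. Here it is a toy numpy stand-in so the example is self-contained: toy_single_case_transform and transform_collection are hypothetical names, and the feature is a simple proportion of positive values (PPV) of x minus each bias rather than MiniRocket's dilated convolutions. The outer loop is plain Python, so it accepts either format, and the output width does not depend on series length:

```python
import numpy as np


def toy_single_case_transform(x, biases):
    """Toy stand-in for _single_case_transform (hypothetical): PPV of
    x - bias for every channel and bias, so the output length is
    n_channels * len(biases) whatever the series length."""
    # x: (n_channels, n_timepoints_i) -> diffs: (n_channels, n_biases, n_timepoints_i)
    diffs = x[:, None, :] - biases[None, :, None]
    return (diffs > 0).mean(axis=2).ravel()  # (n_channels * n_biases,)


def transform_collection(X, biases):
    """Python-level loop over cases: works for a 3D numpy array or an
    np-list, because each iteration hands one 2D case to the per-case function."""
    return np.stack([toy_single_case_transform(x, biases) for x in X])


rng = np.random.default_rng(0)
biases = np.array([0.25, 0.5, 0.75])
X_equal = rng.random((4, 2, 100))
X_unequal = [rng.random((2, n)) for n in (80, 100, 120, 140)]
F_equal = transform_collection(X_equal, biases)
F_unequal = transform_collection(X_unequal, biases)
print(F_equal.shape, F_unequal.shape)  # (4, 6) (4, 6)
```

The cost of this design is one Python-level call per case, which is the overhead the benchmark is meant to measure.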

aeon-actions-bot[bot] commented 3 months ago

Thank you for contributing to aeon

I have added the following labels to this PR based on the title: [enhancement]. I would have added the following labels to this PR based on the changes made: [transformations], however some package labels are already present.

The Checks tab will show the status of our automated tests. You can click on individual test runs in the tab or "Details" in the panel below to see more information if there is a failure.

If our pre-commit code quality check fails, any trivial fixes will automatically be pushed to your PR unless it is a draft.

Don't hesitate to ask questions on the aeon Slack channel if you have any.


TonyBagnall commented 2 months ago

Timing experiment for reference (main version):


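import time

import numpy as np

# Import paths as of the aeon version this PR targets; they may differ in other releases.
from aeon.testing.data_generation import (
    make_example_3d_numpy,
    make_example_3d_numpy_list,
)
from aeon.transformations.collection.convolution_based import (
    MiniRocket,
    MiniRocketMultivariateVariable,
)


def timing_experiment():
    # Warm-up: trigger numba compilation so it is not included in the timings
    X = np.random.random(size=(10, 1, 100))
    r = MiniRocket()
    r.fit_transform(X)
    r2 = MiniRocketMultivariateVariable()
    r2.fit_transform(X)

    for i in range(1000, 21000, 1000):
        # Univariate: equal length (500) and unequal length (450 to 550)
        X1 = make_example_3d_numpy(n_cases=i, n_channels=1, n_timepoints=500,
                                   return_y=False)
        X2 = make_example_3d_numpy_list(n_cases=i, n_channels=1, min_n_timepoints=450,
                                        max_n_timepoints=550, return_y=False)
        # Multivariate (6 channels): equal length and unequal length
        X3 = make_example_3d_numpy(n_cases=i, n_channels=6, n_timepoints=500,
                                   return_y=False)
        X4 = make_example_3d_numpy_list(n_cases=i, n_channels=6, min_n_timepoints=450,
                                        max_n_timepoints=550, return_y=False)
        # t1: MiniRocket on X1
        start = time.perf_counter()
        r.fit_transform(X1)
        t1 = time.perf_counter() - start
        # t2, t3, t4: MiniRocketMultivariateVariable on X2, X3, X4
        start = time.perf_counter()
        r2.fit_transform(X2)
        t2 = time.perf_counter() - start
        start = time.perf_counter()
        r2.fit_transform(X3)
        t3 = time.perf_counter() - start
        start = time.perf_counter()
        r2.fit_transform(X4)
        t4 = time.perf_counter() - start
        print(f"{i}, {t1}, {t2}, {t3}, {t4}")


if __name__ == "__main__":
    timing_experiment()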
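Single wall-clock measurements like these can be noisy, particularly for the smaller collection sizes. One option, sketched here with a hypothetical time_call helper that is not part of aeon, is to repeat each call and keep the fastest run. Taking the minimum also discards any run that still paid a one-off cost such as numba compilation:

```python
import time


def time_call(fn, *args, repeats=3):
    """Hypothetical helper: call fn(*args) `repeats` times and return the
    fastest wall-clock time in seconds, measured with time.perf_counter."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


elapsed = time_call(sorted, list(range(100_000, 0, -1)))
print(f"{elapsed:.6f} s")
```

In the experiment above this would replace each start/stop pair, e.g. t1 = time_call(r.fit_transform, X1), at the cost of multiplying the total run time by repeats.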