aeon-toolkit / aeon

A toolkit for machine learning from time series
https://aeon-toolkit.org/
BSD 3-Clause "New" or "Revised" License

[ENH] Give ROCKET unequal length capability. #2351

Closed: TonyBagnall closed this 3 days ago

TonyBagnall commented 1 week ago

Part of #1699.

I adapted ROCKET to take unequal-length series; it was actually simple. I have tested this against the current version for correctness and speed. If anything, it's faster.

Basically, I:

  1. removed the type decorators;
  2. normalised with Normalizer;
  3. changed shape[0] to len;
  4. renamed the parameter num_kernels to n_kernels, to make it consistent;
  5. gave both the classifier and the regressor unequal-length capability.

If this is OK, I can roll it out to the other ROCKET variants.

So the big question is what to do if the series passed to transform are shorter than the series seen in fit. In that case, with dilation, the kernels may be longer than the series, causing array out-of-bounds errors. Solutions considered:

  1. Refit the kernels in transform if a series is shorter: abandoned, as it breaks the transform contract of not changing state.
  2. Return a constant (zero or NaN) whenever the dilated kernel length exceeds the series length: the simplest solution, but not ideal imo. Returning zero doesn't make sense, because the features are "proportion of positive values" and "max", so zero actually conveys information. We could return 0.5 for PPV, but what about max? It's getting a little involved. NaN is more transparent and probably a better first solution, but it will break any pipeline with a classifier that cannot handle NaN.
  3. Do a partial calculation for the part the kernel covers. This requires adjusting the function below. I think you just need to reduce the calculations to ignore dilated segments that go beyond the end of the series.
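
To make the failure condition concrete, here is the output-length arithmetic from `_apply_kernel_univariate` evaluated for an illustrative kernel. The numbers (9 weights, dilation 12) are assumptions chosen for the example, as is the padded case, which uses `((length - 1) * dilation) // 2`, the padding I understand the original ROCKET generator applies to roughly half of its kernels.

```python
def output_length(n_timepoints, length, dilation, padding):
    # Same formula as in _apply_kernel_univariate: the number of start
    # positions at which the dilated kernel fits on the padded series.
    return (n_timepoints + 2 * padding) - (length - 1) * dilation


# Illustrative kernel: 9 weights with dilation 12 spans (9 - 1) * 12 = 96 steps.
length, dilation = 9, 12

fit_positions = output_length(100, length, dilation, padding=0)    # 100 - 96 = 4
short_positions = output_length(50, length, dilation, padding=0)   # 50 - 96 = -46

# With padding of ((length - 1) * dilation) // 2 = 48, the short series is
# covered again: 50 + 96 - 96 = 50 positions.
padded_positions = output_length(50, length, dilation, padding=(length - 1) * dilation // 2)
```

In this example only the unpadded kernel runs out of positions on the 50-point series, which is exactly the case the options above have to handle.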

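```python
import numpy as np
from numba import njit


@njit(fastmath=True, cache=True)
def _apply_kernel_univariate(X, weights, length, bias, dilation, padding):
    """Apply a single kernel to a univariate series."""
    n_timepoints = len(X)

    output_length = (n_timepoints + (2 * padding)) - ((length - 1) * dilation)
    end = (n_timepoints + padding) - ((length - 1) * dilation)
    # Option 2: the dilated kernel is longer than the (padded) series, so there
    # are no valid positions; return NaN for both PPV and max. Cast to float32
    # so both return paths have the same type for numba.
    if output_length <= 0:
        return np.float32(np.nan), np.float32(np.nan)
    # Option 3 would instead go here: stop the calculation once the dilated
    # kernel runs past the end of the series.
    _ppv = 0
    _max = -np.inf

    # Slide the kernel over every valid start position, including the padding.
    for i in range(-padding, end):
        _sum = bias

        index = i

        for j in range(length):
            # Taps that land in the padding contribute nothing.
            if index > -1 and index < n_timepoints:
                _sum = _sum + weights[j] * X[index]

            index = index + dilation

        if _sum > _max:
            _max = _sum

        if _sum > 0:
            _ppv += 1

    # PPV: proportion of positions with a positive response; max: largest response.
    return np.float32(_ppv / output_length), np.float32(_max)
```
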
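
One possible reading of option 3, sketched in plain Python/NumPy rather than numba so it is easy to test. The name `apply_kernel_partial` and the choice of start positions in the short-series case are assumptions, not the PR's code: when no full placement exists, every start position on the series (or its left padding) is still evaluated, and dilated taps past the end of the series are simply dropped.

```python
import numpy as np


def apply_kernel_partial(X, weights, length, bias, dilation, padding):
    """Hypothetical 'option 3' variant of _apply_kernel_univariate.

    Assumption: if the dilated kernel is longer than the (padded) series,
    each start position is still evaluated, but only the taps that land
    inside the series contribute; taps past the end are ignored.
    """
    n_timepoints = len(X)
    output_length = (n_timepoints + 2 * padding) - (length - 1) * dilation
    if output_length > 0:
        # Normal case: the same start positions as the original function.
        starts = range(-padding, (n_timepoints + padding) - (length - 1) * dilation)
    else:
        # Partial case (assumption): start at every position on the series or
        # its left padding, truncating the kernel at the end of the series.
        starts = range(-padding, n_timepoints)

    ppv = 0
    max_val = -np.inf
    for i in starts:
        total = bias
        for j in range(length):
            index = i + j * dilation
            if index >= n_timepoints:
                break  # remaining dilated taps fall past the end of the series
            if index >= 0:
                total += weights[j] * X[index]
        max_val = max(max_val, total)
        if total > 0:
            ppv += 1
    return np.float32(ppv / len(starts)), np.float32(max_val)


# A 5-point series with a kernel spanning (3 - 1) * 4 = 8 steps: no full placement.
X = np.array([1.0, -2.0, 3.0, 0.5, 2.0])
weights = np.array([1.0, 1.0, 1.0])
ppv, mx = apply_kernel_partial(X, weights, length=3, bias=0.0, dilation=4, padding=0)
```

Compared with returning NaN, this keeps every feature finite, at the cost that PPV and max from truncated kernels are arguably not comparable with those from full-length kernels.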
aeon-actions-bot[bot] commented 1 week ago

Thank you for contributing to aeon

I have added the following labels to this PR based on the title: [enhancement]. I would have added the following labels to this PR based on the changes made: [classification, regression, transformations], however some package labels are already present.

The Checks tab will show the status of our automated tests. You can click on individual test runs in the tab or "Details" in the panel below to see more information if there is a failure.

If our pre-commit code quality check fails, any trivial fixes will automatically be pushed to your PR unless it is a draft.

Don't hesitate to ask questions on the aeon Slack channel if you have any.

TonyBagnall commented 1 week ago

Hmm, more complex than I thought, actually. Currently I set the series length to the minimum length seen in fit, which is then used to set the dilation range. However, if the shortest series in transform is shorter than the shortest in fit, you get an overflow and an error. I considered refitting the kernels, but was convinced that's a bad idea.

So, if the dilated kernel is longer than the series, what should we do? Currently I'm thinking of returning either 0 or NaN.
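
For context on why the minimum length seen in fit matters: as I understand the original ROCKET kernel generator, dilation is sampled as `2 ** uniform(0, log2((input_length - 1) / (length - 1)))` and truncated to an integer, so the widest dilated kernel just fits a series of `input_length`. A sketch, assuming `input_length` is the minimum length from fit; both function names are hypothetical.

```python
import numpy as np


def max_dilation(input_length, length):
    # Upper bound on the dilation ROCKET-style sampling can produce: the
    # exponential range tops out at (input_length - 1) / (length - 1).
    return int((input_length - 1) / (length - 1))


def sample_dilation(input_length, length, rng):
    # Exponentially distributed dilation, truncated to an integer.
    upper = np.log2((input_length - 1) / (length - 1))
    return int(2 ** rng.uniform(0, upper))


min_fit_length = 100
d_max = max_dilation(min_fit_length, 9)  # int(99 / 8) = 12
span = (9 - 1) * d_max                   # 96 steps, which fits in 100 points

rng = np.random.default_rng(0)
dilation = sample_dilation(min_fit_length, 9, rng)
```

Here an unpadded kernel with dilation 12 needs at least 97 points, so any transform series of 96 points or fewer leaves it with no valid positions, which is the overflow case described above.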

baraline commented 1 week ago

You could simply throw an error during transform if any series is shorter than (length - 1) * dilation? The kernels are ill-defined for series shorter than this, so an error makes sense to me.
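
A sketch of the check suggested here, applied once to the widest dilated kernel rather than kernel by kernel. It uses `<=` rather than `<`: with no padding, a series exactly `(length - 1) * dilation` points long still gives an output length of zero (and a division by zero in the PPV). The helper name and its inputs are hypothetical, not aeon API.

```python
import numpy as np


def check_series_lengths(X, lengths, dilations):
    """Raise if any series is too short for the widest dilated kernel.

    Hypothetical helper. X is a list of 1D series; lengths and dilations are
    1D integer arrays with one entry per kernel.
    """
    max_span = int(np.max((lengths - 1) * dilations))
    for i, series in enumerate(X):
        # With no padding, a series of max_span points or fewer leaves the
        # widest kernel zero or fewer valid positions.
        if len(series) <= max_span:
            raise ValueError(
                f"Series {i} has length {len(series)}, but the widest dilated "
                f"kernel spans {max_span} steps."
            )


lengths = np.array([7, 9, 11])
dilations = np.array([3, 12, 2])  # spans 18, 96 and 20
check_series_lengths([np.zeros(120), np.zeros(97)], lengths, dilations)  # passes
```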

TonyBagnall commented 1 week ago

It would need to be a general policy to do this imo, baked into testing, as it is something that will happen in many algorithms. If we do this, then one could take a defensive posture in fit and lower the minimum length. It will all have an impact, though. Not sure really.

TonyBagnall commented 6 days ago

So I've been thinking about this, and I think @baraline is correct: the first version should find the minimum series length in train and just throw an error in test. To me it's a research question how to handle this situation correctly, and until we have some evidence we should put safety first.
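
A minimal sketch of the policy agreed here: record the minimum series length in fit and raise in transform if a shorter series appears. `MinLengthGuard` is hypothetical and not aeon's API; in practice the check would presumably sit inside the ROCKET transformer's own fit and transform.

```python
import numpy as np


class MinLengthGuard:
    """Hypothetical helper illustrating the 'safety first' policy.

    fit records the shortest series length; transform raises instead of
    guessing when it sees anything shorter.
    """

    def fit(self, X):
        # X: list of 2D arrays (n_channels, n_timepoints_i) of unequal length.
        self.min_length_ = min(series.shape[-1] for series in X)
        return self

    def transform(self, X):
        shortest = min(series.shape[-1] for series in X)
        if shortest < self.min_length_:
            raise ValueError(
                f"Series of length {shortest} is shorter than the minimum "
                f"length seen in fit ({self.min_length_})."
            )
        return X  # the real transform (applying the kernels) would go here


guard = MinLengthGuard().fit([np.zeros((1, 10)), np.zeros((1, 12))])
guard.transform([np.zeros((1, 11))])  # fine: 11 >= 10
```

Raising keeps transform stateless, which was the reason for dropping option 1, and leaves the research question of how to featurise shorter series open.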

review-notebook-app[bot] commented 5 days ago

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.