aeon-toolkit / aeon

A toolkit for machine learning from time series
https://aeon-toolkit.org/
BSD 3-Clause "New" or "Revised" License

[ENH] Tidy up the rocket transformers #1699

Open TonyBagnall opened 4 months ago

TonyBagnall commented 4 months ago

Describe the feature or idea you want to propose

It's time to tidy up the convolutional transformers. I will collate all issues here and make tasks for smaller PRs. Replaces #208.

To Do

Done

TonyBagnall commented 4 months ago

Timing results are as expected and comparable to other implementations. The plot shows the time to transform a univariate collection of 100 series for different series lengths: length is on the x axis, seconds on the y axis.

image

TonyBagnall commented 4 months ago

After #1781, the next issue is to allow variable-length series. PR #1746 does it by taking numba inside the loop, but that slows it down unacceptably. An alternative, which is already used in Rocket, is not to give types in the njit arguments. Surprisingly, Rocket does not declare them:


@njit(fastmath=True, cache=True)
def _apply_kernel_univariate(X, weights, length, bias, dilation, padding):
    n_timepoints = len(X)

    output_length = (n_timepoints + (2 * padding)) - ((length - 1) * dilation)

whereas MiniRocket does:


@njit(
    "float32[:,:](float32[:,:],Tuple((int32[:],int32[:],int32[:],int32[:],float32["
    ":])), int32[:,:])",
    fastmath=True,
    parallel=True,
    cache=True,
)
def _static_transform_uni(X, parameters, indices):

Without them, it's much easier to use multivariate, but crucially we need to assess any performance hit. So, some timing experiments:

  1. Run MiniRocket both with and without the annotation and time it, passing 3D numpy.
  2. Run Rocket both with and without the annotation and time it (possible speed up for main).

Based on these results, either remove the type annotations and adapt to lists of numpy arrays, or add them and use split functions.
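The trade-off between the two njit styles can be tried in isolation. The following is a minimal sketch, not aeon code: `_dilated_conv` is a hypothetical, simplified kernel (no padding or bias) that reuses the output-length formula from the Rocket snippet above, compiled once without a signature and once with an explicit float32 signature. The fallback `njit` is only there so the sketch still runs where numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # fallback: run as plain Python if numba is missing

    def njit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda func: func


def _dilated_conv(x, weights, dilation):
    """Dilated convolution of one 1D series, without padding or bias."""
    length = len(weights)
    # Same formula as in the Rocket snippet, with padding = 0
    output_length = len(x) - (length - 1) * dilation
    out = np.zeros(output_length, dtype=np.float32)
    for i in range(output_length):
        total = np.float32(0.0)
        for j in range(length):
            total += weights[j] * x[i + j * dilation]
        out[i] = total
    return out


# Lazy compilation: numba infers the types on the first call
conv_lazy = njit(fastmath=True)(_dilated_conv)

# Eager compilation: types are fixed by the signature string
conv_typed = njit(
    "float32[:](float32[:], float32[:], int64)", fastmath=True
)(_dilated_conv)

# A variable-length collection stored as a list of 1D float32 arrays
rng = np.random.default_rng(0)
series = [rng.random(n).astype(np.float32) for n in (50, 80, 120)]
weights = np.array([-1.0, 2.0, -1.0], dtype=np.float32)

out_lazy = [conv_lazy(x, weights, 2) for x in series]
out_typed = [conv_typed(x, weights, 2) for x in series]
print([len(o) for o in out_lazy])
```

Both versions accept each series in the list one at a time. The difference that matters for this issue is that the typed version only accepts the declared array types and dimensions, whereas the lazy version compiles a new specialisation when it first sees a different input type.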

TonyBagnall commented 4 months ago

Code to test 1 and 2, including direct code without

import time

import numpy as np

# Import paths may differ between aeon versions
from aeon.testing.data_generation import make_example_3d_numpy
from aeon.transformations.collection.convolution_based import MiniRocket
from aeon.transformations.collection.convolution_based._minirocket import (
    _static_fit,
    _static_transform_multi,
    _static_transform_uni,
)

# BadPlaceMiniRocket and BadPlaceMiniRocketMultivariate are not imported here:
# they are assumed to be local experimental copies of the transformer.


def timing_experiment_n_cases_main():
    # Build numba functions: run everything once so compilation is not timed
    X = np.random.random(size=(10, 1, 100)).astype(np.float32)
    r = MiniRocket(random_state=0)
    r.fit_transform(X)
    r2 = MiniRocket(random_state=0)
    r2.fit_transform(X)
    r3 = BadPlaceMiniRocket(random_state=0)
    r4 = BadPlaceMiniRocketMultivariate(random_state=0)
    r3.fit_transform(X)
    r4.fit_transform(X)

    for i in range(500, 31000, 500):
        # X1 is univariate, X2 is multivariate, both with i cases
        X1 = make_example_3d_numpy(
            n_cases=i, n_channels=1, n_timepoints=500, return_y=False
        ).astype(np.float32)
        X2 = make_example_3d_numpy(
            n_cases=i, n_channels=5, n_timepoints=100, return_y=False
        ).astype(np.float32)

        # t1, t2: public fit_transform
        start = time.time()
        r.fit_transform(X1)
        t1 = time.time() - start
        start = time.time()
        r.fit_transform(X2)
        t2 = time.time() - start
        # t3, t4: call the numba functions directly
        start = time.time()
        params = _static_fit(X1)
        X_ = X1.squeeze(1)
        _static_transform_uni(X_, params, MiniRocket._indices)
        t3 = time.time() - start
        start = time.time()
        params = _static_fit(X2)
        _static_transform_multi(X2, params, MiniRocket._indices)
        t4 = time.time() - start
        # t5, t6: private _fit_transform, skipping the base class checks
        start = time.time()
        r2._fit_transform(X1)
        t5 = time.time() - start
        start = time.time()
        r2._fit_transform(X2)
        t6 = time.time() - start
        # t7, t8: the experimental variants
        start = time.time()
        r3.fit_transform(X1)
        t7 = time.time() - start
        start = time.time()
        r4.fit_transform(X2)
        t8 = time.time() - start

        print(i, t1, t2, t3, t4, t5, t6, t7, t8, sep=",")

TonyBagnall commented 4 months ago

So, if anything, adding the type annotations makes it a bit slower.

n_cases | With (s) | Without (s) | Diff (s)
-- | -- | -- | --
500 | 0.49 | 0.48 | -0.02
1000 | 0.88 | 0.86 | -0.02
1500 | 1.30 | 1.28 | -0.03
2000 | 1.74 | 1.64 | -0.10
2500 | 2.02 | 2.00 | -0.02
3000 | 2.46 | 2.40 | -0.06
3500 | 2.81 | 2.77 | -0.04
4000 | 3.19 | 3.23 | 0.04
4500 | 3.66 | 3.56 | -0.10
5000 | 4.08 | 4.06 | -0.02
5500 | 4.57 | 4.43 | -0.14
6000 | 4.75 | 4.73 | -0.02
6500 | 5.17 | 5.29 | 0.13
7000 | 5.80 | 5.64 | -0.16
7500 | 6.14 | 5.86 | -0.27
8000 | 6.51 | 6.26 | -0.25
8500 | 6.85 | 6.74 | -0.11
9000 | 7.33 | 7.03 | -0.30
9500 | 7.55 | 7.45 | -0.10
10000 | 7.92 | 7.78 | -0.15
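To put a number on "a bit slower", the table values can be summarised directly. This is a small sketch that assumes the With and Without columns refer to the njit type annotation, that the first column is the number of cases, and that times are in seconds; the differences are recomputed from the rounded values, so they can deviate slightly from the Diff column.

```python
# (n_cases, with annotation, without annotation), copied from the table above
rows = [
    (500, 0.49, 0.48), (1000, 0.88, 0.86), (1500, 1.30, 1.28),
    (2000, 1.74, 1.64), (2500, 2.02, 2.00), (3000, 2.46, 2.40),
    (3500, 2.81, 2.77), (4000, 3.19, 3.23), (4500, 3.66, 3.56),
    (5000, 4.08, 4.06), (5500, 4.57, 4.43), (6000, 4.75, 4.73),
    (6500, 5.17, 5.29), (7000, 5.80, 5.64), (7500, 6.14, 5.86),
    (8000, 6.51, 6.26), (8500, 6.85, 6.74), (9000, 7.33, 7.03),
    (9500, 7.55, 7.45), (10000, 7.92, 7.78),
]

total_with = sum(w for _, w, _ in rows)
total_without = sum(wo for _, _, wo in rows)

# Negative values mean the version without annotations was faster
mean_diff = (total_without - total_with) / len(rows)
rel_change = (total_without - total_with) / total_with
n_faster_without = sum(wo < w for _, w, wo in rows)

print(f"mean diff (without - with): {mean_diff:.3f} s")
print(f"relative change in total time: {rel_change:.1%}")
print(f"without annotation faster in {n_faster_without} of {len(rows)} runs")
```

These are single runs per collection size, so the summary only indicates the direction of the effect, not a statistically tested difference.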