aeon-toolkit / aeon

A toolkit for machine learning from time series
https://aeon-toolkit.org/
BSD 3-Clause "New" or "Revised" License
994 stars 118 forks source link

[BUG] BORF failing without numba #2245

Open TonyBagnall opened 3 days ago

TonyBagnall commented 3 days ago

Describe the bug

the BORF transformer fails when numpy turned off, in for example, the overnight codecov tests here

https://github.com/aeon-toolkit/aeon/actions/runs/11491231670

its standard numba stuff, should be quite easy to track down

Steps/Code to reproduce the bug

set

Run numba-disabled codecov tests

on any PR

Expected results

pass test

Actual results

FAILED aeon/testing/tests/test_all_estimators.py::test_all_estimators[check_fit_updates_state(estimator=BORF(),datatype=EqualLengthUnivariate-Classification-numpy3D)] - numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Untyped global name '_erfinv': Cannot determine Numba type of <class 'function'>

File "aeon/transformations/collection/dictionary_based/_borf.py", line 617:
def _ppf(x, mu=0, std=1):
    return mu + math.sqrt(2) * _erfinv(2 * x - 1) * std
    ^
FAILED aeon/testing/tests/test_all_estimators.py::test_all_estimators[check_non_state_changing_method(estimator=BORF(),datatype=EqualLengthUnivariate-Classification-numpy3D)] - numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Untyped global name '_erfinv': Cannot determine Numba type of <class 'function'>

File "aeon/transformations/collection/dictionary_based/_borf.py", line 617:
def _ppf(x, mu=0, std=1):
    return mu + math.sqrt(2) * _erfinv(2 * x - 1) * std
    ^
= 2 failed, 8705 passed, 10 skipped, 61 xfailed, 150 xpassed, 636 warnings in 1781.64s (0:29:41) =

Versions

No response

Cyril-Meyer commented 2 days ago

I've investigated this bug and here is some information I found useful :

@nb.vectorize(cache=True)
def _ppf(x, mu=0, std=1):

    w = -math.log((1 - (2 * x - 1)) * (1 + (2 * x - 1)))
    if w < 5:
        w = w - 2.5
        p = 2.81022636e-08
        p = 3.43273939e-07 + p * w
        p = -3.5233877e-06 + p * w
        p = -4.39150654e-06 + p * w
        p = 0.00021858087 + p * w
        p = -0.00125372503 + p * w
        p = -0.00417768164 + p * w
        p = 0.246640727 + p * w
        p = 1.50140941 + p * w
    else:
        w = math.sqrt(w) - 3
        p = -0.000200214257
        p = 0.000100950558 + p * w
        p = 0.00134934322 + p * w
        p = -0.00367342844 + p * w
        p = 0.00573950773 + p * w
        p = -0.0076224613 + p * w
        p = 0.00943887047 + p * w
        p = 1.00167406 + p * w
        p = 2.83297682 + p * w

    p = p * x
    return mu + math.sqrt(2) * p * std
Cyril-Meyer commented 2 days ago

@aeon-actions-bot assign @Cyril-Meyer

MatthewMiddlehurst commented 2 days ago

was also looking at this but came to a different solution 🙂.

@nb.njit(cache=True)
def _get_norm_bins(alphabet_size: int, mu=0, std=1):
    b = np.linspace(0, 1, alphabet_size + 1)[1:-1]
    for i, v in enumerate(b):
        b[i] = _ppf(v, mu, std)
    return b

@nb.njit(cache=True)
def _ppf(x, mu=0, std=1):
    return mu + math.sqrt(2) * _erfinv(2 * x - 1) * std

Issue is that our test disabling numba jit does not work on vectorise for whatever reason. Either can work really, this one will allow the functions to be tracked in test coverage, but not sure if it will be slower.

MatthewMiddlehurst commented 2 days ago

though mine still rasises an error...

https://github.com/aeon-toolkit/aeon/actions/runs/11517193530/job/32061397931?pr=2249#step:7:1636

Will test yours (added the tag to your PR), would just go with that if it works.