[BUG] BORF failing without numba

TonyBagnall commented 3 days ago

Describe the bug

the BORF transformer fails when numpy turned off, in for example, the overnight codecov tests here

https://github.com/aeon-toolkit/aeon/actions/runs/11491231670

its standard numba stuff, should be quite easy to track down

Steps/Code to reproduce the bug

set

Run numba-disabled codecov tests

on any PR

Expected results

pass test

Actual results

FAILED aeon/testing/tests/test_all_estimators.py::test_all_estimators[check_fit_updates_state(estimator=BORF(),datatype=EqualLengthUnivariate-Classification-numpy3D)] - numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Untyped global name '_erfinv': Cannot determine Numba type of <class 'function'>

File "aeon/transformations/collection/dictionary_based/_borf.py", line 617:
def _ppf(x, mu=0, std=1):
    return mu + math.sqrt(2) * _erfinv(2 * x - 1) * std
    ^
FAILED aeon/testing/tests/test_all_estimators.py::test_all_estimators[check_non_state_changing_method(estimator=BORF(),datatype=EqualLengthUnivariate-Classification-numpy3D)] - numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Untyped global name '_erfinv': Cannot determine Numba type of <class 'function'>

File "aeon/transformations/collection/dictionary_based/_borf.py", line 617:
def _ppf(x, mu=0, std=1):
    return mu + math.sqrt(2) * _erfinv(2 * x - 1) * std
    ^
= 2 failed, 8705 passed, 10 skipped, 61 xfailed, 150 xpassed, 636 warnings in 1781.64s (0:29:41) =

Versions

No response

Cyril-Meyer commented 2 days ago

I've investigated this bug and here is some information I found useful :

You can reproduce the bug easier
- Set NUMBA_DISABLE_JIT=1
- Run a copy of _borf.py with a call like _get_norm_bins(16)
_ppf is @nb.vectorize (and its the only one in all aeon)
If you change _erfinv decorator from njit into vectorize, the error disappears (this solution looks good to me)
- Performance difference is negligible, here is code for test
- vectorize = 3.88
- njit = 3.76
```
import time
_get_norm_bins(5)
t0 = time.time()
for i in range(100000):
_get_norm_bins(256)
t1 = time.time()
print(t1 - t0)
```
If you put _erfinv code into _ppf, the error also disappears.

@nb.vectorize(cache=True)
def _ppf(x, mu=0, std=1):

    w = -math.log((1 - (2 * x - 1)) * (1 + (2 * x - 1)))
    if w < 5:
        w = w - 2.5
        p = 2.81022636e-08
        p = 3.43273939e-07 + p * w
        p = -3.5233877e-06 + p * w
        p = -4.39150654e-06 + p * w
        p = 0.00021858087 + p * w
        p = -0.00125372503 + p * w
        p = -0.00417768164 + p * w
        p = 0.246640727 + p * w
        p = 1.50140941 + p * w
    else:
        w = math.sqrt(w) - 3
        p = -0.000200214257
        p = 0.000100950558 + p * w
        p = 0.00134934322 + p * w
        p = -0.00367342844 + p * w
        p = 0.00573950773 + p * w
        p = -0.0076224613 + p * w
        p = 0.00943887047 + p * w
        p = 1.00167406 + p * w
        p = 2.83297682 + p * w

    p = p * x
    return mu + math.sqrt(2) * p * std

Cyril-Meyer commented 2 days ago

@aeon-actions-bot assign @Cyril-Meyer

MatthewMiddlehurst commented 2 days ago

was also looking at this but came to a different solution 🙂.

@nb.njit(cache=True)
def _get_norm_bins(alphabet_size: int, mu=0, std=1):
    b = np.linspace(0, 1, alphabet_size + 1)[1:-1]
    for i, v in enumerate(b):
        b[i] = _ppf(v, mu, std)
    return b

@nb.njit(cache=True)
def _ppf(x, mu=0, std=1):
    return mu + math.sqrt(2) * _erfinv(2 * x - 1) * std

Issue is that our test disabling numba jit does not work on vectorise for whatever reason. Either can work really, this one will allow the functions to be tracked in test coverage, but not sure if it will be slower.

MatthewMiddlehurst commented 2 days ago

though mine still rasises an error...

https://github.com/aeon-toolkit/aeon/actions/runs/11517193530/job/32061397931?pr=2249#step:7:1636

Will test yours (added the tag to your PR), would just go with that if it works.

aeon-toolkit / aeon