[BUG] PyODAdapter only returns decision_scores_ of the training set

roadrunner-gs commented 1 month ago

Describe the bug

The PyODAdapter currently does not support predict() on test data: only the decision_scores_ of the data the detector was fitted on are available. From the PyOD documentation (https://pyod.readthedocs.io/en/latest/pyod.models.html#module-pyod.models.lof):

(...) decision_scores_ : numpy array of shape (n_samples,)
The outlier scores of the training data. The higher, the more abnormal. Outliers tend to have higher scores. This value is available once the detector is fitted. (...)

I would expect to fit on training data and predict on test data. Furthermore, PyOD's fit_predict is deprecated and therefore should not be used by an adapter, so as not to cause unexpected behaviour for people familiar with the underlying PyOD.
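To make the distinction concrete, here is a minimal pure-Python sketch of the convention described above: fit() stores scores for the training samples in decision_scores_, while decision_function() scores new samples against the fitted model. ToyKNNDetector and windows() are hypothetical illustrations (a k-th-nearest-neighbour distance, not LOF), not PyOD code.

```python
import math


class ToyKNNDetector:
    """Hypothetical detector mimicking the PyOD convention (not PyOD code).

    fit() stores scores of the *training* samples in decision_scores_;
    decision_function() scores arbitrary new samples against the fitted data.
    """

    def __init__(self, n_neighbors=2):
        self.n_neighbors = n_neighbors

    def _score(self, x, exclude_index=None):
        # Distance to the k-th nearest training sample (larger = more abnormal)
        dists = sorted(
            math.dist(x, t)
            for i, t in enumerate(self.train_)
            if i != exclude_index
        )
        return dists[self.n_neighbors - 1]

    def fit(self, X):
        self.train_ = [tuple(x) for x in X]
        # Training-set scores, computed leave-one-out so a sample is not its own neighbour
        self.decision_scores_ = [
            self._score(x, exclude_index=i) for i, x in enumerate(self.train_)
        ]
        return self

    def decision_function(self, X):
        # Scores for different (e.g. test) data against the fitted model
        return [self._score(tuple(x)) for x in X]


def windows(seq, w):
    # Hypothetical helper: stride-1 sliding windows as tuples
    return [tuple(seq[i:i + w]) for i in range(len(seq) - w + 1)]


train = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
test = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
det = ToyKNNDetector(n_neighbors=2).fit(windows(train, 2))
print(det.decision_scores_)
# → [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0]
print(det.decision_function(windows(test, 2)))
# → [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0]
```

The training scores flag windows 5 and 6 (where the training spike is), while decision_function flags windows 4 and 5 (where the test spike is). Returning decision_scores_ from a test-time call therefore reports anomalies at the wrong positions.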


Steps/Code to reproduce the bug

import numpy as np
import warnings
from pyod.models.lof import LOF  
from aeon.anomaly_detection import PyODAdapter
from aeon.utils.windowing import reverse_windowing

warnings.simplefilter('ignore')

def sliding_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

X = np.asarray([0, 0, 0, 0, 0, 0, 1, 0, 0, 0])
Y = np.asarray([0, 0, 0, 0, 0, 1, 0, 0, 0, 0])
X_win = sliding_window(X, 2)
Y_win = sliding_window(Y, 2)
print("train:", X)
print("test :", Y)

detector = PyODAdapter(LOF(), window_size=2)
print("LOF via PyODAdapter")
print(detector.fit_predict(X, axis=0))

detector.fit(X)
print("predicting on test via PyODAdapter")
print(detector.predict(Y))

print("LOF via PyOD")
clf = LOF()
clf.fit(X_win)
print(reverse_windowing(clf.decision_scores_, 2, np.nanmean, 1, 2))
print("decision_function on test via PyOD")
print(reverse_windowing(clf.decision_function(Y_win), 2, np.nanmean, 1, 2))
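The script maps per-window scores back to per-point scores with reverse_windowing(..., np.nanmean, ...). Assuming stride 1 and that each point simply receives the mean score of every window covering it, the idea can be sketched in pure Python as follows. sliding_windows() and reverse_windowing_mean() are hypothetical helpers, not aeon's implementation, and padding is not modelled.

```python
def sliding_windows(seq, window):
    # All contiguous windows with stride 1: len(seq) - window + 1 of them
    return [tuple(seq[i:i + window]) for i in range(len(seq) - window + 1)]


def reverse_windowing_mean(window_scores, window, n_points):
    # Point i is covered by every window j with j <= i <= j + window - 1;
    # its score is the mean of those windows' scores.
    point_scores = []
    for i in range(n_points):
        first = max(0, i - window + 1)
        last = min(i, len(window_scores) - 1)
        covering = [window_scores[j] for j in range(first, last + 1)]
        point_scores.append(sum(covering) / len(covering))
    return point_scores


train = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
win = sliding_windows(train, 2)
print(len(win))  # → 9
# Toy window scores: 1 for windows that contain the spike, else 0
window_scores = [1 if 1 in w else 0 for w in win]
print(reverse_windowing_mean(window_scores, 2, len(train)))
# → [0.0, 0.0, 0.0, 0.0, 0.0, 0.5, 1.0, 0.5, 0.0, 0.0]
```

The point scores peak at index 6 and are halved at the neighbouring indices 5 and 7, the same three-point shape around index 6 that appears in the LOF outputs under "Actual results".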

Expected results

Being able to fit on a training set and get anomaly scores back for a separate test set; see the PyOD comparison below.

Actual results

$ python pyod_test.py
<frozen importlib._bootstrap>:228: RuntimeWarning: scipy._lib.messagestream.MessageStream size changed, may indicate binary incompatibility. Expected 56 from C header, got 64 from PyObject
/home/roadrunner/miniconda3/envs/py3k/lib/python3.9/site-packages/aeon/base/__init__.py:24: FutureWarning: The aeon package will soon be releasing v1.0.0 with the removal of legacy modules and interfaces such as BaseTransformer and BaseForecaster. This will contain breaking changes. See aeon-toolkit.org for more information. Set aeon.AEON_DEPRECATION_WARNING or the AEON_DEPRECATION_WARNING environmental variable to 'False' to disable this warning.
  warnings.warn(
train: [0 0 0 0 0 0 1 0 0 0]
test : [0 0 0 0 0 1 0 0 0 0]
LOF via PyODAdapter
[1.01230696 1.01230696 1.01230696 1.01230696 1.01230696 0.98562678
 0.95894661 0.98562678 1.01230696 1.01230696]
predicting on test via PyODAdapter
[1.01230696 1.01230696 1.01230696 1.01230696 0.98562678 0.95894661
 0.98562678 1.01230696 1.01230696 1.01230696]
LOF via PyOD
[1.01230696 1.01230696 1.01230696 1.01230696 1.01230696 0.98562678
 0.95894661 0.98562678 1.01230696 1.01230696]
decision_function on test via PyOD
[0.95894661 0.95894661 0.95894661 0.95894661 0.95894661 0.95894661
 0.95894661 0.95894661 0.95894661 0.95894661]

Versions

No response

CodeLionX commented 1 month ago

Thank you for the issue. We are currently discussing this within the team and will let you know the outcome.

CodeLionX commented 1 month ago

We decided to change the PyODAdapter so that it is unsupervised and semi-supervised at the same time, meaning it supports both conventions:

- unsupervised: fit_predict(X) on the same input data X
- semi-supervised:
  - fit(X_train, y) to build the normal behavior model on some data X_train, ignoring y
  - predict(X_target) to get the anomaly scores on different data X_target

In the semi-supervised case, most PyOD models can actually cope with somewhat dirty (non-annotated) training data, so the adapter does not fully fit the definition of semi-supervised.
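The two conventions can be illustrated with a minimal pure-Python sketch of the control flow described in the comment: fit() learns from X_train and ignores y, predict() scores different data, and fit_predict() scores the data it was fitted on. DualModeAdapter and MeanDistanceDetector are hypothetical (the detector just scores each value by its distance to the training mean, standing in for a PyOD model); this is not aeon's actual PyODAdapter.

```python
import statistics


class MeanDistanceDetector:
    """Hypothetical stand-in for a PyOD model: score = distance to the training mean."""

    def fit(self, X):
        self.mean_ = statistics.fmean(X)
        # Like PyOD, keep the training-set scores after fitting
        self.decision_scores_ = self.decision_function(X)
        return self

    def decision_function(self, X):
        return [abs(x - self.mean_) for x in X]


class DualModeAdapter:
    """Hypothetical adapter supporting both conventions (not aeon's PyODAdapter)."""

    def __init__(self, detector):
        self.detector = detector

    def fit(self, X_train, y=None):
        # Semi-supervised: learn normal behaviour from X_train; y is ignored
        self.detector.fit(X_train)
        return self

    def predict(self, X_target):
        # Score *different* data against the model built in fit()
        return self.detector.decision_function(X_target)

    def fit_predict(self, X, y=None):
        # Unsupervised: fit and score the same data via the stored training scores
        self.detector.fit(X)
        return list(self.detector.decision_scores_)


def argmax(scores):
    return max(range(len(scores)), key=scores.__getitem__)


train = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
test = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]
semi = DualModeAdapter(MeanDistanceDetector()).fit(train).predict(test)
unsup = DualModeAdapter(MeanDistanceDetector()).fit_predict(train)
print(argmax(semi), argmax(unsup))  # → 5 6
```

Fitting on the training series and predicting on the test series puts the highest score at index 5 (the test spike), whereas fit_predict on the training series points at index 6; keeping the two paths separate is what lets one adapter serve both settings.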
