aeon-toolkit / aeon

A toolkit for machine learning from time series
https://aeon-toolkit.org/
BSD 3-Clause "New" or "Revised" License

[BUG] CollectionTransformers with `fit_is_empty` set to true #1750

Closed by TonyBagnall 2 days ago

TonyBagnall commented 2 days ago

Describe the bug

If `fit_is_empty` is true, `isfitted` is false and the function returns, which means that `reset()` is not called. In principle this is fine, because `reset` only resets variables with the suffix . Note that you cannot call `transform` without first calling `fit` (that is a separate conversation).

The issue here arises with the metadata stored on `self`. Currently this is set in `_preprocess_collection`, but only if the stored metadata is empty. This means that if you call `fit_transform` twice, first with equal-length and then with unequal-length data, it crashes because the metadata is not overwritten. This is simply solved in one of two ways:

  1. call `reset()` in `fit` before the `fit_is_empty` early return
  2. don't store metadata when `_preprocess_collection` is called from `transform` (e.g. pass a boolean flag)

I prefer (1), because the whole idea was ultimately to allow metadata to be passed through kwargs.
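The two options above can be sketched without aeon. This is a minimal illustration, assuming a hypothetical `ToyTransformer` that stands in for a `fit_is_empty` collection transformer; it is not aeon's base class. With `option=1`, `fit` calls `reset()` before the early return; with `option=2`, `transform` passes a hypothetical `store_metadata=False` flag to `_preprocess_collection`. The single `equal_length` metadata key is also an assumption made for illustration.

```python
import numpy as np


class ToyTransformer:
    """Hypothetical fit_is_empty transformer with the two proposed fixes.

    option=1: reset() runs in fit before the fit_is_empty early return.
    option=2: transform does not store metadata (store_metadata=False).
    """

    fit_is_empty = True

    def __init__(self, option):
        self.option = option
        self.metadata = {}

    def reset(self):
        self.metadata = {}

    def fit(self, X, y=None):
        if self.option == 1:
            self.reset()  # option 1: clear metadata left over from earlier calls
        if self.fit_is_empty:
            return self
        self._preprocess_collection(X)
        return self

    def transform(self, X):
        # option 2: metadata is computed for this call only and never cached
        return self._preprocess_collection(X, store_metadata=self.option != 2)

    def _preprocess_collection(self, X, store_metadata=True):
        current = {"equal_length": len({x.shape[-1] for x in X}) == 1}
        if store_metadata:
            if not self.metadata:  # unchanged behaviour: only set when empty
                self.metadata = current
            metadata = self.metadata
        else:
            metadata = current
        # equal-length input becomes a 3D array, unequal-length stays a list
        return np.stack(X) if metadata["equal_length"] else list(X)


rng = np.random.default_rng(0)
equal = [rng.normal(size=(4, 30)) for _ in range(10)]  # 10 cases, 4 channels
unequal = [rng.normal(size=(1, n)) for n in range(20, 30)]  # lengths 20..29

results = {}
for option in (1, 2):
    t = ToyTransformer(option=option)
    out_equal = t.fit(equal).transform(equal)
    out_unequal = t.fit(unequal).transform(unequal)
    results[option] = (out_equal.shape, type(out_unequal).__name__)
    print(option, results[option])
```

Under both options the unequal-length data stays a list rather than being forced into a 3D array. Option 2 additionally leaves `metadata` empty, so there is no cached state that can go stale between calls.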

Steps/Code to reproduce the bug

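from aeon.testing.data_generation import (  # import path may differ between aeon versions
    make_example_3d_numpy,
    make_example_3d_numpy_list,
)
from aeon.transformations.collection.compose._identity import CollectionId

t = CollectionId.create_test_instance()

# First round: equal-length multivariate data
X, y = make_example_3d_numpy(n_cases=10, n_channels=4, n_timepoints=30)
t.fit(X, y)
t.transform(X)
X2 = t.fit_transform(X, y)

# Second round: unequal-length univariate data
X, y = make_example_3d_numpy_list(
    n_cases=10, n_channels=1, min_n_timepoints=20, max_n_timepoints=30
)

t.fit(X, y)
t.transform(X)  # raises the TypeError shown under "Actual results"
X2 = t.fit_transform(X, y)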

The second round does not overwrite the metadata stored during the first round.
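The same failure can be modelled without aeon. In this sketch, a hypothetical `ToyCollectionTransformer` mimics the behaviour described in the issue: `fit` returns early because `fit_is_empty` is true, so `reset()` never runs, and `_preprocess_collection` only records metadata when none is stored yet. The class, its `metadata` dict and the `equal_length` key are illustrative assumptions; aeon's real base class records more metadata and converts via `convert_collection`.

```python
import numpy as np


class ToyCollectionTransformer:
    """Hypothetical stand-in for a collection transformer with fit_is_empty=True."""

    fit_is_empty = True

    def __init__(self):
        self.metadata = {}

    def reset(self):
        self.metadata = {}

    def fit(self, X, y=None):
        if self.fit_is_empty:
            return self  # early return: reset() below is never reached
        self.reset()
        self._preprocess_collection(X)
        return self

    def transform(self, X):
        return self._preprocess_collection(X)

    def _preprocess_collection(self, X):
        if not self.metadata:  # only recorded when empty, so old values survive
            self.metadata = {"equal_length": len({x.shape[-1] for x in X}) == 1}
        if self.metadata["equal_length"]:
            # mimics the numpy3D conversion, which needs equal-length series
            if len({x.shape[-1] for x in X}) != 1:
                raise TypeError("Cannot convert unequal length to numpy3D")
            return np.stack(X)
        return list(X)


rng = np.random.default_rng(0)
equal = [rng.normal(size=(4, 30)) for _ in range(10)]  # 10 cases, 4 channels
unequal = [rng.normal(size=(1, n)) for n in range(20, 30)]  # lengths 20..29

t = ToyCollectionTransformer()
t.fit(equal)
t.transform(equal)  # records {"equal_length": True}
t.fit(unequal)  # early return, so the stale metadata is kept

error_message = None
try:
    t.transform(unequal)
except TypeError as err:
    error_message = str(err)
print(error_message)  # Cannot convert unequal length to numpy3D
```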

Expected results

This should work: either do not store metadata from `transform`, or reset the metadata in `fit`.

Actual results

  File "C:\Code\aeon\aeon\local\transform_debug.py", line 43, in <module>
    t.transform(X)
  File "C:\Code\aeon\aeon\transformations\collection\base.py", line 154, in transform
    X_inner = self._preprocess_collection(X)
  File "C:\Code\aeon\aeon\base\_base_collection.py", line 83, in _preprocess_collection
    X = self._convert_X(X)
  File "C:\Code\aeon\aeon\base\_base_collection.py", line 208, in _convert_X
    X = convert_collection(X, inner_type)
  File "C:\Code\aeon\aeon\utils\conversion\_convert_collection.py", line 570, in convert_collection
    return convert_dictionary[(input_type, output_type)](X)
  File "C:\Code\aeon\aeon\utils\conversion\_convert_collection.py", line 178, in _from_np_list_to_numpy3d
    raise TypeError("Cannot convert unequal length to numpy3D")
TypeError: Cannot convert unequal length to numpy3D

Versions

No response

MatthewMiddlehurst commented 2 days ago

See changes to base in #1479

TonyBagnall commented 2 days ago

Fixed with option 2 in #1479; Matthew was ahead of me.