aeon-toolkit / aeon

A toolkit for machine learning from time series
https://aeon-toolkit.org/
BSD 3-Clause "New" or "Revised" License

[BUG] OS Error when trying to use personalized callback with the InceptionTimeClassifier #1098

Closed juanCantz closed 1 month ago

juanCantz commented 7 months ago

Describe the bug

Hello all. Using the available classifiers (such as InceptionTimeClassifier), I want to keep track of the loss during training, and I am having trouble doing that. The objective is to plot the loss as in the picture (Figure 8).

I am trying to use Keras callbacks, but doing so throws an error, and I cannot access the training history because, according to the docs, none is available: https://www.aeon-toolkit.org/en/latest/api_reference/auto_generated/aeon.classification.deep_learning.InceptionTimeClassifier.html#aeon.classification.deep_learning.InceptionTimeClassifier

The error and code are shared below.

(The question of accessing the training history was answered by Ali El Hadi Ismail Fawaz and Tony Bagnall in the aeon Slack; thanks for that. I am opening this issue, as requested, so that the OSError exception can be fixed.)

The answer to the question was: to access the history of each individual Inception model that was trained, do the following. For instance, this gets the history of the first Inception model trained in your ensemble:

my_history = my_inc_time.classifiers_[0].history.history
loss = my_history["loss"]
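To collect the loss curve of every ensemble member rather than only the first, the access pattern above can be wrapped in a small helper. This is a minimal sketch, not part of aeon: `collect_loss_curves` is a hypothetical name, and the stand-in objects only mimic the `history.history["loss"]` layout quoted above. The answer spells the attribute `classifiers_`, while the traceback in this issue shows `classifers_`, so the helper tries both.

```python
from types import SimpleNamespace


def collect_loss_curves(ensemble):
    """Return one list of per-epoch loss values per fitted ensemble member.

    Hypothetical helper: it relies only on the attribute layout quoted in
    this issue (member.history.history["loss"]).
    """
    # The quoted answer uses `classifiers_`; the traceback in this issue
    # shows the attribute spelled `classifers_`, so try both spellings.
    members = getattr(ensemble, "classifiers_", None)
    if members is None:
        members = getattr(ensemble, "classifers_", [])
    return [list(member.history.history["loss"]) for member in members]


# Stand-in for a fitted ensemble (made-up numbers, not real training output).
fake_member = SimpleNamespace(
    history=SimpleNamespace(history={"loss": [0.9, 0.5, 0.3]})
)
fake_ensemble = SimpleNamespace(classifers_=[fake_member])

curves = collect_loss_curves(fake_ensemble)
print(curves)
```

With a real fitted classifier, each returned list could be passed to a plotting library to draw one loss curve per ensemble member.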

Steps/Code to reproduce the bug

import os

import tensorflow as tf
from tensorflow import keras
from keras import backend as K
from tensorflow.keras import layers
from tensorflow.keras import optimizers
from tensorflow.keras.callbacks import EarlyStopping

from aeon.classification.deep_learning import FCNClassifier, InceptionTimeClassifier, IndividualInceptionClassifier
from aeon.clustering.deep_learning import AEFCNClusterer
from aeon.datasets import load_classification, load_regression
from aeon.networks import InceptionNetwork

xtrain, ytrain = load_classification(name="ArrowHead", split="train")
xtest, ytest = load_classification(name="ArrowHead", split="test")

ES = keras.callbacks.EarlyStopping(
    monitor='loss',
    min_delta=0.01,
    patience=10,
    verbose=1,
    mode='auto',
    baseline=None,
    restore_best_weights=False,
    start_from_epoch=0
)

inc = InceptionTimeClassifier(n_classifiers=1, use_custom_filters=False, n_epochs=5, callbacks=ES, file_path="./", save_last_model=True)

history = inc.fit(X=xtrain, y=ytrain)
ypred = inc.predict(X=xtest)

print("Predictions: ", ypred)
print("Ground Truth: ", ytest)

Expected results

Successful training with application of the described callback function.

Actual results

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
Cell In[4], line 17
      4 ES = keras.callbacks.EarlyStopping(
      5     monitor='loss',
      6     min_delta=0.01,
   (...)
     12     start_from_epoch=0
     13 )
     15 inc = InceptionTimeClassifier(n_classifiers=1, use_custom_filters=False, n_epochs=5, callbacks=ES, file_path="./", save_last_model=True)
---> 17 history = inc.fit(X=xtrain, y=ytrain)
     18 ypred = inc.predict(X=xtest)
     20 print("Predictions: ", ypred)

File ~/work/.env/lib/python3.11/site-packages/aeon/classification/base.py:156, in BaseClassifier.fit(self, X, y)
    154     self._is_fitted = True
    155     return self
--> 156 self._fit(X, y)
    157 self.fit_time_ = int(round(time.time() * 1000)) - start
    158 # this should happen last

File ~/work/.env/lib/python3.11/site-packages/aeon/classification/deep_learning/_inception_time.py:280, in InceptionTimeClassifier._fit(self, X, y)
    249 for n in range(0, self.n_classifiers):
    250     cls = IndividualInceptionClassifier(
    251         nb_filters=self.nb_filters,
    252         nb_conv_per_layer=self.nb_conv_per_layer,
   (...)
    278         verbose=self.verbose,
    279     )
--> 280     cls.fit(X, y)
    281     self.classifers_.append(cls)
    282     gc.collect()

File ~/work/.env/lib/python3.11/site-packages/aeon/classification/base.py:156, in BaseClassifier.fit(self, X, y)
    154     self._is_fitted = True
    155     return self
--> 156 self._fit(X, y)
    157 self.fit_time_ = int(round(time.time() * 1000)) - start
    158 # this should happen last

File ~/work/.env/lib/python3.11/site-packages/aeon/classification/deep_learning/_inception_time.py:675, in IndividualInceptionClassifier._fit(self, X, y)
    665 self.history = self.training_model_.fit(
    666     X,
    667     y_onehot,
   (...)
    671     callbacks=self.callbacks_,
    672 )
    674 try:
--> 675     self.model_ = tf.keras.models.load_model(
    676         self.file_path + self.file_name_ + ".hdf5", compile=False
    677     )
    678     if not self.save_best_model:
    679         os.remove(self.file_path + self.file_name_ + ".hdf5")

File ~/work/.env/lib/python3.11/site-packages/keras/src/saving/saving_api.py:262, in load_model(filepath, custom_objects, compile, safe_mode, **kwargs)
    254     return saving_lib.load_model(
    255         filepath,
    256         custom_objects=custom_objects,
    257         compile=compile,
    258         safe_mode=safe_mode,
    259     )
    261 # Legacy case.
--> 262 return legacy_sm_saving_lib.load_model(
    263     filepath, custom_objects=custom_objects, compile=compile, **kwargs
    264 )

File ~/work/.env/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
     67     filtered_tb = _process_traceback_frames(e.__traceback__)
     68     # To get the full stack trace, call:
     69     # `tf.debugging.disable_traceback_filtering()`
---> 70     raise e.with_traceback(filtered_tb) from None
     71 finally:
     72     del filtered_tb

File ~/work/.env/lib/python3.11/site-packages/keras/src/saving/legacy/save.py:234, in load_model(filepath, custom_objects, compile, options)
    232 if isinstance(filepath_str, str):
    233     if not tf.io.gfile.exists(filepath_str):
--> 234         raise IOError(
    235             f"No file or directory found at {filepath_str}"
    236         )
    238     if tf.io.gfile.isdir(filepath_str):
    239         return saved_model_load.load(
    240             filepath_str, compile, options
    241         )

OSError: No file or directory found at ./1706182933454589005.hdf5

Versions

System:
    python: 3.11.6 | packaged by conda-forge | (main, Oct 3 2023, 10:40:35) [GCC 12.3.0]
    executable: /home/jovyan/work/.env/bin/python
    machine: Linux-6.5.0-14-generic-x86_64-with-glibc2.35

Python dependencies:
    pip: 23.3
    setuptools: 68.2.2
    scikit-learn: 1.3.1
    aeon: 0.6.0
    statsmodels: 0.14.0
    numpy: 1.24.4
    scipy: 1.11.3
    pandas: 2.0.3
    matplotlib: 3.8.0
    joblib: 1.3.2
    numba: 0.57.1
    pmdarima: None
    tsfresh: None

MatthewMiddlehurst commented 3 months ago

Sorry, this kind of got lost during busy times. @hadifawaz1999 is this related to the saving and loading issues we were having?

hadifawaz1999 commented 3 months ago

> Sorry, this kind of got lost during busy times. @hadifawaz1999 is this related to the saving and loading issues we were having?

No, it isn't. It's just about always adding the ModelCheckpoint callback by default. I forgot to work on that, but I can take care of it.

TonyBagnall commented 2 months ago

Is this a bug or a feature request?

hadifawaz1999 commented 2 months ago

> Is this a bug or a feature request?

It is a bug, yes. I should fix it, but I still haven't done so.

hadifawaz1999 commented 2 months ago

But it is a bug only in a very specific case.

hadifawaz1999 commented 2 months ago

It can be avoided simply by having the user pass the ModelCheckpoint callback manually; the callback should just always be added internally by default.
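The failure mode and the fix discussed in this thread can be illustrated with a small pure-Python sketch. It assumes, based on the comments above, that user-supplied callbacks replace the default list, so no checkpoint file is written before the model is reloaded. Every name here (`FakeCheckpoint`, `FakeEarlyStopping`, `build_callbacks`, `fit_and_reload`) is hypothetical and stands in for the Keras callbacks and the aeon internals; it is not the actual aeon code or the eventual patch.

```python
import os
import tempfile


class FakeCheckpoint:
    """Hypothetical stand-in for a checkpoint callback: writes a file."""

    def __init__(self, filepath):
        self.filepath = filepath

    def on_train_end(self):
        with open(self.filepath, "w") as fh:
            fh.write("weights")


class FakeEarlyStopping:
    """Hypothetical stand-in for a user callback that writes nothing."""

    def on_train_end(self):
        pass


def build_callbacks(user_callbacks, checkpoint_path, always_checkpoint):
    """Assemble the callback list used for training."""
    # Accept None, a single callback, or a list of callbacks.
    if user_callbacks is None:
        callbacks = []
    elif isinstance(user_callbacks, (list, tuple)):
        callbacks = list(user_callbacks)
    else:
        callbacks = [user_callbacks]
    has_checkpoint = any(isinstance(cb, FakeCheckpoint) for cb in callbacks)
    # Assumed old behaviour: checkpoint only when no callbacks were passed.
    # Proposed behaviour: always add one unless the user already did.
    if (user_callbacks is None or always_checkpoint) and not has_checkpoint:
        callbacks.append(FakeCheckpoint(checkpoint_path))
    return callbacks


def fit_and_reload(user_callbacks, always_checkpoint):
    """Run the fake training and report whether the model file exists."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.hdf5")
        for cb in build_callbacks(user_callbacks, path, always_checkpoint):
            cb.on_train_end()
        return os.path.exists(path)


old_behaviour = fit_and_reload(FakeEarlyStopping(), always_checkpoint=False)
fixed_behaviour = fit_and_reload(FakeEarlyStopping(), always_checkpoint=True)
print(old_behaviour, fixed_behaviour)  # False True
```

In the first call only the early-stopping stand-in runs, so the file that the reload step expects is never created, which mirrors the reported OSError. In the second call the checkpoint is appended alongside the user callback, so the file exists.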