beringresearch / ivis

Dimensionality reduction in very large datasets using Siamese Networks
https://beringresearch.github.io/ivis/
Apache License 2.0

`NotFittedError` after caching and reloading fitted `Ivis` instance #101

Closed imatheussm closed 2 years ago

imatheussm commented 3 years ago

The issue

A fitted Ivis instance is not adequately preserved when joblib.dump() is used to save it. Consequently, when Ivis is used as part of a sklearn.pipeline.Pipeline object with memory != None, errors occur.
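To illustrate the kind of failure described above, here is a minimal sketch that does not use ivis at all. It assumes (this is not confirmed here) that the fitted model is dropped from the object's state during pickling through a `__getstate__` hook. `ToyProjector`, `round_trip`, and the local `NotFittedError` are hypothetical stand-ins, and the standard-library `pickle` module is used in place of `joblib.dump()`, which builds on the same pickle protocol.

```python
import io
import pickle


class NotFittedError(Exception):
    """Local stand-in for sklearn.exceptions.NotFittedError."""


class ToyProjector:
    """Hypothetical estimator whose fitted model is not pickled."""

    def __init__(self, k=15):
        self.k = k
        self.model_ = None  # populated by fit()

    def fit(self, X):
        # Pretend the "model" is simply the per-column mean of X.
        n_rows = len(X)
        self.model_ = [sum(column) / n_rows for column in zip(*X)]
        return self

    def transform(self, X):
        if self.model_ is None:
            raise NotFittedError("Model was not fitted yet.")
        return [[value - mean for value, mean in zip(row, self.model_)] for row in X]

    def __getstate__(self):
        # Assumption: the fitted model is excluded from the pickled state.
        state = dict(self.__dict__)
        state["model_"] = None
        return state


def round_trip(obj):
    """Serialize and deserialize in memory, as a disk cache would do via files."""
    buffer = io.BytesIO()
    pickle.dump(obj, buffer)
    buffer.seek(0)
    return pickle.load(buffer)


X = [[1.0, 2.0], [3.0, 4.0]]
fitted = ToyProjector().fit(X)
reloaded = round_trip(fitted)

print(fitted.transform(X))  # the in-memory object still holds its model
try:
    reloaded.transform(X)
except NotFittedError as error:
    print("reloaded copy:", error)
```

In this sketch the hyperparameter `k` survives the round trip while `model_` does not, so the reloaded copy looks like a configured but unfitted estimator.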

Minimal reproducible examples

Two examples are provided here: one with sklearn.pipeline.Pipeline, and the other with joblib only (sklearn.pipeline.Pipeline uses joblib for caching, so I thought the second example could help).

Environment

A virtual environment was created specifically for this project, wherein all modules specified in requirements.txt were installed. My setup runs an up-to-date version of Windows 10 (no WSL).

Runtime

python=3.9.5

Relevant modules

ivis=2.0.4
tensorflow=2.5.0

Example with sklearn.pipeline.Pipeline

Script

import tempfile
import ivis

from sklearn import datasets, ensemble, model_selection, pipeline, preprocessing, svm

X, y = datasets.load_iris(return_X_y=True)

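# "project" and "classify" are placeholders that the parameter grid fills in.
# Passing a directory as `memory` makes the pipeline cache fitted transformers on disk.
pipeline_with_ivis = pipeline.Pipeline([
    ("normalize", preprocessing.MinMaxScaler()),
    ("project", None),
    ("classify", None),
], memory=tempfile.mkdtemp())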

parameter_grid = {
    "project": (ivis.Ivis(verbose=0),),
    "project__k": (15,),

    "classify": (ensemble.RandomForestClassifier(), svm.SVC()),
    "classify__random_state": (2021,)
}

grid_search = model_selection.GridSearchCV(pipeline_with_ivis, parameter_grid, scoring="accuracy", cv=10, verbose=3,
                                           return_train_score=True).fit(X, y)  # should fail

Log with errors

Fitting 10 folds for each of 2 candidates, totalling 20 fits
[CV 1/10] END classify=RandomForestClassifier(), classify__random_state=2021, project=Ivis(verbose=0), project__k=15;, score=(train=1.000, test=1.000) total time=  11.3s
[CV 2/10] END classify=RandomForestClassifier(), classify__random_state=2021, project=Ivis(verbose=0), project__k=15;, score=(train=1.000, test=1.000) total time=   4.3s
[CV 3/10] END classify=RandomForestClassifier(), classify__random_state=2021, project=Ivis(verbose=0), project__k=15;, score=(train=1.000, test=1.000) total time=   8.6s
[CV 4/10] END classify=RandomForestClassifier(), classify__random_state=2021, project=Ivis(verbose=0), project__k=15;, score=(train=1.000, test=1.000) total time=   3.9s
[CV 5/10] END classify=RandomForestClassifier(), classify__random_state=2021, project=Ivis(verbose=0), project__k=15;, score=(train=1.000, test=0.800) total time=   6.4s
[CV 6/10] END classify=RandomForestClassifier(), classify__random_state=2021, project=Ivis(verbose=0), project__k=15;, score=(train=1.000, test=0.800) total time=   5.8s
[CV 7/10] END classify=RandomForestClassifier(), classify__random_state=2021, project=Ivis(verbose=0), project__k=15;, score=(train=1.000, test=1.000) total time=   4.5s
[CV 8/10] END classify=RandomForestClassifier(), classify__random_state=2021, project=Ivis(verbose=0), project__k=15;, score=(train=1.000, test=0.800) total time=   5.3s
[CV 9/10] END classify=RandomForestClassifier(), classify__random_state=2021, project=Ivis(verbose=0), project__k=15;, score=(train=1.000, test=0.667) total time=   4.3s
[CV 10/10] END classify=RandomForestClassifier(), classify__random_state=2021, project=Ivis(verbose=0), project__k=15;, score=(train=1.000, test=0.800) total time=   3.8s
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:696: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
    scores = scorer(estimator, X_test, y_test)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 236, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 120, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 418, in predict
    Xt = transform.transform(Xt)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 331, in transform
    raise NotFittedError("Model was not fitted yet. Call `fit` before calling `transform`.")
sklearn.exceptions.NotFittedError: Model was not fitted yet. Call `fit` before calling `transform`.
  warnings.warn(
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:696: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
    scores = scorer(estimator, X_test, y_test)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 236, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 120, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 418, in predict
    Xt = transform.transform(Xt)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 331, in transform
    raise NotFittedError("Model was not fitted yet. Call `fit` before calling `transform`.")
sklearn.exceptions.NotFittedError: Model was not fitted yet. Call `fit` before calling `transform`.
  warnings.warn(
[CV 1/10] END classify=SVC(), classify__random_state=2021, project=Ivis(verbose=0), project__k=15;, score=(train=nan, test=nan) total time=   0.0s
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:696: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
    scores = scorer(estimator, X_test, y_test)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 236, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 120, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 418, in predict
    Xt = transform.transform(Xt)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 331, in transform
    raise NotFittedError("Model was not fitted yet. Call `fit` before calling `transform`.")
sklearn.exceptions.NotFittedError: Model was not fitted yet. Call `fit` before calling `transform`.
  warnings.warn(
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:696: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
    scores = scorer(estimator, X_test, y_test)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 236, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 120, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 418, in predict
    Xt = transform.transform(Xt)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 331, in transform
    raise NotFittedError("Model was not fitted yet. Call `fit` before calling `transform`.")
sklearn.exceptions.NotFittedError: Model was not fitted yet. Call `fit` before calling `transform`.
  warnings.warn(
[CV 2/10] END classify=SVC(), classify__random_state=2021, project=Ivis(verbose=0), project__k=15;, score=(train=nan, test=nan) total time=   0.0s
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:696: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
    scores = scorer(estimator, X_test, y_test)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 236, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 120, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 418, in predict
    Xt = transform.transform(Xt)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 331, in transform
    raise NotFittedError("Model was not fitted yet. Call `fit` before calling `transform`.")
sklearn.exceptions.NotFittedError: Model was not fitted yet. Call `fit` before calling `transform`.
  warnings.warn(
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:696: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
    scores = scorer(estimator, X_test, y_test)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 236, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 120, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 418, in predict
    Xt = transform.transform(Xt)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 331, in transform
    raise NotFittedError("Model was not fitted yet. Call `fit` before calling `transform`.")
sklearn.exceptions.NotFittedError: Model was not fitted yet. Call `fit` before calling `transform`.
  warnings.warn(
[CV 3/10] END classify=SVC(), classify__random_state=2021, project=Ivis(verbose=0), project__k=15;, score=(train=nan, test=nan) total time=   0.0s
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:696: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
    scores = scorer(estimator, X_test, y_test)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 236, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 120, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 418, in predict
    Xt = transform.transform(Xt)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 331, in transform
    raise NotFittedError("Model was not fitted yet. Call `fit` before calling `transform`.")
sklearn.exceptions.NotFittedError: Model was not fitted yet. Call `fit` before calling `transform`.
  warnings.warn(
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:696: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
    scores = scorer(estimator, X_test, y_test)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 236, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 120, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 418, in predict
    Xt = transform.transform(Xt)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 331, in transform
    raise NotFittedError("Model was not fitted yet. Call `fit` before calling `transform`.")
sklearn.exceptions.NotFittedError: Model was not fitted yet. Call `fit` before calling `transform`.
  warnings.warn(
[CV 4/10] END classify=SVC(), classify__random_state=2021, project=Ivis(verbose=0), project__k=15;, score=(train=nan, test=nan) total time=   0.0s
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:696: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
    scores = scorer(estimator, X_test, y_test)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 236, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 120, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 418, in predict
    Xt = transform.transform(Xt)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 331, in transform
    raise NotFittedError("Model was not fitted yet. Call `fit` before calling `transform`.")
sklearn.exceptions.NotFittedError: Model was not fitted yet. Call `fit` before calling `transform`.
  warnings.warn(
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:696: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
    scores = scorer(estimator, X_test, y_test)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 236, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 120, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 418, in predict
    Xt = transform.transform(Xt)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 331, in transform
    raise NotFittedError("Model was not fitted yet. Call `fit` before calling `transform`.")
sklearn.exceptions.NotFittedError: Model was not fitted yet. Call `fit` before calling `transform`.
  warnings.warn(
[CV 5/10] END classify=SVC(), classify__random_state=2021, project=Ivis(verbose=0), project__k=15;, score=(train=nan, test=nan) total time=   0.0s
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:696: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
    scores = scorer(estimator, X_test, y_test)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 236, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 120, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 418, in predict
    Xt = transform.transform(Xt)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 331, in transform
    raise NotFittedError("Model was not fitted yet. Call `fit` before calling `transform`.")
sklearn.exceptions.NotFittedError: Model was not fitted yet. Call `fit` before calling `transform`.
  warnings.warn(
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:696: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
    scores = scorer(estimator, X_test, y_test)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 236, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 120, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 418, in predict
    Xt = transform.transform(Xt)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 331, in transform
    raise NotFittedError("Model was not fitted yet. Call `fit` before calling `transform`.")
sklearn.exceptions.NotFittedError: Model was not fitted yet. Call `fit` before calling `transform`.
  warnings.warn(
[CV 6/10] END classify=SVC(), classify__random_state=2021, project=Ivis(verbose=0), project__k=15;, score=(train=nan, test=nan) total time=   0.0s
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:696: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
    scores = scorer(estimator, X_test, y_test)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 236, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 120, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 418, in predict
    Xt = transform.transform(Xt)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 331, in transform
    raise NotFittedError("Model was not fitted yet. Call `fit` before calling `transform`.")
sklearn.exceptions.NotFittedError: Model was not fitted yet. Call `fit` before calling `transform`.
  warnings.warn(
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:696: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
    scores = scorer(estimator, X_test, y_test)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 236, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 120, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 418, in predict
    Xt = transform.transform(Xt)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 331, in transform
    raise NotFittedError("Model was not fitted yet. Call `fit` before calling `transform`.")
sklearn.exceptions.NotFittedError: Model was not fitted yet. Call `fit` before calling `transform`.
  warnings.warn(
[CV 7/10] END classify=SVC(), classify__random_state=2021, project=Ivis(verbose=0), project__k=15;, score=(train=nan, test=nan) total time=   0.0s
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:696: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
    scores = scorer(estimator, X_test, y_test)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 236, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 120, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 418, in predict
    Xt = transform.transform(Xt)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 331, in transform
    raise NotFittedError("Model was not fitted yet. Call `fit` before calling `transform`.")
sklearn.exceptions.NotFittedError: Model was not fitted yet. Call `fit` before calling `transform`.
  warnings.warn(
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:696: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
    scores = scorer(estimator, X_test, y_test)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 236, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 120, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 418, in predict
    Xt = transform.transform(Xt)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 331, in transform
    raise NotFittedError("Model was not fitted yet. Call `fit` before calling `transform`.")
sklearn.exceptions.NotFittedError: Model was not fitted yet. Call `fit` before calling `transform`.
  warnings.warn(
[CV 8/10] END classify=SVC(), classify__random_state=2021, project=Ivis(verbose=0), project__k=15;, score=(train=nan, test=nan) total time=   0.0s
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:696: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
    scores = scorer(estimator, X_test, y_test)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 236, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 120, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 418, in predict
    Xt = transform.transform(Xt)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 331, in transform
    raise NotFittedError("Model was not fitted yet. Call `fit` before calling `transform`.")
sklearn.exceptions.NotFittedError: Model was not fitted yet. Call `fit` before calling `transform`.
  warnings.warn(
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:696: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
    scores = scorer(estimator, X_test, y_test)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 236, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 120, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 418, in predict
    Xt = transform.transform(Xt)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 331, in transform
    raise NotFittedError("Model was not fitted yet. Call `fit` before calling `transform`.")
sklearn.exceptions.NotFittedError: Model was not fitted yet. Call `fit` before calling `transform`.
  warnings.warn(
[CV 9/10] END classify=SVC(), classify__random_state=2021, project=Ivis(verbose=0), project__k=15;, score=(train=nan, test=nan) total time=   0.0s
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:696: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
    scores = scorer(estimator, X_test, y_test)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 236, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 120, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 418, in predict
    Xt = transform.transform(Xt)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 331, in transform
    raise NotFittedError("Model was not fitted yet. Call `fit` before calling `transform`.")
sklearn.exceptions.NotFittedError: Model was not fitted yet. Call `fit` before calling `transform`.
  warnings.warn(
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:696: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details: 
Traceback (most recent call last):
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 687, in _score
    scores = scorer(estimator, X_test, y_test)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 199, in __call__
    return self._score(partial(_cached_call, None), estimator, X, y_true,
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 236, in _score
    y_pred = method_caller(estimator, "predict", X)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\metrics\_scorer.py", line 53, in _cached_call
    return getattr(estimator, method)(*args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\utils\metaestimators.py", line 120, in <lambda>
    out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 418, in predict
    Xt = transform.transform(Xt)
  File "<REPOSITORY_ROOT>\ivis\ivis.py", line 331, in transform
    raise NotFittedError("Model was not fitted yet. Call `fit` before calling `transform`.")
sklearn.exceptions.NotFittedError: Model was not fitted yet. Call `fit` before calling `transform`.
  warnings.warn(
[CV 10/10] END classify=SVC(), classify__random_state=2021, project=Ivis(verbose=0), project__k=15;, score=(train=nan, test=nan) total time=   0.0s
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_search.py:922: UserWarning: One or more of the test scores are non-finite: [0.88666667        nan]
  warnings.warn(
<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_search.py:922: UserWarning: One or more of the train scores are non-finite: [ 1. nan]
  warnings.warn(

Example without sklearn.pipeline.Pipeline

Script

import ivis
import joblib

from sklearn import datasets, model_selection

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.33, random_state=42)
model = ivis.Ivis(k=15, batch_size=15, verbose=0).fit(X_train, y_train)

joblib.dump(model, "ivis.pkl")

new_model = joblib.load("ivis.pkl")

model.transform(X_test)      # should work
new_model.transform(X_test)  # should fail

Log with errors

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "<USER_FOLDER>\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\211.7142.13\plugins\python\helpers\pydev\_pydev_bundle\pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "<USER_FOLDER>\AppData\Local\JetBrains\Toolbox\apps\PyCharm-P\ch-0\211.7142.13\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "<REPOSITORY_ROOT>/playground3.py", line 20, in <module>
    new_model.transform(X_test)  # should fail
  File "<REPOSITORY_ROOT>\venv\lib\site-packages\ivis\ivis.py", line 329, in transform
    raise NotFittedError("Model was not fitted yet. Call `fit` before calling `transform`.")
sklearn.exceptions.NotFittedError: Model was not fitted yet. Call `fit` before calling `transform`.

Discussion

As seen in the example with sklearn.pipeline.Pipeline and sklearn.model_selection.GridSearchCV, everything runs smoothly when Ivis is fitted for the first time across all folds. When the cached model is retrieved for subsequent runs, however, errors occur because at least Ivis.encoder is missing. On further experimentation, errors persisted even after Ivis.encoder was restored on the reloaded model, indicating that other important attributes were not properly pickled either.
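To make the failure mode concrete, here is a minimal sketch using only the standard library. `DroppingModel` and `survives_pickle` are hypothetical names, not part of ivis, and the sketch assumes the fitted state is lost because `__getstate__` leaves the encoder out of the pickled state; the real cause inside Ivis may differ.

```python
import pickle


class DroppingModel:
    """Toy estimator (hypothetical, not Ivis) whose __getstate__ drops the encoder."""

    def __init__(self):
        self.encoder = None

    def fit(self, X):
        # Stand-in for training: remember one number per row.
        self.encoder = [sum(row) for row in X]
        return self

    def transform(self, X):
        if self.encoder is None:
            raise RuntimeError("Model was not fitted yet. Call `fit` before calling `transform`.")
        return [sum(row) for row in X]

    def __getstate__(self):
        state = self.__dict__.copy()
        state["encoder"] = None  # the fitted part never reaches the pickle
        return state


def survives_pickle(model, X):
    """Return True if a pickled-and-restored copy of `model` can still transform X."""
    clone = pickle.loads(pickle.dumps(model))
    try:
        clone.transform(X)
    except RuntimeError:
        return False
    return True


X = [[1.0, 2.0], [3.0, 4.0]]
model = DroppingModel().fit(X)
```

A check like `survives_pickle(model, X)` is a quick way to confirm whether an estimator is safe to use in a Pipeline with memory != None, since the cache restores the fitted transformer from disk rather than keeping the original object.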

Although I have not tested them, saving and loading capabilities appear to already exist for Ivis in Ivis.save_model() and Ivis.load_model(). However, to ensure that Ivis is pickleable, it would be ideal to transfer and adapt this functionality to Ivis.__getstate__() and Ivis.__setstate__() (the latter of which does not exist AFAIK) so that pickle and joblib know how to pickle an Ivis instance. This would enable its use in Pipeline objects with memory != None, thus significantly speeding up the hyper-parameter fine-tuning process performed by GridSearchCV.
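A rough sketch of that idea, using only the standard library: `__getstate__` writes the unpicklable part to a temporary file with the object's own save routine and stores the raw bytes in the state, and `__setstate__` reverses the process. `FakeEncoder` and `PicklableModel` are hypothetical stand-ins (a lock plays the role of the unpicklable network); this is not how Ivis is implemented.

```python
import json
import os
import pickle
import tempfile
import threading


class FakeEncoder:
    """Hypothetical stand-in for a network that pickle cannot handle directly."""

    def __init__(self, weights):
        self.weights = list(weights)
        self._lock = threading.Lock()  # lock objects cannot be pickled

    def save(self, path):
        with open(path, "w") as f:
            json.dump(self.weights, f)

    @classmethod
    def load(cls, path):
        with open(path) as f:
            return cls(json.load(f))


class PicklableModel:
    """Toy estimator that routes its encoder through save/load when pickled."""

    def __init__(self, k=15):
        self.k = k
        self.encoder = None

    def fit(self, X):
        self.encoder = FakeEncoder(sum(row) for row in X)
        return self

    def transform(self, X):
        if self.encoder is None:
            raise RuntimeError("Model was not fitted yet.")
        return [sum(row) * len(self.encoder.weights) for row in X]

    def __getstate__(self):
        state = self.__dict__.copy()
        encoder = state.pop("encoder")
        state["_encoder_bytes"] = None
        if encoder is not None:
            # Save to a temporary file, then carry the file contents in the state.
            with tempfile.TemporaryDirectory() as tmp:
                path = os.path.join(tmp, "encoder.json")
                encoder.save(path)
                with open(path, "rb") as f:
                    state["_encoder_bytes"] = f.read()
        return state

    def __setstate__(self, state):
        encoder_bytes = state.pop("_encoder_bytes")
        self.__dict__.update(state)
        self.encoder = None
        if encoder_bytes is not None:
            # Write the bytes back to disk and rebuild the encoder from them.
            with tempfile.TemporaryDirectory() as tmp:
                path = os.path.join(tmp, "encoder.json")
                with open(path, "wb") as f:
                    f.write(encoder_bytes)
                self.encoder = FakeEncoder.load(path)


model = PicklableModel().fit([[1, 2], [3, 4]])
restored = pickle.loads(pickle.dumps(model))
```

Because joblib.dump() relies on the pickle protocol, an object that round-trips this way should also survive joblib caching. Every fitted attribute that cannot be pickled directly would need the same treatment, not only the encoder.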

Szubie commented 3 years ago

Thanks for reporting this.

I will look into making Ivis models serializable using pickle; it sounds like a useful feature if we can do it. Hopefully that will solve the issues you're encountering.

imatheussm commented 3 years ago

I was coding a solution here, which involved renaming the currently defined Ivis.__getstate__() to Ivis._get_json() and creating new Ivis.__getstate__() and Ivis.__setstate__() methods. The solution consisted of the following:

However, I ran into a problem: when calling Ivis.neighbour_matrix_.save(<path>), or even Ivis.neighbour_matrix_.index.save(<path>), no file was produced, regardless of path type (absolute or relative) or file name. I am currently assuming this is an annoy-related issue.

I am just posting this in case this reasoning is helpful to you.