TeamHG-Memex / sklearn-crfsuite

scikit-learn inspired API for CRFsuite

repr CRF causes AttributeError in scikit-learn 0.24.0 (or later) #60

Open ftnext opened 3 years ago

ftnext commented 3 years ago

Environment

macOS, Python 3.8.6

Procedure

The tutorial code: https://sklearn-crfsuite.readthedocs.io/en/latest/tutorial.html#training

>>> import sklearn_crfsuite
>>> crf = sklearn_crfsuite.CRF(
...     algorithm='lbfgs',
...     c1=0.1,
...     c2=0.1,
...     max_iterations=100,
...     all_possible_transitions=True
... )
>>> repr(crf)

Actual behavior

Raises AttributeError

'CRF' object has no attribute 'keep_tempfiles'

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../venv/lib/python3.8/site-packages/sklearn/base.py", line 260, in __repr__
    repr_ = pp.pformat(self)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/pprint.py", line 153, in pformat
    self._format(object, sio, 0, 0, {}, 0)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/pprint.py", line 170, in _format
    rep = self._repr(object, context, level)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/pprint.py", line 404, in _repr
    repr, readable, recursive = self.format(object, context.copy(),
  File "/.../venv/lib/python3.8/site-packages/sklearn/utils/_pprint.py", line 180, in format
    return _safe_repr(object, context, maxlevels, level,
  File "/.../venv/lib/python3.8/site-packages/sklearn/utils/_pprint.py", line 425, in _safe_repr
    params = _changed_params(object)
  File "/.../venv/lib/python3.8/site-packages/sklearn/utils/_pprint.py", line 91, in _changed_params
    params = estimator.get_params(deep=False)
  File "/.../venv/lib/python3.8/site-packages/sklearn/base.py", line 195, in get_params
    value = getattr(self, key)
AttributeError: 'CRF' object has no attribute 'keep_tempfiles'
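The traceback shows the call chain: repr() calls _changed_params(), which calls get_params(deep=False), which does getattr(self, key) for every constructor parameter. Below is a simplified pure-Python emulation of that lookup, not scikit-learn's actual code. ToyCRF is a hypothetical stand-in that accepts keep_tempfiles without storing it under that name, which is what the error suggests CRF does:

```python
import inspect


def get_param_names(cls):
    """Sorted __init__ parameter names, roughly like BaseEstimator._get_param_names."""
    signature = inspect.signature(cls.__init__)
    return sorted(
        name
        for name, param in signature.parameters.items()
        if name != "self" and param.kind != param.VAR_KEYWORD
    )


def strict_get_params(estimator):
    """Emulates scikit-learn >= 0.24: a missing attribute raises AttributeError."""
    return {key: getattr(estimator, key) for key in get_param_names(type(estimator))}


class ToyCRF:
    """Hypothetical estimator mimicking the pattern in this issue."""

    def __init__(self, c1=None, keep_tempfiles=False):
        self.c1 = c1
        # keep_tempfiles is consumed but never stored as self.keep_tempfiles
        self._tempfile_policy = keep_tempfiles


try:
    strict_get_params(ToyCRF(c1=0.1))
except AttributeError as exc:
    print(exc)  # 'ToyCRF' object has no attribute 'keep_tempfiles'
```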

Expected

No error

Workaround

Use a scikit-learn version below 0.24:

$ pip install -U 'scikit-learn<0.24'

In my case, scikit-learn 0.23.2 was installed.

>>> import sklearn_crfsuite
>>> crf = sklearn_crfsuite.CRF(
...     algorithm='lbfgs',
...     c1=0.1,
...     c2=0.1,
...     max_iterations=100,
...     all_possible_transitions=True
... )
>>> repr(crf)
"CRF(algorithm='lbfgs', all_possible_transitions=True, c1=0.1, c2=0.1,\n    keep_tempfiles=None, max_iterations=100)"

Related Information

repr(crf) shows the following warning in scikit-learn<0.24:

/.../venv/lib/python3.8/site-packages/sklearn/base.py:209: FutureWarning: From version 0.24, get_params will raise an AttributeError if a parameter cannot be retrieved as an instance attribute. Previously it would return None.
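That fallback explains the keep_tempfiles=None in the repr above: before 0.24, a parameter that could not be read as an attribute was reported as None (with this FutureWarning) instead of raising, even though the constructor default is False. A simplified emulation of that lenient lookup, again using a hypothetical ToyCRF rather than scikit-learn's real code:

```python
import inspect
import warnings


class ToyCRF:
    """Hypothetical estimator: keep_tempfiles is accepted but not stored."""

    def __init__(self, c1=None, keep_tempfiles=False):
        self.c1 = c1
        self._tempfile_policy = keep_tempfiles


def lenient_get_params(estimator):
    """Emulates the pre-0.24 fallback: unreadable parameters become None."""
    params = {}
    for key in inspect.signature(type(estimator).__init__).parameters:
        if key == "self":
            continue
        try:
            value = getattr(estimator, key)
        except AttributeError:
            warnings.warn(
                "From version 0.24, get_params will raise an AttributeError if a "
                "parameter cannot be retrieved as an instance attribute. "
                "Previously it would return None.",
                FutureWarning,
            )
            value = None
        params[key] = value
    return params


print(lenient_get_params(ToyCRF(c1=0.1)))  # {'c1': 0.1, 'keep_tempfiles': None}
```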

Change log in scikit-learn 0.24

https://scikit-learn.org/dev/whats_new/v0.24.html#sklearn-base

base.BaseEstimator.get_params now will raise an AttributeError if a parameter cannot be retrieved as an instance attribute.

The CRF class seems to inherit from sklearn.base.BaseEstimator (when it can be imported), and the names returned by the classmethod _get_param_names() include parameters such as keep_tempfiles that are not available as instance attributes of CRF.

>>> sklearn_crfsuite.CRF._get_param_names()
['algorithm', 'all_possible_states', 'all_possible_transitions', 'averaging', 'c', 'c1', 'c2', 'calibration_candidates', 'calibration_eta', 'calibration_max_trials', 'calibration_rate', 'calibration_samples', 'delta', 'epsilon', 'error_sensitive', 'gamma', 'keep_tempfiles', 'linesearch', 'max_iterations', 'max_linesearch', 'min_freq', 'model_filename', 'num_memories', 'pa_type', 'period', 'trainer_cls', 'variance', 'verbose']
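To check which of these names lack a matching attribute, the __init__ signature can be compared against an instance. A minimal sketch using only the standard library; missing_param_attributes is a hypothetical helper and ToyCRF only mimics the pattern (with sklearn-crfsuite installed, the same call should also work on sklearn_crfsuite.CRF()):

```python
import inspect


def missing_param_attributes(estimator):
    """Return __init__ parameter names that cannot be read back as attributes."""
    signature = inspect.signature(type(estimator).__init__)
    return sorted(
        name
        for name, param in signature.parameters.items()
        if name != "self"
        and param.kind not in (param.VAR_POSITIONAL, param.VAR_KEYWORD)
        and not hasattr(estimator, name)
    )


class ToyCRF:
    """Hypothetical stand-in for CRF; only the attribute pattern matters here."""

    def __init__(self, algorithm=None, model_filename=None, keep_tempfiles=False):
        self.algorithm = algorithm
        self.model_filename = model_filename
        self._tempfile_policy = keep_tempfiles  # not stored under its own name


print(missing_param_attributes(ToyCRF()))  # ['keep_tempfiles']
```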

SethPoulsen commented 2 years ago

The workaround described here does not work for me with Python 3.10.

I tried asking pip to install sklearn<0.24, but even with --prefer-binary it still attempts to build sklearn from source and then fails when building a dependency (numpy) from source.

My guess is that the older numpy version required by sklearn<0.24 is not compatible with Python 3.10.

Edit: reference on older numpy versions not working with Python 3.10: https://github.com/numpy/numpy/issues/19033

SethPoulsen commented 2 years ago

I also get the same error when using some of sklearn's model selection tools, specifically sklearn.model_selection.RandomizedSearchCV and sklearn.model_selection.GridSearchCV.
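The search classes most likely fail for the same reason: they copy the estimator with sklearn.base.clone, which reads the constructor parameters back through get_params(deep=False) and builds a fresh instance from them. A rough emulation (simple_clone and ToyCRF are hypothetical; the real clone also clones each parameter and sanity-checks the new object):

```python
import inspect


class ToyCRF:
    """Hypothetical estimator that does not store keep_tempfiles."""

    def __init__(self, c1=None, keep_tempfiles=False):
        self.c1 = c1
        self._tempfile_policy = keep_tempfiles


def simple_clone(estimator):
    """Rough emulation of sklearn.base.clone: rebuild from constructor params."""
    names = [
        name
        for name in inspect.signature(type(estimator).__init__).parameters
        if name != "self"
    ]
    # Reading each parameter back is where scikit-learn >= 0.24 fails
    params = {name: getattr(estimator, name) for name in names}
    return type(estimator)(**params)


try:
    simple_clone(ToyCRF(c1=0.1))
except AttributeError as exc:
    print("clone failed:", exc)
```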

pratikchhapolika commented 2 years ago

How to fix?

SethPoulsen commented 2 years ago

Look at the pull request I opened: #69

zhanghaok commented 1 year ago

Hello, how can I get the training loss?

doctor-entropy commented 1 year ago

I have a workaround: go to the sklearn_crfsuite package on your local machine (it will be in python3.*/site-packages/sklearn_crfsuite) and, in the estimator.py file, comment out these lines:

238 - model_filename=None
239 - keep_tempfiles=False
268 - filename=model_filename
269 - keep_tempfiles=keep_tempfiles

This worked with scikit-learn = "^1.2.0".
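Commenting out those lines presumably works because get_params only reads back names that appear in the __init__ signature, so once model_filename and keep_tempfiles are no longer parameters nothing looks them up (at the cost of no longer being able to set them). The scikit-learn developer convention goes the other way: store every constructor argument unchanged under its own name and build derived objects later, e.g. in fit. A sketch of both shapes on hypothetical toy classes (TrimmedCRF, ConventionalCRF and tempfile_policy_ are illustrative names only):

```python
import inspect


def get_params(estimator):
    """Strict lookup of every __init__ parameter, as in scikit-learn >= 0.24."""
    names = [n for n in inspect.signature(type(estimator).__init__).parameters if n != "self"]
    return {name: getattr(estimator, name) for name in names}


class TrimmedCRF:
    """Like the comment-out workaround: the parameters are simply gone."""

    def __init__(self, c1=None):
        self.c1 = c1


class ConventionalCRF:
    """scikit-learn convention: store each argument under its own name."""

    def __init__(self, c1=None, model_filename=None, keep_tempfiles=False):
        self.c1 = c1
        self.model_filename = model_filename
        self.keep_tempfiles = keep_tempfiles

    def fit(self, X, y):
        # Derived state is built here from the stored parameters, not in __init__
        self.tempfile_policy_ = "keep" if self.keep_tempfiles else "delete"
        return self


print(get_params(TrimmedCRF(c1=0.1)))  # {'c1': 0.1}
print(get_params(ConventionalCRF(c1=0.1, keep_tempfiles=True)))
# {'c1': 0.1, 'model_filename': None, 'keep_tempfiles': True}
```

The second shape keeps keep_tempfiles configurable and lets clone() round-trip it.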

Edit: even better, simply install the updated repo https://github.com/MeMartijn/updated-sklearn-crfsuite, which has the fix, instead of sklearn-crfsuite. Installation instructions: https://github.com/TeamHG-Memex/sklearn-crfsuite/pull/67#issuecomment-1167519941

Yousif-Ahmed commented 1 year ago

I face this problem with scikit-learn 1.2.2. I tried pip install -U 'scikit-learn<0.24', but it doesn't work with Python 3.10. Does anyone have a workaround?

HarryCaveMan commented 1 year ago

I have submitted the following PR to fix this: https://github.com/TeamHG-Memex/sklearn-crfsuite/pull/74

doctor-entropy commented 1 year ago

Replying to "I have submitted the following PR to fix this: #74":

Unfortunately, this repo is no longer maintained and does not accept pull requests; its last commits are many years old.

Gallaecio commented 3 months ago

For the record, version 0.4.0 is out and addresses this issue. Maintenance of the package will continue at https://github.com/scrapinghub/sklearn-crfsuite.
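To tell whether an environment already has the fixed release, the installed version can be compared with 0.4.0. A rough standard-library sketch, assuming the distribution is installed under the name sklearn-crfsuite; has_repr_fix and release_tuple are hypothetical helpers, and pre-release suffixes such as rc1 are simply ignored (packaging.version would be stricter):

```python
import re
from importlib.metadata import PackageNotFoundError, version


def release_tuple(text, width=3):
    """Leading numeric part of a version string, zero-padded: '0.4' -> (0, 4, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    numbers = [int(part) for part in match.group(0).split(".")] if match else []
    numbers += [0] * (width - len(numbers))
    return tuple(numbers)


def has_repr_fix(dist_name="sklearn-crfsuite", minimum=(0, 4, 0)):
    """True if the installed distribution reports a version >= minimum."""
    try:
        installed = version(dist_name)
    except PackageNotFoundError:
        return False
    return release_tuple(installed) >= minimum


print(has_repr_fix())
```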