axa-group / Parsr

Transforms PDF, Documents and Images into Enriched Structured Data
Apache License 2.0

No module named 'sklearn.feature_selection.rfe' for HeadingLevelPrediction #572

Open ajaykumarbharaj opened 2 years ago

ajaykumarbharaj commented 2 years ago

Summary: Processing a document throws the error No module named 'sklearn.feature_selection.rfe'. The error is caused by loading levels_model.pkl in HeadingLevelPrediction.py.

slbayer commented 2 years ago

The problem is that the model was built with an old version of sklearn that still had this module. According to the warning I get after doing several horrid things with imports, the model was pickled with sklearn 0.21.3. Locally, I installed using the bare-metal installation instructions, which omit sklearn as a dependency. pip install scikit-learn==0.21.3 installs a binary on my macOS Big Sur machine for Python 3.7.9, but not for 3.8.2 or 3.9.4 (not using Homebrew).
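Why a renamed module breaks loading: pickle does not store class code, only a (module, name) reference that it re-imports at load time, so a module that newer sklearn renamed (rfe to _rfe) can no longer be found. A minimal sketch, not part of Parsr, for listing the references a pickle makes without importing anything, using the standard library's pickletools.genops. The function name referenced_globals is hypothetical, and it assumes levels_model.pkl is a plain pickle stream; a joblib-written or compressed file would not parse this way.

```python
import pickletools

# Opcodes that push a str onto the unpickler's stack (protocol 0 and binary forms).
_STRING_OPS = {"UNICODE", "SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8"}
_GET_OPS = {"GET", "BINGET", "LONG_BINGET"}
_PUT_OPS = {"PUT", "BINPUT", "LONG_BINPUT"}
_NO_PUSH_OPS = {"PROTO", "FRAME", "STOP"}


def referenced_globals(data):
    """Return the (module, name) pairs a pickle stream would import, without loading it."""
    found = set()
    memo = {}
    pushed = []  # approximate record of pushed values; only strings matter here
    for opcode, arg, _pos in pickletools.genops(data):
        op = opcode.name
        if op in ("GLOBAL", "INST"):
            # Protocols 0-3: genops reports the reference as "module name".
            module, name = arg.split(" ", 1)
            found.add((module, name))
            pushed.append(None)
        elif op == "STACK_GLOBAL":
            # Protocol 4+: module and name are the two most recently pushed strings.
            if len(pushed) >= 2 and isinstance(pushed[-2], str) and isinstance(pushed[-1], str):
                found.add((pushed[-2], pushed[-1]))
            pushed.append(None)
        elif op in _STRING_OPS:
            pushed.append(arg)
        elif op == "MEMOIZE":
            # Stores the top of the stack at the next free memo index.
            memo[len(memo)] = pushed[-1] if pushed else None
        elif op in _PUT_OPS:
            memo[arg] = pushed[-1] if pushed else None
        elif op in _GET_OPS:
            # A repeated module string is usually fetched back from the memo.
            pushed.append(memo.get(arg))
        elif op not in _NO_PUSH_OPS:
            pushed.append(None)  # any other opcode: a non-string placeholder
    return found
```

For example, with open("levels_model.pkl", "rb") as f: print(sorted(referenced_globals(f.read()))) should show whether sklearn.feature_selection.rfe and the other old paths are referenced; the file path here is an assumption.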

Here's the horrid thing that I did with the imports, in dist/assets/HeadingLevelPrediction.py, after I installed the most recent version of sklearn:

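import sys

# Run these aliases before levels_model.pkl is unpickled. Newer sklearn
# renamed these modules with a leading underscore; mapping the old names in
# sys.modules lets pickle resolve the paths stored in the model file.
import sklearn.feature_selection._rfe
sys.modules["sklearn.feature_selection.rfe"] = sklearn.feature_selection._rfe

import sklearn.tree
import sklearn.tree._tree
sklearn.tree.tree = sklearn.tree._tree
sys.modules["sklearn.tree.tree"] = sklearn.tree._tree
# The old sklearn.tree.tree module also exposed DecisionTreeClassifier.
sklearn.tree.tree.DecisionTreeClassifier = sklearn.tree.DecisionTreeClassifier

import sklearn.metrics._scorer
import sklearn.metrics._classification
sys.modules["sklearn.metrics.scorer"] = sklearn.metrics._scorer
sys.modules["sklearn.metrics.classification"] = sklearn.metrics._classification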
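A less global alternative to patching sys.modules (a swapped-in technique, not what was done here) is to subclass pickle.Unpickler and override find_class, the hook the pickle documentation provides for controlling how globals are resolved. This sketch remaps the same old module paths for a single load only. RenamingUnpickler, load_legacy_pickle and RENAMED_MODULES are hypothetical names; mapping sklearn.tree.tree to sklearn.tree assumes the pickle only takes estimator classes such as DecisionTreeClassifier from that module, and none of it has been tested against levels_model.pkl.

```python
import pickle

# Old sklearn module paths -> paths that exist in newer releases.
# Assumed mapping, mirroring the sys.modules aliases above; not verified
# against every object stored in levels_model.pkl.
RENAMED_MODULES = {
    "sklearn.feature_selection.rfe": "sklearn.feature_selection._rfe",
    "sklearn.tree.tree": "sklearn.tree",  # assumes only re-exported classes are needed
    "sklearn.metrics.scorer": "sklearn.metrics._scorer",
    "sklearn.metrics.classification": "sklearn.metrics._classification",
}


class RenamingUnpickler(pickle.Unpickler):
    """Unpickler that rewrites module paths before resolving each class."""

    def __init__(self, file, renames, **kwargs):
        super().__init__(file, **kwargs)
        self.renames = dict(renames)

    def find_class(self, module, name):
        # Swap in the new module path (if any), then resolve normally.
        return super().find_class(self.renames.get(module, module), name)


def load_legacy_pickle(path, renames=RENAMED_MODULES):
    """Load a pickle whose stored module paths may have been renamed since."""
    with open(path, "rb") as f:
        return RenamingUnpickler(f, renames).load()
```

The design choice is scope: the renames apply only inside this one load instead of changing sys.modules for the whole process. It assumes HeadingLevelPrediction.py loads the model with plain pickle rather than joblib, and sklearn will still warn about the version mismatch.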

At that point the model loaded, but I got a warning that it had been pickled with a previous version of sklearn, and I don't know whether it still works correctly. For me, discretion is the better part of valor: I'll just use Python 3.7.
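Because the version mismatch only produces a warning, one hedged option for not silently running a possibly broken model is to turn any warning raised during loading into an exception with warnings.catch_warnings and warnings.simplefilter("error"). load_strict is a hypothetical helper, and it assumes the sklearn version warning goes through the standard warnings module; it only makes the load fail loudly and does not tell you whether the model still predicts correctly.

```python
import pickle
import warnings


def load_strict(path):
    """Unpickle path, raising instead of warning if anything warns during loading."""
    with warnings.catch_warnings():
        # Inside this block every warning is raised as an exception.
        warnings.simplefilter("error")
        with open(path, "rb") as f:
            return pickle.load(f)
```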

manoj-kore commented 1 year ago

@slbayer Thanks for the answer! This saved my day!!