LexPredict / lexpredict-lexnlp

LexNLP by LexPredict
GNU Affero General Public License v3.0

Install failure for 2.1.0 and 2.2.0 on Win10/Python 3.9 due to sklearn==0.23.1 #62

Open CaseGuide opened 2 years ago

CaseGuide commented 2 years ago

Running pip install lexnlp currently pulls 2.1.0 from PyPI. This fails to install on Win10/Python 3.9 and apparently on M1 MacBooks. Downloading the current master and installing from the zip runs into similar issues.

The issue is that scikit-learn 0.23.1 fails to install because of changes made in numpy, producing the error below even when a sufficient numpy version is installed.

Importing the numpy c-extensions failed.
[...]
      ImportError: numpy is not installed.
      scikit-learn requires numpy >= 1.13.3.
      Installation instructions are available on the scikit-learn website: http://scikit-learn.org/stable/install.html

I was able to work around this and run two test examples from the docs, though I haven't fully tested it, by installing the current master with the requirements in setup.py set to the following:

...
python_requires='>=3.6',
...
        'cloudpickle==2.1.0',
        'dateparser==1.1.1',
        'gensim==4.1.2',
        'joblib==1.1.0',
        'nltk==3.7',
        'num2words==0.5.10',
        'numpy>=1.13.1',
        'pandas>=1.1.5',
        'pycountry==22.3.5',
        'regex==2022.3.2',
        'reporters-db==3.2.18',
        'requests==2.27.1',
        'scipy==1.8.1',
        'scikit-learn==0.24.2',
        'tzlocal==2.1',
        'tqdm>=4.36.0',
        'Unidecode==1.3.4',
        'us==2.0.2',
        'zahlwort2num==0.3.0'

Can I suggest using less rigid requirements? This package is often going to be used as part of a workflow, and rigid pinning not only causes install issues when those deps start to age (sklearn 0.23.1 is 2 years old) but also unnecessarily forces your package to be the driver of install requirements for the system it's a part of.
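
To illustrate what "less rigid" could mean in practice, here is a minimal sketch that turns exact `==` pins into `>=` lower bounds while leaving selected packages pinned. The helper name `relax_pins` and its `keep_pinned` parameter are hypothetical and not part of LexNLP; which packages are safe to loosen is an assumption that would need testing.

```python
def relax_pins(requirements, keep_pinned=()):
    """Turn 'name==x.y.z' into 'name>=x.y.z', except for names in keep_pinned.

    Hypothetical helper for illustration only; not part of LexNLP's setup.py.
    """
    relaxed = []
    for req in requirements:
        # partition() returns (name, '==', version) when the pin is exact,
        # and (req, '', '') when there is no '==' in the string.
        name, sep, version = req.partition("==")
        if sep and name not in keep_pinned:
            relaxed.append(f"{name}>={version}")
        else:
            # Already a range, or explicitly kept pinned.
            relaxed.append(req)
    return relaxed


if __name__ == "__main__":
    pins = ["nltk==3.7", "numpy>=1.13.1", "scikit-learn==0.23.1"]
    # Keep scikit-learn exact here as an example of a deliberate pin.
    print(relax_pins(pins, keep_pinned={"scikit-learn"}))
```

A package that ships pickled models may still need an exact pin on the library that produced them, which is why the sketch takes an explicit exception list rather than loosening everything.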

EDIT: This doesn't work, as there are breaking changes from sklearn 0.23.1 -> 0.24. In particular, when loading the pickle from addresses.py, sklearn 0.24 throws the error: ModuleNotFoundError: No module named 'sklearn.tree.tree'
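
For anyone who wants to experiment with the pickle error above, one generic technique is an unpickler that remaps old module paths to new ones. This is only a sketch using the standard library: the class and function names are hypothetical, the target `sklearn.tree._classes` is my assumption about where the tree classes moved, and scikit-learn does not support loading pickles across versions, so even a successful load may produce a broken model.

```python
import io
import pickle

# Only the key comes from the error message above; the target module
# path is an assumption and may not hold for every scikit-learn version.
MODULE_RENAMES = {"sklearn.tree.tree": "sklearn.tree._classes"}


class RenamingUnpickler(pickle.Unpickler):
    """Unpickler that looks classes up under a renamed module path."""

    def __init__(self, file, renames):
        super().__init__(file)
        self._renames = renames

    def find_class(self, module, name):
        # Swap the module path if it is listed, otherwise leave it as-is.
        return super().find_class(self._renames.get(module, module), name)


def load_with_renames(data, renames=MODULE_RENAMES):
    """Unpickle bytes, redirecting any module paths listed in renames."""
    return RenamingUnpickler(io.BytesIO(data), renames).load()
```

Usage would be along the lines of `load_with_renames(open(path, "rb").read())`, where `path` points at the pickled model. This only addresses the import path; it does nothing about changes to the pickled object's internal attributes between versions.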

JSv4 commented 2 years ago

I'm having similar issues. A related problem is that LexNLP doesn't play nicely with other NLP packages because of the old numpy versions required by all but the newest LexNLP releases. The newer numpy version in the latest LexNLP release is compatible with far more packages, but I can't get it to install because of the sklearn==0.23.1 dependency, as noted by @CaseGuide.

I should add that I'm on Ubuntu 20.04.