dimitrismistriotis / alt-profanity-check

A fast, robust library to check for offensive language in strings; a drop-in replacement for "profanity-check".
https://pypi.org/project/alt-profanity-check/
MIT License

ImportError: cannot import name 'joblib' from 'sklearn.externals' #35

Closed felixgao-0 closed 9 months ago

felixgao-0 commented 9 months ago

Hi y'all! I'm trying to use this package but ran into this error. Any help is appreciated!

Error:

Traceback (most recent call last):
  File "main.py", line 1, in <module>
    from sklearn.externals import joblib
ImportError: cannot import name 'joblib' from 'sklearn.externals' (/home/runner/profanity-check-playground/.pythonlibs/lib/python3.8/site-packages/sklearn/externals/__init__.py)

Code: (Code from the in-browser example which links to a repl)

from profanity_check import predict, predict_prob

print(predict([
  'predict() takes an array and a 1 for each string if it is offensive, else 0.',
  'fuck you',
]))

print(predict_prob([
  'predict_prob() takes an array and returns the probability each string is offensive',
  'go to hell, you scum',
]))
dimitrismistriotis commented 9 months ago

Thanks for the interest in the library.

From the runner path in the traceback above, I can see that you are using Python 3.8:

ImportError: cannot import name 'joblib' from 'sklearn.externals' (/home/runner/profanity-check-playground/.pythonlibs/lib/→python3.8←/site-packages/sklearn/externals/__init__.py)

With that in mind, I tried the same thing in a new shell:

Copy/Paste of commands run:

Welcome to fish, the friendly interactive shell
Type help for instructions on how to use fish
🚀 ~ python3.8  09:31:04
Python 3.8.18 (default, Aug 24 2023, 19:48:18)
[GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 
🚀 ~ mkdir alt-3.8-check  09:33:32
🚀 ~ cd alt-3.8-check/  09:36:18
🚀 ~/alt-3.8-check python3.8 -m venv .venv  09:36:20
🚀 ~/alt-3.8-check source .venv/bin/activate.fish  09:36:32
(.venv) 🚀 ~/alt-3.8-check (.venv) which python  09:36:39
/home/dimitrios/alt-3.8-check/.venv/bin/python
(.venv) 🚀 ~/alt-3.8-check (.venv) python --version  09:36:44
Python 3.8.18
(.venv) 🚀 ~/alt-3.8-check (.venv) pip install alt-profanity-check  09:36:49
Collecting alt-profanity-check
  Downloading alt-profanity-check-1.3.2.tar.gz (1.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.9/1.9 MB 9.0 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting scikit-learn==1.3.2
  Downloading scikit_learn-1.3.2-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.1/11.1 MB 11.2 MB/s eta 0:00:00
Collecting joblib>=1.3.2
  Using cached joblib-1.3.2-py3-none-any.whl (302 kB)
Collecting scipy>=1.5.0
  Downloading scipy-1.10.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (34.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 34.5/34.5 MB 10.5 MB/s eta 0:00:00
Collecting threadpoolctl>=2.0.0
  Downloading threadpoolctl-3.3.0-py3-none-any.whl (17 kB)
Collecting numpy<2.0,>=1.17.3
  Downloading numpy-1.24.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17.3/17.3 MB 7.6 MB/s eta 0:00:00
Installing collected packages: threadpoolctl, numpy, joblib, scipy, scikit-learn, alt-profanity-check
  DEPRECATION: alt-profanity-check is being installed using the legacy 'setup.py install' method, because it does not have a 'pyproject.toml' and the 'wheel' package is not installed. pip 23.1 will enforce this behaviour change. A possible replacement is to enable the '--use-pep517' option. Discussion can be found at https://github.com/pypa/pip/issues/8559
  Running setup.py install for alt-profanity-check ... done
Successfully installed alt-profanity-check-1.3.2 joblib-1.3.2 numpy-1.24.4 scikit-learn-1.3.2 scipy-1.10.1 threadpoolctl-3.3.0

[notice] A new release of pip is available: 23.0.1 -> 24.0
[notice] To update, run: pip install --upgrade pip
(.venv) 🚀 ~/alt-3.8-check (.venv) python  09:37:14
Python 3.8.18 (default, Aug 24 2023, 19:48:18)
[GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from profanity_check import predict, predict_prob
>>> predict([
...   'predict() takes an array and a 1 for each string if it is offensive, else 0.',
...   'fuck you',
... ])
array([0, 1])
>>> predict_prob([
...   'predict_prob() takes an array and returns the probability each string is offensive',
...   'go to hell, you scum',
... ])
array([0.03944098, 0.99780641])
>>> 
(.venv) 🚀 ~/alt-3.8-check (.venv)  09:41:08

Since a clean Python 3.8 install works, I think something is wrong in your setup. I am not 100% sure how to help from here, but maybe you could retrace the steps you took to install and run `alt-profanity-check`?
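One way to retrace an install is to ask the environment which installed distributions ship a top-level `profanity_check` package. The sketch below uses only the standard library (`importlib.metadata`, available from Python 3.8). The helper name `distributions_providing` is made up for illustration, and the result depends on the installer having recorded a file list, so an empty answer is not conclusive.

```python
from importlib import metadata


def distributions_providing(package_name):
    """Return sorted (name, version) pairs of distributions shipping `package_name`."""
    providers = set()
    for dist in metadata.distributions():
        # dist.files is None when the installer recorded no file list.
        for path in dist.files or []:
            if path.parts and path.parts[0] == package_name:
                # str() guards against distributions with missing metadata fields.
                providers.add((str(dist.metadata["Name"]), str(dist.version)))
                break
    return sorted(providers)


if __name__ == "__main__":
    for name, version in distributions_providing("profanity_check"):
        print(name, version)
```

If both `profanity-check` and `alt-profanity-check` show up, the two packages are installed side by side and write into the same `profanity_check` directory, which would be consistent with a mixed set of files.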

felixgao-0 commented 9 months ago

Hi! Thanks for the response!

I was originally using Replit to run the package, so as a troubleshooting measure I moved to PyCharm. Running in PyCharm with Python 3.9 and 3.8 works as intended without additional modification (with the original code I used before, and no shell commands other than pip install...). It seems the issue is with Replit.

While looking at the package files (specifically profanity_check/profanity_check.py), I noticed a difference between the two copies even though both installs report the same version (1.3.2).

Replit:

import pkg_resources
import numpy as np
from sklearn.externals import joblib

vectorizer = joblib.load(pkg_resources.resource_filename('profanity_check', 'data/vectorizer.joblib'))
model = joblib.load(pkg_resources.resource_filename('profanity_check', 'data/model.joblib'))

def _get_profane_prob(prob):
  return prob[1]

def predict(texts):
  return model.predict(vectorizer.transform(texts))

def predict_prob(texts):
  return np.apply_along_axis(_get_profane_prob, 1, model.predict_proba(vectorizer.transform(texts)))

PyCharm: (aligns with the file on GitHub)

"""Profanity check exposed methods"""
import pkg_resources
import numpy as np
import joblib

vectorizer = joblib.load(
    pkg_resources.resource_filename("profanity_check", "data/vectorizer.joblib")
)
model = joblib.load(
    pkg_resources.resource_filename("profanity_check", "data/model.joblib")
)

def _get_profane_prob(prob):
    return prob[1]

def predict(texts):
    """Predict texts array"""
    return model.predict(vectorizer.transform(texts))

def predict_prob(texts):
    """Predict texts array returning probabilities"""
    return np.apply_along_axis(
        _get_profane_prob, 1, model.predict_proba(vectorizer.transform(texts))
    )

I noticed that joblib is imported differently in the two versions (perhaps the cause of my error?). I don't know much about packaging, but Replit runs on Linux and I ran PyCharm on macOS. Is it possible that different package versions are shipped for different operating systems?
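For context on the two import styles: to the best of my knowledge, scikit-learn deprecated the vendored `sklearn.externals.joblib` in release 0.21 and removed it in 0.23, so the failing import depends on the scikit-learn version and not on the operating system. The sketch below shows a tolerant import that prefers the standalone `joblib` package; the helper `import_first_available` is hypothetical and is not part of either library.

```python
import importlib


def import_first_available(module_names):
    """Import and return the first module in `module_names` that can be imported."""
    errors = []
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            errors.append(f"{name}: {exc}")
    raise ImportError("none importable: " + "; ".join(errors))


if __name__ == "__main__":
    # Prefer the standalone package; fall back to the old vendored location.
    try:
        joblib = import_first_available(["joblib", "sklearn.externals.joblib"])
        print("loaded joblib from", joblib.__name__)
    except ImportError as exc:
        print("joblib is not available:", exc)
```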

dimitrismistriotis commented 9 months ago

I think I can root-cause this.

To the best of my knowledge, Replit installs the original library because it deduces the package from the name used in the import.

When we made the fork, we wanted it to be as close to a drop-in replacement as possible, so the import stays the same as in the README:

from profanity_check import predict, predict_prob

In other words, we do not define an "alt_profanity_check" module, so you cannot write from alt_profanity_check ....

As we have heard (someone even made a fork to address this), Replit does not handle this well and thinks it is the original package. Probably your setup ended up mixing imports from different versions of joblib. If you compare the code you provided with https://github.com/vzhou842/profanity-check/blob/master/profanity_check/profanity_check.py as it currently stands, you can see that it comes from the original package, while ours has been reformatted, hence the differences.

I believe one way forward is to raise this with that platform's support and point them here for reference, if you still want to use their service for what you are doing.

felixgao-0 commented 9 months ago

I don't think Replit is accidentally using profanity-check rather than alt-profanity-check, as the file structure matches this GitHub repository (the original profanity-check doesn't have a data folder or a command-line.py; additionally, the original profanity_check.py doesn't import joblib the way the Replit copy seems to). I'll keep checking in case it somehow is.

I've already opened a support thread on their forum and linked it to this issue.

felixgao-0 commented 9 months ago

I've downloaded alt-profanity-check directly from pip. After uploading that file to Replit and installing from it, the code works as intended. It appears that Replit was, indeed, mixing and matching files.

Thanks for all the help and for the suggestion that it was using another file! I'll mark this as resolved and cancel my support ticket with Replit.
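A reinstall like this can be double-checked by scanning the installed package's source files for the legacy import, without importing the package (importing is exactly what fails on a mixed install). This is a minimal sketch using only the standard library; the function name `files_with_legacy_import` and the default search string are assumptions chosen for illustration.

```python
import importlib.util
from pathlib import Path


def files_with_legacy_import(package_name, needle="sklearn.externals"):
    """List .py files of an installed package that mention `needle`."""
    # find_spec locates a top-level package without executing its code.
    spec = importlib.util.find_spec(package_name)
    if spec is None or not spec.submodule_search_locations:
        return []
    hits = []
    for location in spec.submodule_search_locations:
        for path in sorted(Path(location).rglob("*.py")):
            if needle in path.read_text(encoding="utf-8", errors="replace"):
                hits.append(str(path))
    return hits


if __name__ == "__main__":
    for hit in files_with_legacy_import("profanity_check"):
        print("legacy import found in", hit)
```

An empty result suggests the installed files use the standalone `import joblib`, as in the PyCharm copy shown earlier in this thread.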

dimitrismistriotis commented 9 months ago

I believe Replit should be made aware of this situation, which we have seen affect some people. My suggestion is to point them here; if anyone from their support is reading this, we are happy to help.