Sotera / webpageclassifier

Categorizes a website given URL into one of blog|wiki|news|forum|classified|shopping|undecided.
Apache License 2.0

error while executing script #21

Open ghost opened 4 years ago

ghost commented 4 years ago

Hi,

Hope you are all well!

When I run the script with python webpageclassifier.py, I get the following error:

scikit-learn version: 0.20.4

/app/webpageclassifier# python webpageclassifier.py
Traceback (most recent call last):
  File "webpageclassifier.py", line 11, in <module>
    from sklearn.base import BaseEstimator, TransformerMixin
  File "/usr/local/lib/python3.8/site-packages/sklearn/__init__.py", line 64, in <module>
    from .base import clone
  File "/usr/local/lib/python3.8/site-packages/sklearn/base.py", line 14, in <module>
    from .utils.fixes import signature
  File "/usr/local/lib/python3.8/site-packages/sklearn/utils/__init__.py", line 14, in <module>
    from . import _joblib
  File "/usr/local/lib/python3.8/site-packages/sklearn/utils/_joblib.py", line 22, in <module>
    from ..externals import joblib
  File "/usr/local/lib/python3.8/site-packages/sklearn/externals/joblib/__init__.py", line 119, in <module>
    from .parallel import Parallel
  File "/usr/local/lib/python3.8/site-packages/sklearn/externals/joblib/parallel.py", line 28, in <module>
    from ._parallel_backends import (FallbackToBackend, MultiprocessingBackend,
  File "/usr/local/lib/python3.8/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 22, in <module>
    from .executor import get_memmapping_executor
  File "/usr/local/lib/python3.8/site-packages/sklearn/externals/joblib/executor.py", line 14, in <module>
    from .externals.loky.reusable_executor import get_reusable_executor
  File "/usr/local/lib/python3.8/site-packages/sklearn/externals/joblib/externals/loky/__init__.py", line 12, in <module>
    from .backend.reduction import set_loky_pickler
  File "/usr/local/lib/python3.8/site-packages/sklearn/externals/joblib/externals/loky/backend/reduction.py", line 125, in <module>
    from sklearn.externals.joblib.externals import cloudpickle  # noqa: F401
  File "/usr/local/lib/python3.8/site-packages/sklearn/externals/joblib/externals/cloudpickle/__init__.py", line 3, in <module>
    from .cloudpickle import *
  File "/usr/local/lib/python3.8/site-packages/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py", line 152, in <module>
    _cell_set_template_code = _make_cell_set_template_code()
  File "/usr/local/lib/python3.8/site-packages/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py", line 133, in _make_cell_set_template_code
    return types.CodeType(
TypeError: an integer is required (got type bytes)

With a newer version (scikit-learn 0.23.0), I get a different error:

/app/webpageclassifier# python webpageclassifier.py
Traceback (most recent call last):
  File "webpageclassifier.py", line 12, in <module>
    from sklearn.externals.joblib.parallel import cpu_count, Parallel, delayed
ModuleNotFoundError: No module named 'sklearn.externals.joblib'
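
A possible workaround for this second error, sketched under assumptions rather than tested against this repository: newer scikit-learn releases no longer ship sklearn.externals.joblib, while the standalone joblib package (pip install joblib) provides Parallel, delayed and cpu_count. The helper name import_first below is hypothetical and is not part of webpageclassifier.py; it tries each module name in order and returns the first one that imports.

```python
import importlib


def import_first(*module_names):
    """Return the first module in module_names that imports successfully.

    Hypothetical helper for illustration; not part of webpageclassifier.py.
    """
    last_error = None
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass
            last_error = exc
    raise ImportError(
        "none of these modules could be imported: " + ", ".join(module_names)
    ) from last_error


# Assumed replacement for the failing import on line 12 of the script:
#
#   joblib = import_first("joblib", "sklearn.externals.joblib")
#   Parallel, delayed, cpu_count = joblib.Parallel, joblib.delayed, joblib.cpu_count
```

Listing joblib first means the standalone package is preferred whenever it is installed, and the bundled copy is only tried as a fallback on older scikit-learn versions.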

It would be amazing to have this fixed, as I wanted to classify the DMOZ/ODP directory with your script. Is it possible to fix or update it? And how can I add more categories?

Thanks in advance for any input or insights on these questions.

Cheers, X