chartbeat-labs / textacy

NLP, before and after spaCy
https://textacy.readthedocs.io

textacy.lang_utils.identify_lang throws IndexError: pop from empty list #291

Closed Dabbrivia closed 4 years ago

Dabbrivia commented 4 years ago

steps to reproduce

  1. Install Ubuntu LTS 18.04 on WSL (Windows 10) and install textacy into a virtualenv with python3:

         cd /mnt/c/venvs/
         virtualenv -p python3 eb_ts
         source /mnt/c/venvs/eb_ts/bin/activate

  2. Run python3:

         import textacy
         textacy.lang_utils.identify_lang('Was ist das denn?')

expected vs. actual behavior

The file /mnt/c/venvs/eb_ts/lib/python3.6/site-packages/textacy/data/lang_identifier/lang-identifier-v1.1-sklearn-v0.22.pkl.gz should be downloaded completely, and the language should be identified as "de".

The download seems to fail at 22%:

    22%|████████████████████████▉ | 9.00/40.0 [00:00<00:00, 402B/s]
    pop from empty list

Full session:

    Python 3.6.9 (default, Nov 7 2019, 10:44:02) [GCC 8.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import textacy
    >>> textacy.lang_utils.identify_lang('Was ist das denn?')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/mnt/c/venvs/eb_ts/lib/python3.6/site-packages/textacy/lang_utils.py", line 156, in identify_lang
        lang = self.pipeline.predict(text).item()
      File "/mnt/c/venvs/eb_ts/lib/python3.6/site-packages/textacy/lang_utils.py", line 100, in pipeline
        self._pipeline = self._load_pipeline()
      File "/mnt/c/venvs/eb_ts/lib/python3.6/site-packages/textacy/lang_utils.py", line 108, in _load_pipeline
        pipeline = joblib.load(f)
      File "/mnt/c/venvs/eb_ts/lib/python3.6/site-packages/joblib/numpy_pickle.py", line 588, in load
        obj = _unpickle(fobj)
      File "/mnt/c/venvs/eb_ts/lib/python3.6/site-packages/joblib/numpy_pickle.py", line 526, in _unpickle
        obj = unpickler.load()
      File "/usr/lib/python3.6/pickle.py", line 1050, in load
        dispatch[key[0]](self)
      File "/usr/lib/python3.6/pickle.py", line 1315, in load_obj
        args = self.pop_mark()
      File "/usr/lib/python3.6/pickle.py", line 1057, in pop_mark
        self.stack = self.metastack.pop()
    IndexError: pop from empty list

possible solution?
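One thing worth checking is whether the cached model file is a readable gzip archive at all. The snippet below is a minimal sketch under the assumption that the failure comes from a bad download; `is_intact_gzip` and `remove_if_corrupt` are hypothetical helpers, not textacy functions, and it is only an assumption that textacy fetches the file again once it is missing.

```python
import gzip
import zlib
from pathlib import Path


def is_intact_gzip(path, chunk_size=1 << 20):
    """Return True if `path` is a non-empty file that decompresses end to end as gzip."""
    try:
        if Path(path).stat().st_size == 0:
            return False
        with gzip.open(path, "rb") as f:
            # Read the whole stream; truncated or non-gzip data raises here.
            while f.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        return False
    return True


def remove_if_corrupt(path):
    """Delete `path` if it exists but is not a readable gzip file.

    Returns True if the file was deleted, False otherwise.
    """
    path = Path(path)
    if path.is_file() and not is_intact_gzip(path):
        path.unlink()
        return True
    return False
```

Calling `remove_if_corrupt(...)` with the `.pkl.gz` path from the report and then retrying `identify_lang` would show whether a fresh download behaves differently. Note that this only checks the gzip layer; a file that decompresses cleanly can still fail to unpickle.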

context

Autodetecting the language; it worked before.

environment

While trying to run textacy.utils.print_markdown(textacy.utils.get_config()), I got this:

    python
    Python 3.6.9 (default, Nov 7 2019, 10:44:02) [GCC 8.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import textacy
    >>> textacy.utils.print_markdown(textacy.utils.get_config())
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/mnt/c/venvs/eb_ts/lib/python3.6/site-packages/textacy/utils.py", line 64, in print_markdown
        print("{}".format("\n".join(md_items)))
      File "/mnt/c/venvs/eb_ts/lib/python3.6/site-packages/textacy/utils.py", line 62, in <genexpr>
        for k, v in items
      File "/mnt/c/venvs/eb_ts/lib/python3.6/site-packages/textacy/utils.py", line 147, in to_unicode
        raise TypeError("s must be {}, not {}".format((str, bytes), type(s)))
    TypeError: s must be (<class 'str'>, <class 'bytes'>), not <class 'list'>
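The TypeError above indicates that one of the config values is a list, which the string conversion rejects. As a workaround for printing the environment, here is a minimal sketch that coerces each value to text before joining. `to_markdown_items` is a hypothetical helper, not part of textacy, and the bullet format is an assumption rather than textacy's actual output format.

```python
def to_markdown_items(items):
    """Format (key, value) pairs as markdown bullets, coercing values to text."""
    lines = []
    for key, value in items:
        if isinstance(value, bytes):
            value = value.decode("utf-8", errors="replace")
        elif isinstance(value, (list, tuple)):
            # Lists are what triggered the TypeError; join their elements instead.
            value = ", ".join(str(v) for v in value)
        lines.append("- **{}:** {}".format(key, value))
    return "\n".join(lines)
```

Passing the items of a config dict to this helper and printing the result avoids the strict type check, at the cost of flattening list values into a comma-separated string.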

bdewilde commented 4 years ago

Hi @Dabbrivia, this issue was discussed, and probably resolved, over here: #292. So I'm closing this issue as a duplicate.