DerwenAI / pytextrank

Python implementation of TextRank algorithms ("textgraphs") for phrase extraction
https://derwen.ai/docs/ptr/
MIT License
2.15k stars 333 forks

"ValueError: [E002] Can't find factory for 'textrank' for language English (en)." - incompatibility with SpaCy 3.3.1? #220

Closed ghost closed 1 year ago

ghost commented 2 years ago

I'm trying to use this package for the first time and followed the README:

!pip install pytextrank
!python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("textrank")

This throws an error at the last line:

ValueError: [E002] Can't find factory for 'textrank' for language English (en). This usually happens when spaCy calls 'nlp.create_pipe' with a custom component name that's not registered on the current language class. If you're using a Transformer, make sure to install 'spacy-transformers'. If you're using a custom component, make sure you've added the decorator '@Language.component' (for function components) or '@Language.factory' (for class components).

Available factories: attribute_ruler, tok2vec, merge_noun_chunks, merge_entities, merge_subtokens, token_splitter, doc_cleaner, parser, beam_parser, lemmatizer, trainable_lemmatizer, entity_linker, ner, beam_ner, entity_ruler, tagger, morphologizer, senter, sentencizer, textcat, spancat, future_entity_ruler, span_ruler, textcat_multilabel, en.lemmatizer

Is this an incompatibility with spaCy version 3.3.1, or have I overlooked something crucial? Which spaCy version do you recommend? (I restarted the kernel after installing pytextrank.)

ceteri commented 2 years ago

Hi @lisabecker-ml6, that >should< be working well, although we'll check the other system dependencies.

First though, here's a session that starts clean with Py 3.8 on macOS, installs spaCy 3.3.1, then runs pytextrank:

(base) $ python3 -m venv venv
(base) $ source venv/bin/activate
(venv) (base) $ python3
Python 3.8.10 (v3.8.10:3d8993a744, May  3 2021, 08:55:58) 
>>> ^D

(venv) (base) $ python3 -m pip install -U pip wheel
Requirement already satisfied: pip in ./venv/lib/python3.8/site-packages (21.1.1)
Collecting pip
  Downloading pip-22.2.1-py3-none-any.whl (2.0 MB)
[ ... ]
Successfully installed pip-22.2.1 wheel-0.37.1

(venv) (base) $ python3 -m pip install spacy==3.3.1
Collecting spacy==3.3.1
  Downloading spacy-3.3.1-cp38-cp38-macosx_10_9_x86_64.whl (6.5 MB)
[ ... ]
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')

(venv) (base) $ python3 -m pip install -U pytextrank
Collecting pytextrank
  Downloading pytextrank-3.2.3-py3-none-any.whl (30 kB)
[ ... ]
Successfully installed asttokens-2.0.5 colorama-0.4.5 cycler-0.11.0 executing-0.9.1 fonttools-4.34.4 graphviz-0.20.1 icecream-2.1.3 kiwisolver-1.4.4 matplotlib-3.5.2 networkx-2.8.5 pandas-1.4.3 pillow-9.2.0 pygments-2.12.0 pytextrank-3.2.3 python-dateutil-2.8.2 pytz-2022.1 scipy-1.8.1 six-1.16.0

(venv) (base) $ python3
Python 3.8.10 (v3.8.10:3d8993a744, May  3 2021, 08:55:58)
>>> import spacy
>>> import pytextrank
>>> nlp = spacy.load("en_core_web_sm")
>>> nlp.add_pipe("textrank")
<pytextrank.base.BaseTextRankFactory object at 0x7ff89edad760>
>>> text = "Compatibility of systems of linear constraints over the set of natural numbers."
>>> doc = nlp(text)
>>> for phrase in doc._.phrases:
...   print(phrase)
... 
Phrase(text='natural numbers', chunks=[natural numbers], count=1, rank=0.33223519667364926)
Phrase(text='linear constraints', chunks=[linear constraints], count=1, rank=0.22279697495861164)
Phrase(text='systems', chunks=[systems], count=1, rank=0.13804851507607008)
Phrase(text='Compatibility', chunks=[Compatibility], count=1, rank=0.12186027597311673)
Phrase(text='the set', chunks=[the set], count=1, rank=0.08772274252666795)
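
One difference between the session above and the snippet in the original report is the explicit `import pytextrank` line. As an assumption (it is not confirmed anywhere in this thread), that import is what registers the "textrank" factory with spaCy. The sketch below only reports what it observes before and after the import; the helper name `check_textrank_factory` is hypothetical, and it uses a blank English pipeline so no model download is needed.

```python
import importlib.util


def check_textrank_factory():
    """Report whether 'textrank' is a known spaCy factory before/after importing pytextrank."""
    # Bail out early if either package is invisible to this interpreter.
    for name in ("spacy", "pytextrank"):
        if importlib.util.find_spec(name) is None:
            return f"{name} is not importable from this interpreter"

    import spacy

    nlp = spacy.blank("en")  # blank pipeline, no trained model required
    before = nlp.has_factory("textrank")

    import pytextrank  # noqa: F401  (imported only for its side effects)

    after = nlp.has_factory("textrank")
    return f"registered before import: {before}, after import: {after}"


print(check_textrank_factory())
```

If pytextrank was already imported earlier in the same process, both values will be the same, so run this in a fresh interpreter or a restarted kernel.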

Depending on the system environment, installing with pip inside a Jupyter notebook can have some strange side effects. There's an entry on the spaCy forums where Sophie and I were troubleshooting a Python environment in a JupyterLab kernel that behaved in a way that logically never should have happened: it kept giving errors from a much earlier version of spaCy than the one being installed. It turned out that Jupyter was a bit sloppy about which Python environment it used as the base for the kernel, and it wasn't following the environment variable settings.

When working in Jupyter, it can help for troubleshooting to get more system info:

import sys
print(sys.executable)

For example, the Python executable may be different than what's expected.
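
A slightly fuller version of that check, as a minimal sketch: it reports the interpreter path along with whether `spacy` and `pytextrank` are visible to it, and where they were loaded from. The helper name `describe_environment` is hypothetical; everything it calls is in the standard library (Python 3.8+).

```python
import importlib.metadata
import importlib.util
import sys


def describe_environment(packages=("spacy", "pytextrank")):
    """Return the running interpreter path and what it can see of each package."""
    report = {"executable": sys.executable, "packages": {}}
    for name in packages:
        spec = importlib.util.find_spec(name)
        if spec is None:
            # The package is not importable from this interpreter.
            report["packages"][name] = None
            continue
        try:
            version = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            # Importable, but no installed distribution metadata under this name.
            version = "unknown"
        report["packages"][name] = {"version": version, "location": spec.origin}
    return report


info = describe_environment()
print("executable:", info["executable"])
for name, details in info["packages"].items():
    print(name, "->", details if details is not None else "NOT importable here")
```

Running this once in the notebook and once in the terminal where pip was invoked makes it easy to compare the two environments.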

Based on the error message and the available factories, it looks like pytextrank isn't being installed in the same environment as the Python executable that the Jupyter kernel is running.
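
If that is the cause, one common workaround is to run pip through the same interpreter the kernel is using, instead of whichever `pip` happens to be first on the PATH. The sketch below builds that command from `sys.executable`; the helper names are hypothetical, and the install itself is left uncalled so nothing changes until you choose to run it.

```python
import subprocess
import sys


def pip_install_command(package):
    """Build a pip command bound to the interpreter that is running this code."""
    return [sys.executable, "-m", "pip", "install", "-U", package]


def install_into_current_interpreter(package):
    """Run the install and return pip's exit code (0 means success)."""
    return subprocess.run(pip_install_command(package), check=False).returncode


# Show the command first; call install_into_current_interpreter("pytextrank")
# only once the interpreter path looks right.
print(" ".join(pip_install_command("pytextrank")))
```

Recent versions of IPython and Jupyter also provide a `%pip install pytextrank` magic that targets the kernel's own environment. Either way, restart the kernel afterwards.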

Alternatively, could you please try using a venv instead of a notebook, with the commands shown above? That's one quick way to isolate whether there's a Py executable problem with Jupyter.

We'll get this fixed! :)