Closed kaiyungtan closed 1 year ago
Hi @kaiyungtan, thanks for checking out pytextrank.
Given the error message you gave us, I can tell that textrank is missing from the "Available factories: ..." list.
This means that you're probably missing an import. In particular, don't forget this line:
import pytextrank
In the docs https://derwen.ai/docs/ptr/explain_summ/ you can find the reproducible snippet of code, which I will add here for reference:
import spacy
nlp = spacy.load("en_core_web_sm")
text = "Compatibility of systems of linear constraints over the set of natural numbers. Criteria of compatibility of a system of linear Diophantine equations, strict inequations, and nonstrict inequations are considered. Upper bounds for components of a minimal set of solutions and algorithms of construction of minimal generating sets of solutions for all types of systems are given. These criteria and the corresponding algorithms for constructing a minimal supporting set of solutions can be used in solving all the considered types systems and systems of mixed types."
import pytextrank
nlp.add_pipe("textrank", last=True)
doc = nlp(text)
# I add this line to show how to get the summary
summary = list(doc._.textrank.summary(limit_phrases=3, limit_sentences=4, preserve_order=False))
This should solve your issue.
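If you want to confirm that the import actually registered the factory, here is a minimal sketch. It assumes spaCy v3 is installed and uses a blank English pipeline so no model download is needed; the helper name factory_available is made up for illustration.

```python
import importlib

import spacy

# Blank English pipeline: enough to query the factory registry.
nlp = spacy.blank("en")


def factory_available(name: str) -> bool:
    """Return True if spaCy knows a component factory with this name."""
    return nlp.has_factory(name)


print("before import:", factory_available("textrank"))

try:
    # Importing pytextrank (v3+) is what registers the "textrank" factory.
    importlib.import_module("pytextrank")
except ImportError:
    print("pytextrank is not installed in this environment")

print("after import:", factory_available("textrank"))
```

If the second line still prints False after a successful import, the installed pytextrank is probably a version that does not register a spaCy v3 factory.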
Regarding your second try with nlp.add_pipe('textcat', 'textrank', last=True): yes, the call goes through, but it is not doing what you expect.
According to the spaCy docs, Language.add_pipe takes two positional arguments: factory_name and name. You are effectively calling nlp.add_pipe(factory_name='textcat', name='textrank', last=True), which adds a text classification pipeline component (created by the factory named "textcat", see docs) and gives it the name "textrank". That component is not the one from the pytextrank library that does the extractive summarisation you're after.
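To make the factory_name vs. name distinction concrete, here is a small sketch. It assumes spaCy v3 and swaps in the built-in "sentencizer" factory instead of "textcat" so that it runs on a blank pipeline without any model or training; the naming behaviour is the same.

```python
import spacy

# Blank pipeline, so nothing needs to be downloaded.
nlp = spacy.blank("en")

# First positional argument: the factory that builds the component.
# Second positional argument: the name the component gets in the pipeline.
nlp.add_pipe("sentencizer", "textrank", last=True)

# The pipeline now lists a component *called* "textrank" ...
print(nlp.pipe_names)

# ... but its metadata shows which factory really created it.
meta = nlp.get_pipe_meta("textrank")
print(meta.factory)
```

So seeing 'textrank' in nlp.pipe_names only tells you the component's name, not that pytextrank created it.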
Hi @louisguitton , thanks for the quick response and explanation for the 'textcat'.
I actually did import pytextrank. As you can see from the screenshot below:
I tried it on Google Colab and on an Amazon SageMaker instance (Jupyter notebook). I am still getting the same error.
Ah, I see, @kaiyungtan. From your issue description, I see:
spaCy version 3.0.0rc5
Platform Linux-4.14.225-121.362.amzn1.x86_64-x86_64-with-glibc2.9
Python version 3.6.13
Pipelines en_core_web_sm (3.0.0)
Can you check your pytextrank version on that SageMaker instance like so?
In [1]: import pytextrank
In [2]: pytextrank.__version__
Out[2]: '3.1.2'
You may be unknowingly using an older pytextrank version (depending on which pip dependencies are cached in the SageMaker environment you're using).
pytextrank v3 and later introduce breaking changes for spaCy v3 compatibility. See https://github.com/DerwenAI/pytextrank/releases
So if you run the above check and see a 2.x.x version, please run in a cell:
!pip install -U pytextrank
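If you would rather do the version check programmatically, something like the sketch below should work. It assumes Python 3.8+ (for importlib.metadata, so it will not run on the 3.6 environment from the issue description, where pytextrank.__version__ is the way to go), and the helper names are made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def major_version(version_string: str) -> int:
    """Extract the major number from a version like '3.1.2' or '3.0.0rc5'."""
    return int(version_string.split(".")[0])


def installed_major(dist_name: str) -> Optional[int]:
    """Major version of an installed distribution, or None if it is absent."""
    try:
        return major_version(version(dist_name))
    except PackageNotFoundError:
        return None


major = installed_major("pytextrank")
if major is None:
    print("pytextrank is not installed")
elif major < 3:
    print("pytextrank 2.x or older found: upgrade for spaCy v3")
else:
    print("pytextrank major version:", major)
```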
Hi, I am trying out pytextrank for extractive summarization. I used the example code provided, but it didn't work.
The error comes from this code:
ValueError: [E002] Can't find factory for 'textrank' for language English (en). This usually happens when spaCy calls nlp.create_pipe with a custom component name that's not registered on the current language class. If you're using a Transformer, make sure to install 'spacy-transformers'. If you're using a custom component, make sure you've added the decorator @Language.component (for function components) or @Language.factory (for class components).
Available factories: attribute_ruler, tok2vec, merge_noun_chunks, merge_entities, merge_subtokens, token_splitter, parser, beam_parser, entity_linker, ner, beam_ner, entity_ruler, lemmatizer, tagger, morphologizer, senter, sentencizer, textcat, textcat_multilabel, en.lemmatizer
So, I edited the code as follows:
ValueError: Cannot get dimension 'nO' for model 'sparse_linear': value unset
I checked nlp.pipe_names:
['tok2vec', 'tagger', 'parser', 'ner', 'attribute_ruler', 'lemmatizer', 'textrank']
and my spacy version and details:
spaCy version 3.0.0rc5
Platform Linux-4.14.225-121.362.amzn1.x86_64-x86_64-with-glibc2.9
Python version 3.6.13
Pipelines en_core_web_sm (3.0.0)
Do you know how I could solve this issue?
Thanks