Closed zykerli closed 1 year ago
Hey @dblaszcz,
Thank you for sharing the detailed description of the issue.
As you rightly figured out, applying any one of the variants "textrank", "positionrank", or "biasedtextrank" attaches the extension textrank to the doc.
That can be verified by checking the type of doc._.textrank:
print(type(doc._.textrank))
# returns
# in case of textrank
# <class 'pytextrank.base.BaseTextRank'>
# in case of positionrank
# <class 'pytextrank.positionrank.PositionRank'>
# in case of biasedtextrank
# <class 'pytextrank.biasedrank.BiasedTextRank'>
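To picture why one attribute name can stand for three different algorithms, here is a dependency-free toy model. It is only a sketch: `ToyDoc`, `run_component`, `algorithm_in_use`, and the empty stand-in classes are hypothetical and are not pytextrank's source. The subclass relationships are likewise an assumption made for illustration.

```python
# Toy model only: these classes are hypothetical stand-ins, not pytextrank's source.


class ToyDoc:
    """Minimal stand-in for a spaCy Doc that stores custom attributes by name."""

    def __init__(self, text):
        self.text = text
        self.extensions = {}


class BaseTextRank:
    """Stand-in for the default algorithm."""

    def __init__(self, doc):
        self.doc = doc


class PositionRank(BaseTextRank):
    """Stand-in for the PositionRank variant."""


class BiasedTextRank(BaseTextRank):
    """Stand-in for the Biased TextRank variant."""


# Three factory names, three algorithm classes.
FACTORIES = {
    "textrank": BaseTextRank,
    "positionrank": PositionRank,
    "biasedtextrank": BiasedTextRank,
}

# All three factories write to this single attribute name.
SHARED_EXTENSION = "textrank"


def run_component(factory_name, doc):
    """Attach the chosen algorithm to the doc under the shared attribute name."""
    doc.extensions[SHARED_EXTENSION] = FACTORIES[factory_name](doc)
    return doc


def algorithm_in_use(doc):
    """Report the class name of whatever object sits behind the shared attribute."""
    return type(doc.extensions[SHARED_EXTENSION]).__name__


for name in FACTORIES:
    doc = run_component(name, ToyDoc("some text"))
    print(f"{name:15s} -> {algorithm_in_use(doc)}")
```

In this model the attribute name never changes, but the object behind it does, so the class of that object tells you which algorithm was added to the pipeline. On a real doc the equivalent check would be type(doc._.textrank).__name__.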
Also, in the top snippet you shared, the statement
import pytextrank
should be placed before
spacy_nlp.add_pipe(factory_name="positionrank", name="positionrank", last=True)
because importing pytextrank is what registers the "positionrank" factory with spaCy.
I hope it helps.
I've noticed that the pipeline extensions tend to not show up in the spaCy pipeline analysis, for example when running:
print("pipeline", nlp.pipe_names)
nlp.analyze_pipes(pretty=True)
I can raise a question on the spaCy forums to find out if there are ways to register pipeline extensions.
I see the extension in the pipeline analysis using this snippet.
import spacy
import pytextrank
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("positionrank", last=True)
assert "positionrank" in nlp.pipe_names
assert "positionrank" in nlp.analyze_pipes()['summary']
Output looks like this for me
>>> nlp.analyze_pipes(pretty=True)['summary']
============================= Pipeline Overview =============================
#   Component         Assigns               Requires   Scores             Retokenizes
-   ---------------   -------------------   --------   ----------------   -----------
0   tok2vec           doc.tensor                                          False
1   tagger            token.tag                        tag_acc            False
2   parser            token.dep                        dep_uas            False
                      token.head                       dep_las
                      token.is_sent_start              dep_las_per_type
                      doc.sents                        sents_p
                                                       sents_r
                                                       sents_f
3   ner               doc.ents                         ents_f             False
                      token.ent_iob                    ents_p
                      token.ent_type                   ents_r
                                                       ents_per_type
4   attribute_ruler                                                       False
5   lemmatizer        token.lemma                      lemma_acc          False
6   positionrank                                                          False
✔ No problems found.
Maybe it's a version issue, @ceteri? (I'm using spacy=='3.0.6' and pytextrank=='3.1.2' for this test.)
Thank you @louisguitton.
Looking at this again: since pytextrank assigns custom attributes, these don't show up in the pipeline analysis.
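One possible reading of this: the analysis table is built from metadata that a component declares when it is registered, so an attribute written at runtime but never declared leaves the Assigns column empty. The sketch below is a dependency-free toy model of that idea, not spaCy's implementation. `register`, `analyze`, and both component names are hypothetical.

```python
# Toy model only: `register`, `analyze`, and the component names are hypothetical,
# not spaCy's implementation.

REGISTRY = {}


def register(name, assigns=()):
    """Register a component together with the attributes it declares it sets."""

    def decorator(func):
        REGISTRY[name] = {"func": func, "assigns": list(assigns)}
        return func

    return decorator


@register("undeclared_rank")
def undeclared_rank(doc):
    # Writes an attribute at runtime but declares nothing at registration.
    doc["textrank"] = "ranked"
    return doc


@register("declared_rank", assigns=["doc._.textrank"])
def declared_rank(doc):
    # Writes the same attribute and also declares it at registration.
    doc["textrank"] = "ranked"
    return doc


def analyze(pipe_names):
    """Summarize a pipeline from declared metadata only; no component is run."""
    return {name: {"assigns": REGISTRY[name]["assigns"]} for name in pipe_names}


summary = analyze(["undeclared_rank", "declared_rank"])
for name, meta in summary.items():
    print(name, meta["assigns"])
```

Both toy components write the same attribute, yet only the one that declares it appears with a non-empty "assigns" entry. In spaCy itself, the corresponding declaration is the assigns argument of Language.component and Language.factory. Whether pytextrank should declare doc._.textrank there is left open here.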
Hello, I'm trying to implement your provided PositionRank and Biased TextRank algorithms for the German language with the following code.
Unfortunately, it throws an AttributeError: [E046]. It looks like ._.positionalrank is not implemented. The same code works fine when replacing "positionrank" with "textrank" (using doc._.textrank). I'm using pytextrank version 3.1.1.
EDIT
As I can see from pytextrank/pytextrank/positionrank.py, lines 23-50 (see below), PositionRank is set with set_extension, but is still named "textrank" (and not "positionrank"). My code at the beginning runs when changing the last line from
summary = list(doc._.positionalrank.summary(limit_phrases=1, limit_sentences=1, preserve_order=False))
to
summary = list(doc._.textrank.summary(limit_phrases=1, limit_sentences=1, preserve_order=False))
But is PositionRank really being used, or TextRank? Maybe extending the tutorial to cover the algorithms besides TextRank would clarify things.