boudinfl / pke

Python Keyphrase Extraction module
GNU General Public License v3.0

PositionRank algorithm fails with networkx version 3.0: IndexError in `candidate_weighting()` method #221

Closed: tagucci closed this issue 1 year ago

tagucci commented 1 year ago

I encountered an error when using the PositionRank algorithm. With networkx 3.0, calling extractor.candidate_weighting() raises the IndexError below; with networkx==2.8.8 the same code works correctly.

IndexError                                Traceback (most recent call last)
Cell In[6], line 15
     12 extractor.candidate_selection()
     14 # candidate weighting, in the case of TopicRank: using a random walk algorithm
---> 15 extractor.candidate_weighting()
     17 # N-best selection, keyphrases contains the 10 highest scored candidates as
     18 # (keyphrase, score) tuples
     19 keyphrases = extractor.get_n_best(n=10)

File /home/pke/pke/unsupervised/graph_based/positionrank.py:171, in PositionRank.candidate_weighting(self, window, pos, normalized)
    168     self.positions[word] /= norm
    170 # compute the word scores using biased random walk
--> 171 w = nx.pagerank(G=self.graph,
    172                 alpha=0.85,
    173                 tol=0.0001,
    174                 personalization=self.positions,
    175                 weight='weight')
    177 # loop through the candidates
    178 for k in self.candidates.keys():

File /usr/local/lib/python3.10/dist-packages/networkx/classes/backends.py:134, in _dispatch.<locals>.wrapper(*args, **kwds)
    132 @functools.wraps(func)
    133 def wrapper(*args, **kwds):
--> 134     graph = args[0]
    135     if hasattr(graph, "__networkx_plugin__") and plugins:
    136         plugin_name = graph.__networkx_plugin__

IndexError: tuple index out of range
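The last frame points at the cause: the networkx 3.0 dispatch wrapper reads the graph from `args[0]`, but positionrank.py passes it as the keyword argument `G=self.graph`, so `args` is an empty tuple. The stdlib-only sketch below reproduces that pattern. `simplified_dispatch` and `fake_pagerank` are hypothetical names, and the wrapper is a simplification of the frame shown in the traceback, not the actual networkx code.

```python
import functools


def simplified_dispatch(func):
    """Hypothetical, simplified stand-in for the networkx 3.0 wrapper in the
    traceback: it only looks for the graph in the positional args."""
    @functools.wraps(func)
    def wrapper(*args, **kwds):
        graph = args[0]  # IndexError when the graph is passed as a keyword
        if hasattr(graph, "__networkx_plugin__"):
            raise NotImplementedError("plugin dispatch is not modelled here")
        return func(*args, **kwds)
    return wrapper


@simplified_dispatch
def fake_pagerank(G, alpha=0.85):
    """Hypothetical placeholder for nx.pagerank; just returns its inputs."""
    return G, alpha


# Positional call: args == ({"a": 1},), so args[0] is the graph.
positional_result = fake_pagerank({"a": 1}, alpha=0.85)

# Keyword call, as in positionrank.py: args == (), so args[0] fails.
try:
    fake_pagerank(G={"a": 1}, alpha=0.85)
    keyword_error = None
except IndexError as exc:
    keyword_error = str(exc)

print(positional_result)  # ({'a': 1}, 0.85)
print(keyword_error)      # tuple index out of range
```

The keyword call fails with the same message as the real traceback, while the positional call goes through.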

I used the minimal example from the README, switching the extractor to PositionRank:

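import pke

# README example, with PositionRank instead of TopicRank
extractor = pke.unsupervised.PositionRank()
# 'text' is a placeholder: pass the document string here
extractor.load_document(input='text', language='en')
extractor.candidate_selection()
# raises IndexError with networkx 3.0, works with networkx==2.8.8
extractor.candidate_weighting()
keyphrases = extractor.get_n_best(n=10)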

I suspect the issue is related to https://github.com/networkx/networkx/issues/6458. Pinning networkx==2.8.8 in requirements.txt, or fixing positionrank.py, should solve this problem.
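A minimal sketch of the positionrank.py change suggested above, assuming networkx is installed together with scipy (which nx.pagerank relies on): the graph is passed positionally instead of as `G=...`, so the wrapper's `args[0]` lookup finds it. The toy graph and position weights are illustrative data, not pke output, and this is not necessarily the change made in #222.

```python
import networkx as nx

# Small weighted word graph standing in for PositionRank's self.graph
# (illustrative data only).
graph = nx.Graph()
graph.add_edge("keyphrase", "extraction", weight=2.0)
graph.add_edge("extraction", "module", weight=1.0)
graph.add_edge("keyphrase", "module", weight=1.0)

# Position-based bias, normalised to sum to 1 before the random walk,
# mirroring the normalisation step shown in the traceback.
positions = {"keyphrase": 1.0, "extraction": 0.5, "module": 1 / 3}
norm = sum(positions.values())
positions = {word: score / norm for word, score in positions.items()}

# The graph is the first positional argument rather than G=...,
# so a wrapper that reads args[0] receives it.
scores = nx.pagerank(graph,
                     alpha=0.85,
                     tol=0.0001,
                     personalization=positions,
                     weight='weight')

print(scores)
```

The keyword arguments after the graph are unchanged from the original call, so only the way the graph is passed differs.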

ygorg commented 1 year ago

Fixed by #222