boudinfl / pke

Python Keyphrase Extraction module
GNU General Public License v3.0

Throws zero division error #207

Closed. shyambhu-mukherjee closed this issue 2 years ago

shyambhu-mukherjee commented 2 years ago

https://github.com/boudinfl/pke/blob/5af1f817e0211c33ac3f90e1e86bb5c1283448e8/pke/unsupervised/graph_based/topicrank.py#L189 This line throws a ZeroDivisionError when we try to use ngram_selection=5.

ygorg commented 2 years ago

Please share a minimal reproducible example to make debugging easier. Related PR #177 and issues #142, #77, #27

shyambhu-mukherjee commented 2 years ago

Example code:

    import pke

    # total_text is the input document as a string (its contents are not shown in this thread)
    extractor = pke.unsupervised.TopicRank()
    extractor.load_document(input=total_text, language='en')

    # extractor.candidate_selection()
    extractor.ngram_selection(n=3)

    # the default threshold was 0.74 in the original algorithm
    extractor.candidate_weighting(threshold=0.2)

    keyphrases_response = extractor.get_n_best(n=30, redundancy_removal=True)

When we remove the candidate weighting line, i.e. let it use the default value of 0.74, it returns an empty keyphrase list instead. Hope this helps.

ygorg commented 2 years ago

Please check whether any candidates were selected by looking at len(extractor.candidates) before calling extractor.candidate_weighting. Does the code you provided throw a ZeroDivisionError? I can't reproduce the behaviour because I don't know what total_text contains. See: About Minimal Reproducible Examples
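The check suggested above could be wrapped in a small guard. This is only a sketch: `weight_if_candidates` is a hypothetical helper, not part of pke, and it assumes nothing about the extractor beyond the `candidates` attribute and the `candidate_weighting` method already used in this thread. It does not explain or fix the ZeroDivisionError itself; it only makes an empty candidate set visible before weighting runs.

```python
def weight_if_candidates(extractor, **weighting_kwargs):
    """Call extractor.candidate_weighting only if candidates were selected.

    Hypothetical helper for debugging, not part of pke. Returns the number
    of selected candidates so the caller can include it in a bug report.
    """
    n_candidates = len(extractor.candidates)
    if n_candidates == 0:
        # Nothing was selected, so there is nothing to weight or rank.
        print("No candidates selected; skipping candidate_weighting.")
        return 0
    print(f"{n_candidates} candidates selected.")
    # Forward any keyword arguments (e.g. threshold=0.2) unchanged.
    extractor.candidate_weighting(**weighting_kwargs)
    return n_candidates
```

With the example from this thread, it would be called as `weight_if_candidates(extractor, threshold=0.2)` right after `extractor.ngram_selection(n=3)`. A return value of 0 would point to the selection step rather than the weighting step.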