DerwenAI / pytextrank

Python implementation of TextRank algorithms ("textgraphs") for phrase extraction
https://derwen.ai/docs/ptr/
MIT License

Biased Textrank implementation uses phrases instead of sentences #202

Closed ahmed-moubtahij closed 2 years ago

ahmed-moubtahij commented 2 years ago

Is there a reason why phrases are used instead of sentences in biased textrank's implementation? The original paper uses sentences, and encodes them using Sentence-BERT. A popular use for biased textrank is extractive summarization (as presented in the paper) and sentences make more sense for that task.

I can see that there is a Sentence class in https://github.com/DerwenAI/pytextrank/blob/main/pytextrank/base.py . Is there a customization point that I'm missing, or would I have to attempt adapting the biasedrank.py source-code for the use of sentences by importing that Sentence class instead of Phrase here? https://github.com/DerwenAI/pytextrank/blob/985b8a5c3e4a384850b8d7326a44ca7205c973c3/pytextrank/biasedrank.py#L13

Ankush-Chander commented 2 years ago

Hi @Ayenem,

The Biased TextRank implementation we currently have is closer to "Personalized PageRank" than to the implementation in the original paper.

In the original paper, sentences are the nodes, and their restart probabilities are adjusted according to their similarity to the "focus" vector.
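To make the sentence-level idea concrete, here is a minimal sketch, not the paper's code and not part of pytextrank. It assumes a plain bag-of-words cosine similarity as a stand-in for Sentence-BERT embeddings, and the names `bow`, `cosine`, and `biased_sentence_rank` are hypothetical helpers written for this illustration.

```python
import math
import re
from collections import Counter


def bow(text):
    """Bag-of-words vector; a crude stand-in (assumption) for a sentence embedding."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def biased_sentence_rank(sentences, focus, damping=0.85, iterations=50):
    """Rank sentences with restart probabilities biased toward the focus text."""
    vectors = [bow(s) for s in sentences]
    focus_vector = bow(focus)
    n = len(sentences)

    # Edge weights: pairwise sentence similarity, without self-loops.
    weights = [
        [cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sums = [sum(row) for row in weights]

    # Restart distribution: similarity to the focus, normalized to sum to 1.
    # Falls back to a uniform distribution if nothing resembles the focus.
    bias = [cosine(v, focus_vector) for v in vectors]
    total = sum(bias)
    restart = [b / total for b in bias] if total else [1.0 / n] * n

    ranks = [1.0 / n] * n
    for _ in range(iterations):
        # Sentences with no outgoing weight simply pass nothing on (sketch-level shortcut).
        ranks = [
            (1 - damping) * restart[j]
            + damping
            * sum(
                ranks[i] * weights[i][j] / out_sums[i]
                for i in range(n)
                if out_sums[i]
            )
            for j in range(n)
        ]
    return sorted(zip(sentences, ranks), key=lambda pair: pair[1], reverse=True)
```

Calling `biased_sentence_rank(sentences, "fraud case")` returns `(sentence, score)` pairs sorted from most to least relevant. For extractive summarization one would take the top few and restore their original order.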

Currently we don't rely on embeddings. The restart probabilities of the terms occurring in the focus are set according to the bias.
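As a rough illustration of that term-level approach, the sketch below runs a personalized PageRank over a small co-occurrence graph in plain Python. It is not the pytextrank source: the helper names `cooccurrence_graph` and `personalized_pagerank`, the window size, and the parameter defaults are all assumptions made for this example.

```python
from collections import defaultdict


def cooccurrence_graph(tokens, window=2):
    """Undirected graph linking tokens that appear within `window` positions."""
    graph = defaultdict(set)
    for i, token in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != token:
                graph[token].add(other)
                graph[other].add(token)
    return graph


def personalized_pagerank(graph, focus_terms, bias=1.0, default_bias=0.0,
                          damping=0.85, iterations=50):
    """PageRank whose restart probabilities favor the terms in `focus_terms`."""
    nodes = sorted(graph)
    raw = {n: (bias if n in focus_terms else default_bias) for n in nodes}
    total = sum(raw.values())
    # With no focus terms in the graph, fall back to a uniform restart (plain PageRank).
    restart = {n: (raw[n] / total if total else 1.0 / len(nodes)) for n in nodes}

    ranks = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Every node in this graph has at least one neighbor, so len(graph[m]) > 0.
        ranks = {
            n: (1 - damping) * restart[n]
            + damping * sum(ranks[m] / len(graph[m]) for m in graph[n])
            for n in nodes
        }
    return ranks
```

With `focus_terms` set to the terms of the focus text, random restarts land only on those terms, so nodes that co-occur with them accumulate more rank than they would under a uniform restart.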

Here's a thread where we enumerate and discuss in detail the TextRank variants (implemented so far and planned) and the motivations behind them.

> Is there a reason why phrases are used instead of sentences in biased textrank's implementation?

Using phrases made sense for my use case at the time, which was to find the core concepts (keyphrases) of a research paper. By using concepts from the title and abstract as the focus, we were able to boost concepts that co-occurred with them.

ahmed-moubtahij commented 2 years ago

I see, thank you for explaining. It seems that the original implementation (which used sentences) is no longer available on the author's repo. Maybe because they assumed this one did the same but better, but I'm just speculating. At any rate, are there plans for implementing the sentence-based version?

ceteri commented 2 years ago

Sure, do you want to check with @ashkankzme? He can probably answer that much more directly :)

ahmed-moubtahij commented 2 years ago

Absolutely, thank you!