ahmed-moubtahij closed this 2 years ago.
Hi @Ayenem,
The Biased TextRank implementation we currently have is closer to "Personalized PageRank" than to the implementation in the original paper.
In the original paper, sentences are taken as nodes, and their restart probabilities are set according to their similarity with the "focus" vector.
Currently we don't rely on embeddings; instead, the restart probabilities of the terms occurring in the focus are set according to `bias`.
Here's a thread where we enumerate and discuss in detail the TextRank variants (those implemented so far and those planned) and the motivations behind them.
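To make the term-level approach concrete, here is a minimal sketch of Personalized PageRank over a lemma co-occurrence graph, where terms in the focus get a larger restart weight than the rest. It is plain-Python power iteration, not pytextrank's own code; the helper `term_bias`, its `bias`/`default_bias` parameters, and the toy graph are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def personalized_pagerank(edges, personalization, damping=0.85, max_iter=200, tol=1e-10):
    """PageRank by power iteration on an undirected, unweighted graph.

    `personalization` maps node -> non-negative restart weight; the weights
    are normalized into the restart (teleport) distribution.
    """
    # Undirected adjacency lists built from the edge list.
    nbrs = defaultdict(set)
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    nodes = sorted(nbrs)
    total = sum(personalization.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("personalization must give some node a positive weight")
    restart = {n: personalization.get(n, 0.0) / total for n in nodes}
    rank = dict(restart)
    for _ in range(max_iter):
        # With probability 1 - damping, jump back according to the restart vector.
        new = {n: (1 - damping) * restart[n] for n in nodes}
        # Otherwise follow a random edge (every node here has at least one neighbor).
        for n in nodes:
            share = damping * rank[n] / len(nbrs[n])
            for m in nbrs[n]:
                new[m] += share
        delta = sum(abs(new[n] - rank[n]) for n in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def term_bias(terms, focus, bias=5.0, default_bias=1.0):
    """Hypothetical helper: focus terms get `bias`, every other term `default_bias`."""
    return {t: (bias if t in focus else default_bias) for t in terms}


# Toy lemma co-occurrence graph (made-up data).
edges = [("graph", "ranking"), ("ranking", "sentence"), ("sentence", "embedding"),
         ("graph", "node"), ("node", "edge")]
terms = {t for edge in edges for t in edge}

uniform = personalized_pagerank(edges, term_bias(terms, focus=set()))
biased = personalized_pagerank(edges, term_bias(terms, focus={"embedding"}))
print(biased["embedding"] > uniform["embedding"])  # True
```

With an empty focus every term gets the same restart weight, which reduces to ordinary PageRank; putting a term in the focus shifts restart mass toward it, so it and its co-occurring neighbors rise in the ranking.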
Is there a reason why phrases are used instead of sentences in biased textrank's implementation?
Using phrases made sense for my use case at the time, which was to find the core concepts (keyphrases) of a research paper. By using the title/abstract concepts as the focus, we were able to boost concepts that co-occurred with the abstract concepts.
I see, thank you for explaining. It seems that the original implementation (which used sentences) is no longer available in the author's repo, perhaps because they assumed this one does the same thing but better; I'm just speculating, though. In any case, are there plans to implement the sentence-based version?
Sure, do you want to check with @ashkankzme? He can probably answer that much more directly :)
Absolutely, thank you!
Is there a reason why phrases are used instead of sentences in Biased TextRank's implementation? The original paper uses sentences and encodes them using Sentence-BERT. A popular use of Biased TextRank is extractive summarization (as presented in the paper), and sentences make more sense for that task.
I can see that there is a `Sentence` class in https://github.com/DerwenAI/pytextrank/blob/main/pytextrank/base.py. Is there a customization point that I'm missing, or would I have to adapt the `biasedrank.py` source code to use sentences, importing that `Sentence` class instead of `Phrase` here? https://github.com/DerwenAI/pytextrank/blob/985b8a5c3e4a384850b8d7326a44ca7205c973c3/pytextrank/biasedrank.py#L13
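For comparison, here is a rough sketch of the sentence-based variant described above: sentences are the nodes, and each sentence's restart probability is proportional to its similarity with the focus. Assumptions: a bag-of-words cosine similarity stands in for Sentence-BERT embeddings so the example runs without extra dependencies, edges are weighted by pairwise sentence similarity, and `biased_sentence_rank` is a hypothetical function, not part of pytextrank.

```python
import math
import re
from collections import Counter


def bow(text):
    # Stand-in for a Sentence-BERT embedding: lowercase bag-of-words counts.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    dot = sum(c * v[w] for w, c in u.items() if w in v)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def biased_sentence_rank(sentences, focus, damping=0.85, max_iter=200, tol=1e-10):
    """Hypothetical sentence-level biased TextRank sketch."""
    vecs = [bow(s) for s in sentences]
    focus_vec = bow(focus)
    n = len(sentences)
    # Edge weights (assumption): cosine similarity between distinct sentences.
    w = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    out = [sum(row) for row in w]
    # Restart probabilities: similarity of each sentence to the focus, normalized.
    sims = [cosine(v, focus_vec) for v in vecs]
    total = sum(sims)
    restart = [s / total for s in sims] if total else [1.0 / n] * n
    rank = restart[:]
    for _ in range(max_iter):
        new = [(1 - damping) * r for r in restart]
        for j in range(n):
            for i in range(n):
                if out[j] > 0:
                    # Spread rank along edges in proportion to their weights.
                    new[i] += damping * rank[j] * w[j][i] / out[j]
                else:
                    # Isolated sentence: send its mass through the restart vector.
                    new[i] += damping * rank[j] * restart[i]
        delta = sum(abs(a - b) for a, b in zip(new, rank))
        rank = new
        if delta < tol:
            break
    return rank


sentences = [
    "The vaccine trial reported strong efficacy.",
    "Stock markets rallied on Tuesday.",
    "Efficacy of the vaccine was confirmed in older adults.",
    "Analysts expect markets to remain volatile.",
]
scores = biased_sentence_rank(sentences, focus="vaccine efficacy")
ranked = sorted(range(len(sentences)), key=lambda i: -scores[i])
print(ranked[:2])  # [0, 2]
```

In this toy input the two market sentences have zero similarity to the focus and share no words with the vaccine sentences, so they end up with a score of 0. With dense sentence embeddings almost every pair would have a nonzero similarity, so the graph would typically be much denser.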