DerwenAI / pytextrank

Python implementation of TextRank algorithms ("textgraphs") for phrase extraction
https://derwen.ai/docs/ptr/
MIT License

Information about the matrix similarity #203

Closed EmanueleGusso closed 2 years ago

EmanueleGusso commented 2 years ago

Hi everyone, first of all I'd like to thank you for the amazing work you've done so far. I have a question about extractive summarization with pytextrank. To apply the algorithm, we start from a matrix M (num_sentences x num_sentences) and fill each entry, usually with a similarity measure between the two sentences it indexes. In the case of pytextrank, which embedding is used for the sentences? I really hope you can help me. Thank you in advance for your time!

ceteri commented 2 years ago

Thank you @EmanueleGusso - This project is about implementing the textgraph family of algorithms, primarily for entity extraction – although some variants have a "side-effect" usage in extractive summarization. That said, we weren't aiming to cover extractive summarization in general, or to expand on summarization.

ceteri commented 2 years ago

Also @EmanueleGusso, if it helps - here's the primary source, Mihalcea (2004) at EMNLP: https://derwen.ai/docs/ptr/biblio/#mihalcea04textrank. The analysis of extractive summarization is included there.
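
For readers following that reference, here is a minimal sketch of the sentence similarity matrix M described in the question, using the word-overlap measure from the TextRank paper (shared words divided by the sum of the log sentence lengths) rather than an embedding. This is an illustration only and is not taken from pytextrank's code: the helper names `tokenize`, `overlap_similarity`, `similarity_matrix`, and `rank_sentences` are hypothetical, the regex tokenizer is a simplifying assumption, and the damping factor of 0.85 is the conventional PageRank default.

```python
import math
import re


def tokenize(sentence):
    # Simplifying assumption: lowercase alphanumeric tokens, no stop-word or POS filtering.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def overlap_similarity(tokens_a, tokens_b):
    # Shared words divided by log(|A|) + log(|B|).
    # Sentences with fewer than two tokens are skipped to avoid a zero denominator.
    if len(tokens_a) < 2 or len(tokens_b) < 2:
        return 0.0
    shared = set(tokens_a) & set(tokens_b)
    return len(shared) / (math.log(len(tokens_a)) + math.log(len(tokens_b)))


def similarity_matrix(sentences):
    # M[i][j] holds the similarity between sentence i and sentence j; the diagonal stays 0.
    tokens = [tokenize(s) for s in sentences]
    n = len(tokens)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim = overlap_similarity(tokens[i], tokens[j])
            matrix[i][j] = sim
            matrix[j][i] = sim
    return matrix


def rank_sentences(matrix, damping=0.85, iterations=50):
    # Weighted PageRank-style iteration over the sentence graph defined by the matrix.
    n = len(matrix)
    scores = [1.0] * n
    out_weight = [sum(row) for row in matrix]
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            incoming = sum(
                matrix[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if out_weight[j] > 0
            )
            new_scores.append((1 - damping) + damping * incoming)
        scores = new_scores
    return scores


sentences = [
    "TextRank builds a graph over sentences.",
    "Each edge in the graph is weighted by sentence similarity.",
    "Bananas are yellow.",
]
M = similarity_matrix(sentences)
scores = rank_sentences(M)
# Highest-scoring sentences first; these would be the candidates for an extractive summary.
ranked = sorted(zip(scores, sentences), reverse=True)
```

In this toy input the third sentence shares no words with the others, so it has no edges in the graph and ends up with the lowest score. Swapping `overlap_similarity` for a cosine similarity over sentence embeddings would give the embedding-based variant the question asks about.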