Hellisotherpeople / CX_DB8

a contextual, biasable, word-or-sentence-or-paragraph extractive summarizer powered by the latest in text embeddings (BERT, Universal Sentence Encoder, Flair)
https://huggingface.co/spaces/Hellisotherpeople/Unsupervised_Extractive_Summarization
GNU General Public License v3.0
225 stars 26 forks

Support *actual* textrank #10

Open Hellisotherpeople opened 5 years ago

Hellisotherpeople commented 5 years ago

I'm not actually running the proper TextRank algorithm, and I should experiment with it to see how effective it is.

I'll most likely implement it with networkx, which shouldn't be difficult. It might be slow for large documents with word-level models.
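For readers unfamiliar with the idea, here is a minimal, dependency-free sketch of TextRank over embeddings. It is not the code from this repo: the function names (`cosine`, `textrank`, `summarize`), the toy 2-D vectors, and the choice of clipped cosine similarity as the edge weight are all assumptions made for illustration. Each text unit (word, sentence, or paragraph) becomes a node, edges are weighted by embedding similarity, and a weighted PageRank power iteration scores the nodes. With networkx installed, its `pagerank` function could replace the hand-written loop.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def textrank(vectors, damping=0.85, iterations=100, tol=1e-8):
    """Score each embedding with weighted PageRank on a similarity graph."""
    n = len(vectors)
    if n == 0:
        return []
    # Edge weights: cosine similarity clipped at zero, with no self-loops.
    weights = [
        [max(cosine(vectors[i], vectors[j]), 0.0) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Nodes with no outgoing weight spread their score uniformly.
        dangling = sum(scores[j] for j in range(n) if out_sum[j] == 0.0)
        new_scores = []
        for i in range(n):
            incoming = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0.0
            )
            new_scores.append(
                (1.0 - damping) / n + damping * (incoming + dangling / n)
            )
        delta = sum(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores


def summarize(units, vectors, k=2):
    """Return the k highest-scoring units, kept in their original order."""
    scores = textrank(vectors)
    best = sorted(range(len(units)), key=lambda i: scores[i], reverse=True)[:k]
    return [units[i] for i in sorted(best)]


# Toy example: hand-made 2-D "embeddings" stand in for real model output.
units = ["first point", "second point", "third point", "off-topic aside"]
vectors = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0]]
print(summarize(units, vectors, k=2))
```

Because the similarity graph is dense, each iteration costs on the order of n² operations for n units, which is consistent with the concern about word-level models on large documents.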

Hellisotherpeople commented 5 years ago

Wow! Implementing TextRank properly dramatically increased the coherence of my summaries. I guess it makes sense that doing a walk through the word-embedding-powered graph would give more coherent summaries.

Unfortunate side effect: summarization speed takes a sizeable hit unless I can find a better implementation of PageRank.

Hellisotherpeople commented 4 years ago

I haven't actually merged that code into the repo yet. I'll do that soon so that other people can try TextRank or other graph algorithms.