DavidBelicza / TextRank

:wink: :cyclone: :strawberry: TextRank implementation in Golang with extendable features (summarization, phrase extraction) and multithreading (goroutine).
MIT License
205 stars, 22 forks

More accurate ranking algorithm #4

Closed: DavidBelicza closed this issue 6 years ago

DavidBelicza commented 6 years ago

In case 1, the phrases icons - tray and extension - gnome both got a weight of 0.5, but extension - gnome is clearly a more important phrase than icons - tray. The two phrases occur equally often, but the word gnome itself has more hits than the words icons or tray. Following this logic, the weight of extension - gnome should be greater than 0.5 and less than 1.

But this logic should not cause the side effect seen in case 2, where every phrase that contains the word gnome becomes important.

Case 1 and case 2 are correct, so they shouldn't be modified, but a new algorithm is required that implements the above logic. It should be a new, third implementation of the Algorithm interface: SupervisedAlgorithm or ComparatorAlgorithm.
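The idea above can be illustrated with a minimal Go sketch. It is not the library's code: `PhraseWeigher`, `OccurrenceWeigher`, `ComparatorWeigher`, `Phrase`, `Stats`, and `demoStats` are hypothetical names, the library's real Algorithm interface may declare different methods, and the hit counts are made up for illustration. The sketch assumes one possible weighting, the average of a normalized phrase-occurrence score and a normalized word-hit score, so that word popularity separates phrases with equal occurrence without letting a popular word dominate.

```go
package main

import "fmt"

// Phrase is a hypothetical two-word phrase with its co-occurrence count.
type Phrase struct {
	Left, Right string
	Count       int
}

// Stats holds hypothetical corpus totals used for normalization.
type Stats struct {
	Hits     map[string]int // occurrences of each single word
	MaxCount int            // highest phrase co-occurrence count
	MaxHits  int            // highest single-word hit count
}

// PhraseWeigher is an illustrative stand-in for the library's Algorithm
// interface; the real interface may declare different methods.
type PhraseWeigher interface {
	Weight(p Phrase, s Stats) float64
}

// OccurrenceWeigher scores a phrase only by how often the pair occurs.
type OccurrenceWeigher struct{}

func (OccurrenceWeigher) Weight(p Phrase, s Stats) float64 {
	return float64(p.Count) / float64(s.MaxCount)
}

// ComparatorWeigher averages the pair score with a word-hit score (an
// assumed formula, not the one from the issue's follow-up PR).
type ComparatorWeigher struct{}

func (ComparatorWeigher) Weight(p Phrase, s Stats) float64 {
	pairScore := float64(p.Count) / float64(s.MaxCount)
	wordScore := float64(s.Hits[p.Left]+s.Hits[p.Right]) / float64(2*s.MaxHits)
	return (pairScore + wordScore) / 2
}

// demoStats returns made-up counts; they are not taken from the issue.
func demoStats() Stats {
	return Stats{
		Hits: map[string]int{
			"gnome": 8, "shell": 4, "extension": 4,
			"icons": 3, "tray": 3, "panel": 1,
		},
		MaxCount: 4,
		MaxHits:  8,
	}
}

func main() {
	stats := demoStats()
	phrases := []Phrase{
		{Left: "gnome", Right: "shell", Count: 4},
		{Left: "extension", Right: "gnome", Count: 2},
		{Left: "icons", Right: "tray", Count: 2},
		{Left: "gnome", Right: "panel", Count: 1},
	}
	// Each implementation plugs into the same loop; adding the third one
	// leaves the existing implementations untouched.
	for _, w := range []PhraseWeigher{OccurrenceWeigher{}, ComparatorWeigher{}} {
		for _, p := range phrases {
			fmt.Printf("%T %s - %s: %v\n", w, p.Left, p.Right, w.Weight(p, stats))
		}
	}
}
```

With these made-up counts, `OccurrenceWeigher` gives both extension - gnome and icons - tray a weight of 0.5, while `ComparatorWeigher` gives extension - gnome 0.625 and icons - tray 0.4375. The rare phrase gnome - panel stays at 0.40625, so containing the word gnome is not enough on its own to make a phrase important.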

Case 1: FindPhrases method result from text ranked by AlgorithmDefault

Case 2: FindPhrases method result from text ranked by AlgorithmMixed

DavidBelicza commented 6 years ago

Done in PR #5. The new algorithm is named ChainAlgorithm, and MixedAlgorithm has been removed.