Documentation or Inclusion of other algorithms

DerwenAI / pytextrank

Python implementation of TextRank algorithms ("textgraphs") for phrase extraction

https://derwen.ai/docs/ptr/

MIT License


Documentation or Inclusion of other algorithms #174

Closed BradKML closed 1 year ago

BradKML commented 3 years ago

The models and algorithms in https://github.com/boudinfl/pke#implemented-models are similar to TextRank but are not sped up by spaCy, so it might be a good idea to include them in PyTextRank.

PS: There are also other non TextRank-esque algorithms to consider when making this assessment:

RAKE https://github.com/aneesha/RAKE and https://github.com/csurfer/rake-nltk and https://github.com/vgrabovets/multi_rake and https://github.com/chinwuDebug/RAKE_improve
YAKE https://github.com/LIAAD/yake
Aho–Corasick algorithm https://github.com/dav009/flash
RaKUn https://github.com/Parsely/serpextract

louisguitton commented 3 years ago

Thanks for bringing pke to our attention!

This issue is similar to #78, on which we have already made great progress with two contributions:

adding PositionRank and BiasedRank
adding BaseTextRank and BaseTextRankFactory to enable integration of more flavours

Regarding the graph based models of pke, I can see this:

their TextRank can be achieved with our BaseTextRank(edge_weight=0)
their SingleRank can be achieved with our BaseTextRank() or BaseTextRank(edge_weight=1.0)
their PositionRank can be achieved with our PositionRank
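
To illustrate the distinction behind the first two items, here is a minimal sketch in plain Python. It is not the pytextrank or pke implementation: `build_graph`, `pagerank`, and the `window` and `weighted` parameters are hypothetical names chosen for this example. It assumes the usual description of the two algorithms, where TextRank gives every co-occurrence edge a unit weight and SingleRank weights each edge by how often the two words co-occur.

```python
from collections import defaultdict


def build_graph(tokens, window=2, weighted=True):
    """Build an undirected co-occurrence graph as nested dicts.

    window=2 links only adjacent tokens; larger values link tokens further apart.
    weighted=False: every edge has weight 1.0 (TextRank-style, by assumption).
    weighted=True: edge weight counts co-occurrences (SingleRank-style, by assumption).
    """
    graph = defaultdict(lambda: defaultdict(float))
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + window]:
            if left == right:
                continue  # skip self-loops
            if weighted:
                graph[left][right] += 1.0
                graph[right][left] += 1.0
            else:
                graph[left][right] = 1.0
                graph[right][left] = 1.0
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Weighted PageRank by simple power iteration over the nested-dict graph."""
    nodes = list(graph)
    if not nodes:
        return {}
    # Total outgoing weight per node, used to normalise edge weights.
    out_weight = {n: sum(graph[n].values()) for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {}
        for n in nodes:
            # The graph is symmetric, so the neighbours of n are also its in-links.
            incoming = sum(
                rank[m] * graph[m][n] / out_weight[m] for m in graph[n]
            )
            new_rank[n] = (1.0 - damping) / len(nodes) + damping * incoming
        rank = new_rank
    return rank


# Toy input; a real pipeline would filter tokens by part of speech first.
tokens = "graph ranking graph ranking uses graph edges".split()
for label, weighted in (("unit weights", False), ("co-occurrence counts", True)):
    ranks = pagerank(build_graph(tokens, window=2, weighted=weighted))
    top = sorted(ranks, key=ranks.get, reverse=True)
    print(label, top)
```

The only difference between the two variants is how repeated co-occurrences are handled when the graph is built; the ranking step is identical.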

the following ones are missing:

TopicRank paper by (Bougouin et al., 2013)
TopicalPageRank article by (Sterckx et al., 2015)
MultipartiteRank article by (Boudin, 2018)

I was not aware of these 3 papers and approaches, so thank you. Do you have experience with them in practice, and are they good? Would you be open to contributing them?

BradKML commented 3 years ago

I am mainly reporting them so they can be noted in the documentation, but I would contribute if I can.

An additional note: https://github.com/miso-belica/sumy/blob/master/docs/alternatives.md

BradKML commented 3 years ago

To reiterate the current algorithms that are not included:

[ ] SumBasic by Nenkova et al. and its repository in Python
[ ] LexRank by Erkan et al. and its repository in Python
[ ] SalienceRank by Teneva et al. and its repository in Python
[ ] KEA by Witten et al. and its repository in Java
[ ] UniKeyPhrase by Wu et al. and its repository in Python
[ ] https://github.com/boudinfl/pke#implemented-models
- [ ] TopicRank paper by (Bougouin et al., 2013)
- [ ] TopicalPageRank article by (Sterckx et al., 2015)
- [ ] MultipartiteRank article by (Boudin, 2018)

Also looking at:

[ ] JAKE https://github.com/xcjackpan/jake
[ ] Crackr https://github.com/anjishnu/Crackr

ceteri commented 2 years ago

Also check the algorithms listed in pke https://github.com/boudinfl/pke which has an excellent range of implementations. FWIW, that library is GPL and not implemented as a spaCy pipeline, so there's some room for algorithm implementations both there (for research) and here (for production deployments).