PoetryOffice / Write

A word processor for the Haiku operating system
MIT License

Implement Text Normalization #56

Open richienyhus opened 8 years ago

richienyhus commented 8 years ago

https://en.wikipedia.org/wiki/Text_normalization

There is a low-level implementation as an extension to ICU.

Canonicalization

(Convert Am, aM, am, A.M. & a.m. -> AM)

https://github.com/ianozsvald/learning_text_transformer Python & MIT

https://github.com/davidmogar/normalizr Python & MIT
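A minimal sketch of the AM example above, in Python to match the linked projects. It does not use either linked library; the pattern and the `canonicalize_meridiem` name are made up for illustration. It works on a single token and does not check context, so the ordinary word "am" (as in "I am") would also be rewritten. A real implementation would need to look at neighbouring tokens, such as a preceding time.

```python
import re

# Hypothetical pattern: "a" or "p", optional dot, "m", optional dot, any casing.
_MERIDIEM = re.compile(r"([ap])\.?m\.?", re.IGNORECASE)


def canonicalize_meridiem(token: str) -> str:
    """Return 'AM' or 'PM' for any spelling variant, else the token unchanged."""
    match = _MERIDIEM.fullmatch(token)
    if match is None:
        return token
    # Keep only the leading letter, upper-cased, and append a plain "M".
    return match.group(1).upper() + "M"


if __name__ == "__main__":
    for variant in ["Am", "aM", "am", "A.M.", "a.m."]:
        print(variant, "->", canonicalize_meridiem(variant))
```

Tokens that do not match the pattern, for example "ham", are returned as they are.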

Lemmatisation

(Converts Revolutionary, Revolutions, Anti-revolution, Postrevolution, Revolutionize, Revolutionaries, Revolutionizing & Revolutionized -> Revolution, so the document index can be compressed by storing only the base word)

https://github.com/bachan/libturglem C & BSD

https://github.com/XiaoxiaoLi/lemmatizationWithNLTK Python & TBD
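A rough sketch of how storing only the base word shrinks an index. It does not use libturglem or NLTK. The `LEMMAS` table is a tiny hand-written stand-in for a real lemmatiser, and `lemma_of` and `build_index` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical lookup table; a real lemmatiser would supply this mapping.
LEMMAS = {
    "revolutionary": "revolution",
    "revolutions": "revolution",
    "revolutionize": "revolution",
    "revolutionaries": "revolution",
    "revolutionizing": "revolution",
    "revolutionized": "revolution",
}


def lemma_of(word: str) -> str:
    """Map a word to its base form, falling back to the lower-cased word."""
    lowered = word.lower()
    return LEMMAS.get(lowered, lowered)


def build_index(words: list[str]) -> dict[str, list[int]]:
    """Index word positions under the base form instead of each variant."""
    index: dict[str, list[int]] = defaultdict(list)
    for position, word in enumerate(words):
        index[lemma_of(word)].append(position)
    return dict(index)


if __name__ == "__main__":
    print(build_index(["Revolutions", "were", "Revolutionized"]))
```

Here the three input words produce only two index keys, because both variants are filed under "revolution".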