https://en.wikipedia.org/wiki/Text_normalization
There is a low-level implementation as an extension to ICU.
Canonicalization
(Converts: Am, aM, am, A.M. & a.m. => 'AM')
https://github.com/ianozsvald/learning_text_transformer Python & MIT
https://github.com/davidmogar/normalizr Python & MIT
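The AM example above can be sketched in a few lines of Python. This is only an illustration and is not taken from either library listed here: the function name `canonicalize_meridiem` is hypothetical, and handling PM alongside AM is an added assumption. It works on a single token that the caller already knows is a time marker, since the plain word "am" would be mapped as well.

```python
def canonicalize_meridiem(token):
    """Map spelling variants of a time marker to one canonical form.

    Hypothetical helper: strips dots and upper-cases the token, and
    returns the result only if it is 'AM' or 'PM' (PM is an assumption
    added for symmetry). Any other token is returned unchanged.
    """
    stripped = token.replace(".", "").upper()
    return stripped if stripped in {"AM", "PM"} else token


if __name__ == "__main__":
    for variant in ["Am", "aM", "am", "A.M.", "a.m."]:
        print(variant, "=>", canonicalize_meridiem(variant))
```

Working per token keeps the sketch free of sentence-level ambiguity; deciding which tokens follow a clock time is left to the caller.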
Lemmatisation
(Converts: Revolutionary, Revolutions, Anti-revolution, Postrevolution, Revolutionize, Revolutionaries, Revolutionizing & Revolutionized => 'Revolution', to compress the document index by storing only the base word)
https://github.com/bachan/libturglem C & BSD
https://github.com/XiaoxiaoLi/lemmatizationWithNLTK Python & TBD
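The index-compression idea above can be sketched as follows. This is a toy lookup table, not the API of libturglem or NLTK: `BASE_FORMS`, `lemmatize` and `build_index` are hypothetical names, the table holds only the words from the example, and base words are stored lower-cased as an assumption. Real lemmatizers rely on morphological dictionaries or rules instead of a hand-written map.

```python
from collections import defaultdict

# Hypothetical table covering only the example words; a real lemmatizer
# would derive these mappings from a morphological dictionary.
BASE_FORMS = {
    "revolutionary": "revolution",
    "revolutions": "revolution",
    "anti-revolution": "revolution",
    "postrevolution": "revolution",
    "revolutionize": "revolution",
    "revolutionaries": "revolution",
    "revolutionizing": "revolution",
    "revolutionized": "revolution",
}


def lemmatize(word):
    """Return the base word for a known variant, else the lower-cased word."""
    key = word.lower()
    return BASE_FORMS.get(key, key)


def build_index(documents):
    """Build an inverted index (base word -> set of document ids).

    `documents` maps a document id to its text; words are split on
    whitespace, which is a simplification.
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.split():
            index[lemmatize(word)].add(doc_id)
    return dict(index)


if __name__ == "__main__":
    docs = {1: "Revolutions revolutionized", 2: "Anti-revolution"}
    print(build_index(docs))
```

Because every variant collapses to one key, the index stores a single entry for the whole word family instead of one entry per surface form.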