Implement Text Normalization

https://en.wikipedia.org/wiki/Text_normalization

There is a low-level implementation as an extension to ICU.

Canonicalization

(Convert Am, aM, am, A.M. & a.m. -> AM)

https://github.com/ianozsvald/learning_text_transformer Python & MIT

https://github.com/davidmogar/normalizr Python & MIT
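The AM example above could be sketched with a single regular expression. This is only an illustration and does not use either of the linked libraries; the function name `canonicalize_meridiem`, the requirement that a digit precede the marker (so the verb "am" is left alone), and the extension to PM are all assumptions made for this sketch.

```python
import re

# Hypothetical pattern (not taken from the linked projects):
# a digit, optional whitespace, then "am"/"pm" in any letter case,
# written either with both periods ("a.m.") or with none ("am").
_MERIDIEM = re.compile(r"(?<=\d)(\s*)([ap])(?:\.m\.|m\b)", re.IGNORECASE)


def canonicalize_meridiem(text: str) -> str:
    """Rewrite Am, aM, am, A.M. and a.m. (and the PM equivalents) as AM/PM."""
    # Keep the original spacing (group 1) and upper-case the a/p letter.
    return _MERIDIEM.sub(lambda m: m.group(1) + m.group(2).upper() + "M", text)


if __name__ == "__main__":
    for sample in ("9 a.m. sharp", "10 Am", "7aM", "8 P.M.", "I am here"):
        print(repr(sample), "->", repr(canonicalize_meridiem(sample)))
```

One limitation of this sketch: when "a.m." ends a sentence, its final period is consumed together with the abbreviation, so the sentence loses its full stop.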

Lemmatisation

(Convert Revolutionary, Revolutions, Anti-revolution, Postrevolution, Revolutionize, Revolutionaries, Revolutionizing & Revolutionized -> Revolution, so that the document index stays compact by storing only the base word)

https://github.com/bachan/libturglem C & BSD

https://github.com/XiaoxiaoLi/lemmatizationWithNLTK Python & TBD
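To make the index-compression idea concrete, here is a toy sketch that reduces the "Revolution" variants to one key and builds an index on that key. It is a hand-written affix stripper, not a real lemmatizer, and it does not use libturglem or NLTK; the affix lists and the names `to_base_form` and `build_index` are assumptions chosen only to cover the example words, and they will mangle many other words.

```python
from collections import defaultdict

# Hypothetical affix lists, chosen only to cover the example words.
PREFIXES = ("anti-", "post")
SUFFIXES = ("aries", "izing", "ized", "ary", "ize", "s")  # longest first


def to_base_form(word: str) -> str:
    """Lower-case a word and strip at most one known prefix and one suffix."""
    w = word.lower()
    for prefix in PREFIXES:
        if w.startswith(prefix) and len(w) - len(prefix) >= 3:
            w = w[len(prefix):]
            break
    for suffix in SUFFIXES:
        # Require at least three remaining letters so short words survive.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            w = w[: -len(suffix)]
            break
    return w


def build_index(docs: dict) -> dict:
    """Map each base form to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            token = token.strip(".,;:!?()'\"")
            if token:
                index[to_base_form(token)].add(doc_id)
    return dict(index)


if __name__ == "__main__":
    docs = {1: "Revolutionary ideas", 2: "Revolutionized ideas"}
    print(build_index(docs))
```

With this sketch, two documents that use different surface forms share a single `revolution` entry instead of one entry per variant, which is the saving the issue is after. A production version would rely on a dictionary-based lemmatizer such as the projects linked above.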

PoetryOffice / Write

Implement Text Normalization #56
