ahmetaa / zemberek-nlp

NLP tools for Turkish.

De-tokenization support #181

Open ahmetaa opened 5 years ago

ahmetaa commented 5 years ago

When a sentence is tokenized into a String list, or the tokens are joined back into a single String, it is not possible to recover the original form of the String. In some cases this may be necessary, so we need at least limited support for this functionality.

Such as: "Merhaba, nasılsın?" -> Tokenize -> "Merhaba , nasılsın ?" -> Detokenize -> "Merhaba, nasılsın?"
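
A rough sketch of what limited support could look like, assuming a simple rule-based approach: closing punctuation attaches to the previous token, and opening brackets attach to the next one. The class `SimpleDetokenizer` and its `detokenize` method are hypothetical names used only for illustration; they are not part of the Zemberek API, and the punctuation sets are an assumption that would not cover quotes, apostrophes, or other ambiguous cases.

```java
import java.util.List;

// Hypothetical helper, not an existing Zemberek class.
public class SimpleDetokenizer {

    // Assumed rule: these tokens are written without a space before them.
    private static final String NO_SPACE_BEFORE = ".,;:!?)]}";
    // Assumed rule: these tokens are written without a space after them.
    private static final String NO_SPACE_AFTER = "([{";

    private static boolean isSingleCharIn(String token, String chars) {
        return token.length() == 1 && chars.indexOf(token.charAt(0)) >= 0;
    }

    // Joins tokens with single spaces, except around the punctuation listed above.
    public static String detokenize(List<String> tokens) {
        StringBuilder sb = new StringBuilder();
        String prev = null;
        for (String token : tokens) {
            boolean attachToPrevious = isSingleCharIn(token, NO_SPACE_BEFORE);
            boolean previousAttachesToNext =
                prev != null && isSingleCharIn(prev, NO_SPACE_AFTER);
            if (prev != null && !attachToPrevious && !previousAttachesToNext) {
                sb.append(' ');
            }
            sb.append(token);
            prev = token;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("Merhaba", ",", "nasılsın", "?");
        System.out.println(detokenize(tokens));
    }
}
```

This only restores conventional spacing. If the original text had irregular whitespace, an exact round trip would probably require the tokenizer to keep the character offsets of each token.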