avian2 / unidecode

ASCII transliterations of Unicode text - GitHub mirror
https://pypi.python.org/pypi/Unidecode
GNU General Public License v2.0

Option to avoid transliterating punctuation marks as regular letters #66

Open eyaler opened 3 years ago

eyaler commented 3 years ago

Always transliterating punctuation marks as regular letters can be a problem for some applications. For example, the paragraph sign ¶ is transliterated to P; I would like to have an option to treat it as unknown.

(I opened this issue following https://github.com/avian2/unidecode/commit/81f938d9419f4b651a089a0d809bd1a0566b1329, which changed the exotic inverted nun ׆ to be transliterated as n, like the regular nun נ. The inverted one, however, is an editorial/punctuation mark.)

avian2 commented 3 years ago

Thank you for the suggestion, but I am not going to implement this in Unidecode. I want to keep Unidecode a simple function with no configuration. The reason is similar to why I don't want to have language configuration in this library: I don't have the time or knowledge to maintain the additional complexity. There are other transliteration libraries (unihandecode, for example) that are more configurable and might accept your proposal.
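
For anyone who needs this behavior without a change to the library, a caller-side wrapper is one possible workaround. The following is only a minimal sketch, not part of Unidecode: the function name `transliterate_except_punctuation` and its `unknown` parameter are made up for illustration, and it assumes that "punctuation" means any non-ASCII character whose Unicode general category starts with `P`. The transliteration function is passed in as an argument, so the sketch itself relies only on the standard library.

```python
import unicodedata


def transliterate_except_punctuation(text, transliterate, unknown="?"):
    """Transliterate text, but map non-ASCII punctuation to `unknown`.

    `transliterate` is any callable that maps a str to its ASCII form
    (hypothetical parameter; a Unidecode-style function is assumed).
    """
    pieces = []
    for ch in text:
        # General categories starting with "P" are punctuation
        # (Pc, Pd, Ps, Pe, Pi, Pf, Po). ASCII punctuation is passed to
        # the transliterator as usual.
        if ord(ch) > 127 and unicodedata.category(ch).startswith("P"):
            pieces.append(unknown)
        else:
            pieces.append(transliterate(ch))
    return "".join(pieces)
```

With Unidecode installed, this could be called as `transliterate_except_punctuation(text, unidecode)` after `from unidecode import unidecode`. Both ¶ and the inverted nun ׆ fall into category `Po`, so they would come out as the `unknown` string, while the regular nun נ (a letter) would still go through the transliterator.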