edicl / cl-unicode

Portable Unicode library for Common Lisp
https://edicl.github.io/cl-unicode/

Normalize function #12

Open libre-man opened 7 years ago

libre-man commented 7 years ago

For the data processing I do from time to time, it is often useful to have a 'normalize' function. Such a function converts each Unicode character to its ASCII 'equivalent' character or string. This is useful because many older systems do not support Unicode, so to match strings I have to convert the Unicode ones to their ASCII equivalents. The function I use for this is a simple lookup in a large table, like this:

(defun normalize (string)
  (check-type string string)
  (format nil "~{~A~}"
          (loop :for el :across string
                :collect (aref +unicode-lookup-table+
                               (char-code el)))))

I think such a function could be really useful in a Unicode library. Does this fit the library, and how would you feel about a pull request to add this functionality?

hanshuebner commented 7 years ago

Support for transliteration in cl-unicode would be useful, but it should be extensible to support new or user-defined transliteration schemes. See http://cldr.unicode.org/index/cldr-spec/transliteration-guidelines and http://www.unicode.org/cldr/charts/latest/transforms/index.html for some schemes known to the Unicode Consortium.

At the very least, the function provided by cl-unicode should accept an argument indicating which scheme to use. normalize is not a good function name; transliterate seems better.
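
To make the suggestion concrete, here is a minimal sketch of what a scheme-aware, extensible interface might look like. All names in it (`*transliteration-schemes*`, `register-transliteration-scheme`, `transliterate`, `:toy-latin-ascii`) are hypothetical and not part of cl-unicode, and the toy scheme covers only three characters rather than any CLDR transform.

```lisp
;; Hypothetical sketch: none of these names exist in cl-unicode.
(defvar *transliteration-schemes* (make-hash-table :test #'eq)
  "Maps a scheme name (a keyword) to a function from a character
to a character or string.")

(defun register-transliteration-scheme (name function)
  "Register FUNCTION under NAME so that TRANSLITERATE can find it."
  (setf (gethash name *transliteration-schemes*) function))

(defun transliterate (string scheme)
  "Return a new string with each character of STRING mapped through SCHEME."
  (check-type string string)
  (let ((mapper (or (gethash scheme *transliteration-schemes*)
                    (error "Unknown transliteration scheme: ~S" scheme))))
    (with-output-to-string (out)
      (loop :for char :across string
            ;; STRING accepts both characters and strings, so a mapper
            ;; may return either one.
            :do (write-string (string (funcall mapper char)) out)))))

;; A toy user-defined scheme; code points are used instead of literal
;; characters so the source does not depend on the file encoding.
(register-transliteration-scheme
 :toy-latin-ascii
 (lambda (char)
   (case (char-code char)
     (#xE9 "e")     ; LATIN SMALL LETTER E WITH ACUTE
     (#xDF "ss")    ; LATIN SMALL LETTER SHARP S
     (#xC6 "AE")    ; LATIN CAPITAL LETTER AE
     (t char))))    ; anything else passes through unchanged
```

With this in place, `(transliterate (coerce (list #\S #\t #\r #\a (code-char #xDF) #\e) 'string) :toy-latin-ascii)` returns `"Strasse"`. Because schemes are ordinary functions in a registry, a table-driven scheme like the one in the original post could be registered the same way, and users could add their own without touching the library.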