ff-notes / ron

Haskell implementation of RON and RON-RDT
BSD 3-Clause "New" or "Revised" License

RGA.diff: Investigate what can go wrong when splitting multi-codepoint "characters" into codepoints #146

Open cblp opened 4 years ago

cblp commented 4 years ago

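What can go wrong if we split é into e + ´ (that is, U+0065 followed by the combining acute accent U+0301)?

See also country flags, which are encoded as two code points (a pair of regional indicator symbols).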

For this, Unicode has the concept of a “grapheme cluster”: a sequence of code points that a user perceives as a single character. There’s also the “extended grapheme cluster” (EGC), a broader definition that UAX #29 recommends for general processing.

http://unicode.org/glossary/#grapheme_cluster

http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries
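
To make the question concrete, here is a minimal sketch of one possible failure mode: deleting a single code point from a decomposed sequence leaves the combining mark attached to whatever precedes it. The helpers `isExtending`, `isRegionalIndicator`, `clusters`, and `deleteAt` are hypothetical names invented for this illustration and are not part of RON. The grouping is a deliberately simplified approximation (combining marks and regional-indicator pairs only), not the full UAX #29 algorithm, and it says nothing about how `RGA.diff` actually behaves.

```haskell
import Data.Char (GeneralCategory (..), generalCategory, ord)

-- | Code points we treat as extending the previous cluster.
-- Simplification: combining marks only, not the full UAX #29 Extend property.
isExtending :: Char -> Bool
isExtending c = case generalCategory c of
  NonSpacingMark -> True
  SpacingCombiningMark -> True
  EnclosingMark -> True
  _ -> False

-- | Regional indicator symbols U+1F1E6..U+1F1FF; a flag is a pair of them.
isRegionalIndicator :: Char -> Bool
isRegionalIndicator c = ord c >= 0x1F1E6 && ord c <= 0x1F1FF

-- | Hypothetical helper: group code points into approximate clusters.
clusters :: String -> [String]
clusters [] = []
clusters (a : b : rest)
  | isRegionalIndicator a && isRegionalIndicator b =
      let (marks, rest') = span isExtending rest
       in (a : b : marks) : clusters rest'
clusters (c : rest) =
  let (marks, rest') = span isExtending rest
   in (c : marks) : clusters rest'

-- | Remove the element at a zero-based index, as a code-point-level edit would.
deleteAt :: Int -> [a] -> [a]
deleteAt i xs = take i xs ++ drop (i + 1) xs

-- "ne\x0301" is 'n' followed by decomposed 'é': three code points, two clusters.
-- Deleting only the 'e' (index 1) leaves "n\x0301", which is a single cluster
-- that renders as 'ń', a character nobody typed.
```

In GHCi, `clusters "ne\x0301"` gives two clusters while `clusters (deleteAt 1 "ne\x0301")` gives one, so an edit that looks harmless at the code-point level changes a neighbouring user-visible character. Operating on clusters instead of `Char` values would avoid this particular case, at the cost of depending on a segmentation algorithm.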