UAX44-LM2 exceptionally treats the hyphen in “HANGUL JUNGSEONG O-E” as non-medial, so it must be preserved. character_name_normalize detects this case by checking whether the string remaining after the hyphen is exactly E or e. That check should also ignore spaces and underscores in the remainder. For example, "hangul jungseong o-e _" should be canonicalized to "hanguljungseongo-e", not "hanguljungseongoe".
Playground: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=0ac3aa310000440b22a7dfe05b56df8d
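To make the expected behavior concrete, here is a minimal sketch of loose name matching in which the O-E check skips spaces and underscores. It is not the crate's actual implementation: normalize_name and rest_is_e are hypothetical names, and the definition of "medial" used here (an ASCII alphanumeric directly on both sides of the hyphen) is an assumption made for illustration.

```rust
/// Hypothetical sketch of UAX44-LM2 loose matching for character names.
/// Drops whitespace, underscores, and medial hyphens, and lowercases ASCII.
fn normalize_name(name: &str) -> String {
    let mut out = String::with_capacity(name.len());
    let mut prev: Option<char> = None;
    for (i, c) in name.char_indices() {
        match c {
            // Whitespace and underscores are always ignored.
            c if c == '_' || c.is_whitespace() => {}
            '-' => {
                // '-' is one byte, so i + 1 is a valid char boundary.
                let rest = &name[i + 1..];
                // Assumption: "medial" means alphanumerics on both sides.
                let medial = prev.map_or(false, |p| p.is_ascii_alphanumeric())
                    && rest.chars().next().map_or(false, |n| n.is_ascii_alphanumeric());
                // Keep non-medial hyphens, plus the O-E exception.
                if !medial || (out == "hanguljungseongo" && rest_is_e(rest)) {
                    out.push('-');
                }
            }
            _ => out.push(c.to_ascii_lowercase()),
        }
        prev = Some(c);
    }
    out
}

/// True if `rest` is a single E or e once whitespace and underscores
/// are skipped. Skipping them is the fix this issue asks for.
fn rest_is_e(rest: &str) -> bool {
    let mut significant = rest.chars().filter(|&c| !(c == '_' || c.is_whitespace()));
    matches!(significant.next(), Some('E') | Some('e')) && significant.next().is_none()
}
```

With this sketch, "hangul jungseong o-e _" and "HANGUL JUNGSEONG O-E" both normalize to "hanguljungseongo-e", while "HANGUL JUNGSEONG O-EO" still loses its hyphen because the remainder is not a lone E.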