BurntSushi / ucd-generate

A command line tool to generate Unicode tables as source code.
Apache License 2.0

Spaces and underscores are not completely ignored for U+1180 HANGUL JUNGSEONG O-E #12

Closed dscorbett closed 5 years ago

dscorbett commented 5 years ago

UAX44-LM2 makes an exception for the hyphen in “HANGUL JUNGSEONG O-E”: although it is medial, it must not be ignored. `character_name_normalize` detects this case by checking whether the string remaining after the hyphen is exactly `E` or `e`. That check should also ignore any spaces and underscores in the remaining string. For example, "hangul jungseong o-e _" should be canonicalized to "hanguljungseongo-e", not "hanguljungseongoe".

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=0ac3aa310000440b22a7dfe05b56df8d
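
To make the expected behavior concrete, here is a minimal sketch of the loose-matching rule with the exception applied after skipping spaces and underscores. It is not the ucd-generate implementation: the names `normalize_name` and `is_hangul_o_e` are hypothetical, and it assumes a hyphen is "medial" when it sits directly between two ASCII alphanumeric characters.

```rust
/// Hypothetical sketch of UAX44-LM2 loose matching for character names.
/// Not the real `character_name_normalize`; names and structure are assumptions.
fn normalize_name(name: &str) -> String {
    let chars: Vec<char> = name.chars().collect();
    let mut out = String::with_capacity(chars.len());
    for (i, &c) in chars.iter().enumerate() {
        match c {
            // Whitespace and underscores are always ignored.
            ' ' | '_' => {}
            '-' => {
                // Assumption: a hyphen is medial when it is directly
                // between two ASCII alphanumeric characters.
                let prev = i > 0 && chars[i - 1].is_ascii_alphanumeric();
                let next = chars
                    .get(i + 1)
                    .map_or(false, |n| n.is_ascii_alphanumeric());
                let medial = prev && next;
                // Keep non-medial hyphens, plus the one in U+1180.
                if !medial || is_hangul_o_e(&out, &chars[i + 1..]) {
                    out.push('-');
                }
            }
            // Everything else is compared case-insensitively.
            _ => out.push(c.to_ascii_lowercase()),
        }
    }
    out
}

/// True when the normalized prefix is "hanguljungseongo" and the text after
/// the hyphen is a single `e`/`E` once spaces and underscores are skipped.
fn is_hangul_o_e(prefix: &str, rest: &[char]) -> bool {
    if prefix != "hanguljungseongo" {
        return false;
    }
    let mut significant = rest.iter().copied().filter(|&c| c != ' ' && c != '_');
    let first_is_e = matches!(significant.next(), Some('e') | Some('E'));
    first_is_e && significant.next().is_none()
}
```

With this sketch, `normalize_name("hangul jungseong o-e _")` keeps the hyphen, while `normalize_name("HANGUL JUNGSEONG O-EO")` drops it as an ordinary medial hyphen.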