Open djstrong opened 5 months ago
The extra data is there so that `is_combining_mark()` can be correct independent of ENSIP-15. For example, it is useful to know whether a character is a CM when building an error message for a disallowed character.
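To make that use case concrete, here is a rough Python sketch. It is not the library's code: `is_combining_mark` below is a stand-in that uses the Unicode General_Category from `unicodedata` (assumed to approximate the `cm` data), and `describe_disallowed` is a hypothetical helper name.

```python
import unicodedata


def is_combining_mark(cp: int) -> bool:
    # Stand-in (assumption): treat General_Category Mn/Mc/Me as "combining mark".
    # The real library answers this from its own data, not from unicodedata.
    return unicodedata.category(chr(cp)).startswith("M")


def describe_disallowed(cp: int) -> str:
    # Hypothetical helper: name the kind of character in the error text,
    # whether or not the character is valid under ENSIP-15.
    kind = "combining mark" if is_combining_mark(cp) else "character"
    return f"disallowed {kind}: {{{cp:X}}}"


print(describe_disallowed(0x20E3))
print(describe_disallowed(0x41))
```

The point is only that the CM check has to work for characters that normalization rejects, otherwise the error message cannot say what kind of character was rejected.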
There's a similar situation in the IDNA mapping data:
- {130} maps to {69}{307}, which are individually valid but can't be used together (Latin has per-sequence combining mark rules). Could be removed, but isn't.
- {FF9E} and {3099} are invalid, but {304B}{FF9E} is mapped [FF9E → 3099], and then NFC turns {304B}{3099} into {304C}. Appears removable, but isn't.
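The two mapping examples above can be reproduced with a small Python sketch. The `MAPPED` table holds only the two entries quoted in this thread, and `map_then_nfc` and `hexes` are hypothetical helper names. This is not the ENSIP-15 pipeline, just "map, then NFC".

```python
import unicodedata

# Only the two mappings quoted above (illustrative subset, not the real table).
MAPPED = {0x130: [0x69, 0x307], 0xFF9E: [0x3099]}


def map_then_nfc(s: str) -> str:
    # Replace each mapped code point, then apply NFC to the result.
    cps = []
    for ch in s:
        cps.extend(MAPPED.get(ord(ch), [ord(ch)]))
    return unicodedata.normalize("NFC", "".join(chr(cp) for cp in cps))


def hexes(s: str) -> str:
    # Render a string as space-separated hex code points.
    return " ".join(f"{ord(c):X}" for c in s)


print(hexes(map_then_nfc("\u304b\uff9e")))  # 304C
print(hexes(map_then_nfc("\u0130")))        # 69 307
```

In the first case the mapped mark is absorbed by NFC composition, so the final output contains no combining mark at all. In the second case {307} survives as a separate code point.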
I have checked some data from `spec.json`. There are combining marks under the `cm` key and "valid" characters from `groups` (primary and secondary). Some combining marks are among these valid characters and some are not (948 of them). Why is that? One example is character `20E3` (present in `cm`, absent from every group; it is part of an emoji). Here https://adraffy.github.io/ens-normalize.js/test/chars.html it is marked as disallowed. Another example is `20DD`: present in `cm`, absent from every group, but not part of any emoji.
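For reference, the comparison described above can be sketched roughly like this in Python. The layout is an assumption based on the description: `cm` is taken to be a flat list of code points, and each entry of `groups` is taken to carry `primary` and `secondary` lists. `toy_spec` is made-up data in that shape, not real `spec.json` content, and `cm_outside_groups` is a hypothetical helper name.

```python
def cm_outside_groups(spec: dict) -> list[int]:
    # Collect every code point that any group lists as primary or secondary.
    valid = set()
    for group in spec["groups"]:
        valid.update(group.get("primary", []))
        valid.update(group.get("secondary", []))
    # Combining marks that no group lists as valid.
    return sorted(set(spec["cm"]) - valid)


# Made-up data in the assumed shape (not taken from the real spec.json).
toy_spec = {
    "cm": [0x300, 0x20DD, 0x20E3],
    "groups": [
        {"name": "Latin", "primary": [0x61, 0x300], "secondary": [0x62]},
    ],
}

missing = cm_outside_groups(toy_spec)
print([f"{cp:X}" for cp in missing])  # ['20DD', '20E3']
```

To run it on the real file, load it with `json.load` and pass the resulting dict, adjusting the key access if the actual layout differs from the assumption.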