spec.json combining marks

djstrong commented 5 months ago

I checked some data in spec.json. Combining marks are listed under the cm key, and the "valid" characters are listed in the groups (primary and secondary). Some combining marks appear among these valid characters, but some (948) do not. Why is that? One example is character 20E3: it is present in cm and absent from every group, although it is part of an emoji. At https://adraffy.github.io/ens-normalize.js/test/chars.html it is marked as disallowed. Another example is 20DD: it is present in cm and absent from every group, but it is not part of any emoji.
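A minimal sketch of the check described above. It assumes spec.json has a top-level cm array of code points and a groups array whose entries carry primary and secondary arrays; the sample object and the helper name ungroupedCombiningMarks are hypothetical and stand in for the real file.

```javascript
// Hypothetical excerpt with the ASSUMED shape of spec.json: a top-level `cm`
// array and `groups` entries with `primary` and `secondary` arrays.
const sampleSpec = {
  cm: [0x301, 0x20DD, 0x20E3],
  groups: [
    { name: 'Latin', primary: [0x61, 0x301], secondary: [] },
  ],
};

// Return the combining marks that are absent from every group.
function ungroupedCombiningMarks(spec) {
  const valid = new Set();
  for (const g of spec.groups) {
    for (const cp of g.primary) valid.add(cp);
    for (const cp of g.secondary) valid.add(cp);
  }
  return spec.cm.filter((cp) => !valid.has(cp));
}

const toHex = (cp) => cp.toString(16).toUpperCase();
console.log(ungroupedCombiningMarks(sampleSpec).map(toHex)); // [ '20DD', '20E3' ]
```

Run against the real file, the length of the returned array would be the count quoted above, provided the assumed shape holds.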

adraffy commented 5 months ago

The extra data is there so that is_combining_mark() can be correct independently of ENSIP-15. For example, it is useful to know whether a character is a CM inside an error message for a disallowed character.
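As a rough illustration of that use case, here is a sketch of an error message that says whether the offending character is a combining mark. The function names are hypothetical, and isCombiningMark is only a stand-in based on the Unicode General_Category Mark property, which is not necessarily the same set as the cm data in spec.json.

```javascript
// Stand-in for the library's is_combining_mark(): tests the Unicode
// General_Category "Mark" property (an assumption, not the spec.json cm set).
function isCombiningMark(cp) {
  return /^\p{M}$/u.test(String.fromCodePoint(cp));
}

// Build a message for a disallowed code point, noting when it is a CM.
function describeDisallowed(cp) {
  const hex = cp.toString(16).toUpperCase();
  const kind = isCombiningMark(cp) ? 'combining mark' : 'character';
  return `disallowed ${kind}: {${hex}}`;
}

console.log(describeDisallowed(0x20DD)); // disallowed combining mark: {20DD}
console.log(describeDisallowed(0x2F));   // disallowed character: {2F}
```

The point is that the check works for 20DD even though it is in no group, which is only possible if the CM data covers more than the valid characters.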

adraffy commented 5 months ago

There's a similar situation in the IDNA mapping data:

{130} maps to {69}{307}, which are individually valid but can't be used together (Latin has per-sequence combining mark rules). Could be removed, but isn't.
{FF9E} and {3099} are invalid, but {304B}{FF9E} is mapped [FF9E → 3099] and then NFC turns {304B}{3099} into {304C}. Appears removable, but isn't.
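The two cases can be reproduced with a small sketch that applies a mapping and then NFC. The helper mapThenNFC and the two-entry mapping table are hypothetical and contain only the entries mentioned above; this is not the library's pipeline, just the built-in String.prototype.normalize applied after a lookup.

```javascript
// Hypothetical two-entry mapping table taken from the examples above.
const mapping = new Map([
  [0x130, [0x69, 0x307]],
  [0xFF9E, [0x3099]],
]);

// Apply the mapping per code point, then NFC-normalize the result.
function mapThenNFC(cps, table) {
  const mapped = cps.flatMap((cp) => (table.has(cp) ? table.get(cp) : [cp]));
  const nfc = String.fromCodePoint(...mapped).normalize('NFC');
  return Array.from(nfc, (ch) => ch.codePointAt(0));
}

const toHexList = (cps) => cps.map((cp) => cp.toString(16).toUpperCase());

console.log(toHexList(mapThenNFC([0x304B, 0xFF9E], mapping))); // [ '304C' ]
console.log(toHexList(mapThenNFC([0x130], mapping)));          // [ '69', '307' ]
```

In the first call the invalid {3099} never survives, because NFC composes it with {304B} into {304C}. In the second, {69}{307} has no precomposed form, so both code points remain and are left to the later per-sequence combining mark rules.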

adraffy / ens-normalize.js

spec.json combining marks #27