Character encodings

This library currently handles conversion between strings and sequences of code points, in an attempt to provide an intuitive, easy-to-use API. It may be a better idea to require that the code point conversion take place before input reaches this library.

This would allow systems to use a third-party UTF-8 implementation such as utf8. It also neatly avoids the question of which encodings PRECIS deems valid. For example, from RFC 7613:

An entity that prepares a string according to this profile MUST first map fullwidth and halfwidth characters to their decomposition mappings (see Unicode Standard Annex #11 [UAX11]). This is necessary because the PRECIS "HasCompat" category specified in Section 9.17 of [RFC7564] would otherwise forbid fullwidth and halfwidth characters. After applying this width-mapping rule, the entity then MUST ensure that the string consists only of Unicode code points that conform to the PRECIS IdentifierClass defined in Section 4.2 of [RFC7564]. In addition, the entity then MUST encode the string as UTF-8 [RFC3629].

(emphasis mine)

See discussion under #1 for more information.
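
To make the proposal concrete, here is a minimal sketch of what caller-side conversion could look like if the library accepted and returned arrays of code points. The helper names (`toCodePoints`, `fromCodePoints`, `encodeUtf8`) are hypothetical and not part of precis-js. The sketch assumes the input is a native JavaScript (UTF-16) string and that a global `TextEncoder` is available, as in modern browsers and recent Node.js versions.

```javascript
// Hypothetical helpers for illustration only; not part of the precis-js API.

// Decode a native JavaScript string into an array of Unicode code points.
// String iteration walks code points, so a surrogate pair becomes one entry.
function toCodePoints(str) {
  return Array.from(str, (ch) => ch.codePointAt(0));
}

// Rebuild a native JavaScript string from an array of code points.
function fromCodePoints(codePoints) {
  return String.fromCodePoint(...codePoints);
}

// Encode code points as UTF-8 bytes, the final step RFC 7613 requires.
// Note: TextEncoder replaces lone surrogates with U+FFFD.
function encodeUtf8(codePoints) {
  return new TextEncoder().encode(fromCodePoints(codePoints));
}

// "a" followed by U+1F600, which is a surrogate pair in UTF-16.
const codePoints = toCodePoints("a\u{1F600}");
const utf8Bytes = encodeUtf8(codePoints);
```

With this split, the caller decides how bytes or strings become code points (for example with a third-party UTF-8 decoder), and the library would only ever see code points.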

eloquent / precis-js, issue #7