(Not marked as an API proposal because I think it still needs more design work before I put one up.)
When handling Unicode strings, you quickly realise that strings can be split into many …
-
[https://w3c.github.io/iip/gap-analysis/deva-gap.html#linebreak](https://w3c.github.io/iip/gap-analysis/deva-gap.html#linebreak)
Is line breaking in Devanagari driven by breaking at a word boundary…
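Suppose lines are broken only at word boundaries (spaces), which is one possible reading of the question. Python's standard `textwrap` module can illustrate that behavior, but treat this as a sketch under that assumption, not a description of how browsers wrap Devanagari. `textwrap` counts width in code points. A word like नमस्ते is 6 code points (including a virama and a vowel sign) but only a few user-perceived characters, so it counts as wider than it looks. Setting `break_long_words=False` keeps such a word whole instead of cutting it inside a conjunct.

```python
import textwrap

# "namaste duniya", written with escapes so the exact code points are visible.
# The virama U+094D joins स and त into a conjunct.
namaste = "\u0928\u092e\u0938\u094d\u0924\u0947"
duniya = "\u0926\u0941\u0928\u093f\u092f\u093e"
text = f"{namaste} {duniya}"

# Break only at whitespace; never split an over-long word.
lines = textwrap.wrap(text, width=4, break_long_words=False, break_on_hyphens=False)
print(lines == [namaste, duniya])  # True: each word stays whole on its own line
print(len(namaste))                # 6 code points
```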
-
Example code:
```
i = '👨👨👦👦c'.indexChar('c')
print 'Found char at {i}.' -- Found char at 7.
```
Indexing strings by codepoint doesn't really make sense with regard to how Unicode is act…
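Here is a minimal Python sketch that makes the codepoint-versus-grapheme distinction concrete. It assumes the emoji in the example above is the ZWJ family sequence (man, ZWJ, man, ZWJ, boy, ZWJ, boy), which is what a codepoint index of 7 implies. The `graphemes` and `grapheme_index` helpers are hypothetical and only approximate UAX #29:

- Combining marks, variation selectors, skin-tone modifiers and ZWJ attach to the preceding cluster.
- CR LF stays together.
- Anything after a ZWJ is joined. Real GB11 restricts this to emoji.

A real implementation should use the full segmentation rules.

```python
import unicodedata

ZWJ = "\u200d"


def _extends(cp: str) -> bool:
    """Simplified stand-in for UAX #29 rules GB9/GB9a (Extend, ZWJ, SpacingMark)."""
    o = ord(cp)
    return (
        cp == ZWJ
        or unicodedata.category(cp) in ("Mn", "Me", "Mc")  # combining marks
        or 0xFE00 <= o <= 0xFE0F  # variation selectors, including VS16
        or 0x1F3FB <= o <= 0x1F3FF  # emoji skin-tone modifiers
    )


def graphemes(s: str) -> list[str]:
    """Hypothetical, approximate extended-grapheme-cluster segmentation."""
    clusters: list[str] = []
    for cp in s:
        if clusters and (
            _extends(cp)
            or clusters[-1].endswith(ZWJ)  # crude version of GB11 (ZWJ sequences)
            or (clusters[-1] == "\r" and cp == "\n")  # GB3: keep CR LF together
        ):
            clusters[-1] += cp
        else:
            clusters.append(cp)
    return clusters


def grapheme_index(s: str, target: str) -> int:
    """Index of the first grapheme equal to `target`, or -1 if absent."""
    for i, g in enumerate(graphemes(s)):
        if g == target:
            return i
    return -1


family = "\U0001F468\u200d\U0001F468\u200d\U0001F466\u200d\U0001F466"
s = family + "c"
print(s.index("c"))  # codepoint index: 7
print(grapheme_index(s, "c"))  # grapheme index: 1
```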
-
As a researcher wanting to understand the impact of phonetic error correction using language models on word-level recognition from dysarthric speakers, I would like to run an experiment using acous…
-
https://dom.spec.whatwg.org/#ranges
For Text nodes, it seems that the offset of a boundary point is measured in code units (rather than [grapheme cluster](https://unicode.org/reports/tr29/#Grapheme_Cluster_Boun…
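DOM Text offsets count UTF-16 code units, as JavaScript strings do. Such an offset can land in the middle of a surrogate pair, and also in the middle of a grapheme cluster. Below is a minimal Python sketch that maps between UTF-16 code-unit offsets and code-point indices. The function names are hypothetical. The sketch only detects offsets inside a surrogate pair; detecting offsets inside a grapheme cluster would additionally need a segmentation step.

```python
def utf16_to_codepoint_offset(s: str, utf16_offset: int) -> int:
    """Map a UTF-16 code-unit offset to a Python str index (code points)."""
    if utf16_offset < 0:
        raise ValueError("offset must be non-negative")
    units = 0
    for i, ch in enumerate(s):
        if units == utf16_offset:
            return i
        units += 2 if ord(ch) > 0xFFFF else 1  # astral code points need a surrogate pair
        if units > utf16_offset:
            raise ValueError("offset falls inside a surrogate pair")
    if units == utf16_offset:
        return len(s)
    raise ValueError("offset is past the end of the string")


def codepoint_to_utf16_offset(s: str, index: int) -> int:
    """Map a str index (code points) back to a UTF-16 code-unit offset."""
    return len(s[:index].encode("utf-16-le")) // 2


s = "\U0001F600a"  # one astral emoji (2 UTF-16 units) followed by "a"
print(utf16_to_codepoint_offset(s, 2))  # 1
print(codepoint_to_utf16_offset(s, 1))  # 2
```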
-
Several sections assume that the width of a grapheme cluster can be modified in magic ways:
> 6.1: variation selector 16 (VS16) that may have caused the width of the grapheme cluster to change to wide …
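To illustrate why VS16 matters, here is a rough width heuristic in Python. It assumes, as many terminals do, that a cluster occupies two columns if it contains an East Asian Wide/Fullwidth code point or VS16 (emoji presentation), and one column otherwise. This `cluster_width` function is hypothetical, not the algorithm of any particular spec or terminal.

- Ambiguous-width characters are treated as narrow.
- VS16 is honored after any character. A fuller version would only honor it after characters that actually have an emoji variation sequence.

```python
import unicodedata

VS16 = "\ufe0f"  # VARIATION SELECTOR-16: request emoji presentation


def cluster_width(cluster: str) -> int:
    """Hypothetical column width (1 or 2) for one grapheme cluster."""
    if VS16 in cluster:
        return 2  # emoji presentation is usually rendered two columns wide
    if any(unicodedata.east_asian_width(cp) in ("W", "F") for cp in cluster):
        return 2
    return 1


heart = "\u2764"  # HEAVY BLACK HEART, text presentation by default
print(cluster_width(heart))  # 1
print(cluster_width(heart + VS16))  # 2
```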
-
This is a follow-up to #11406, which introduced escaping for all non-printable characters in `String#inspect`.
While that change is an improvement, it has a negative effect on grapheme clusters comp…
-
The Grapheme API for strings was introduced in #11472.
However, there is still a debate about the actual data format for graphemes. They are a sequence of codepoints, which is typically represente…
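One possible representation, sketched in Python purely for illustration, covers the common single-code-point case cheaply. A grapheme that is one code point is stored as its integer code point, and only multi-code-point clusters fall back to a string. The `Grapheme` class and its methods are hypothetical. The point of the idea, avoiding a separate string per single-code-point grapheme, matters more in a language with explicit allocation than in Python. Constructing through `from_str` keeps one canonical form, so equal graphemes compare equal. Calling `Grapheme("a")` directly would bypass that.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Grapheme:
    """Hypothetical grapheme value: an int for one code point, a str otherwise."""

    cluster: Union[int, str]

    @classmethod
    def from_str(cls, s: str) -> "Grapheme":
        if not s:
            raise ValueError("a grapheme cluster cannot be empty")
        # Canonical form: single code points are always stored as int.
        return cls(ord(s)) if len(s) == 1 else cls(s)

    def codepoints(self) -> List[int]:
        if isinstance(self.cluster, int):
            return [self.cluster]
        return [ord(c) for c in self.cluster]

    def __str__(self) -> str:
        if isinstance(self.cluster, int):
            return chr(self.cluster)
        return self.cluster


a = Grapheme.from_str("a")
e_acute = Grapheme.from_str("e\u0301")
print(a.cluster)  # 97
print(e_acute.codepoints())  # [101, 769]
```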
-
Given a string consisting of an invalid byte followed by a character that forms a grapheme cluster with the Unicode replacement character, such as `"\xFF\u200D"`: Does it comprise a single grapheme (…
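The two readings of the question can be compared in a minimal Python sketch. Both functions use a deliberately simplified GB9 rule rather than full UAX #29: a ZWJ or a nonspacing/enclosing mark attaches to the preceding cluster.

- **Reading 1:** the invalid byte is decoded to U+FFFD before segmentation, so GB9 merges it with the ZWJ into one cluster.
- **Reading 2:** invalid bytes are kept as isolated units (via Python's `surrogateescape` error handler) that nothing can attach to, giving two clusters.

Which reading is correct is exactly the open question. The function names are hypothetical.

```python
import unicodedata

ZWJ = "\u200d"


def _attaches(cp: str) -> bool:
    # Simplified GB9: ZWJ and Extend-like marks do not start a new cluster.
    return cp == ZWJ or unicodedata.category(cp) in ("Mn", "Me")


def _is_invalid_byte(cp: str) -> bool:
    # surrogateescape maps each undecodable byte 0x80-0xFF to U+DC80-U+DCFF.
    return "\udc80" <= cp <= "\udcff"


def segment_replaced(data: bytes) -> list[str]:
    """Reading 1: replace invalid bytes with U+FFFD, then segment."""
    clusters: list[str] = []
    for cp in data.decode("utf-8", errors="replace"):
        if clusters and _attaches(cp):
            clusters[-1] += cp
        else:
            clusters.append(cp)
    return clusters


def segment_isolated(data: bytes) -> list[str]:
    """Reading 2: an invalid byte is its own cluster; nothing attaches to it."""
    clusters: list[str] = []
    for cp in data.decode("utf-8", errors="surrogateescape"):
        if clusters and _attaches(cp) and not _is_invalid_byte(clusters[-1][-1]):
            clusters[-1] += cp
        else:
            clusters.append(cp)
    return clusters


data = b"\xff" + "\u200d".encode("utf-8")  # the bytes of "\xFF\u200D"
print(len(segment_replaced(data)))  # 1
print(len(segment_isolated(data)))  # 2
```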