-
While looking into some Gecko bugs related to `word-break`, I'm trying to understand exactly where the `break-all` value should allow soft line breaks, and it's not clear to me that…
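As I read CSS Text Level 3, `break-all` makes characters that resolve to the UAX #14 classes `AL`, `NU`, and `SA` behave as `ID` for line breaking. Here is a minimal sketch of only that remapping step. It assumes each character's UAX #14 class has already been looked up, and the helper name is hypothetical:

```python
# Hypothetical helper: remaps UAX #14 line-break classes for `word-break: break-all`.
# Assumes each character's class was already looked up elsewhere; Python's
# standard library does not expose the Line_Break property.
BREAK_ALL_AS_ID = {"AL", "NU", "SA"}


def resolve_class(lb_class: str, word_break: str = "normal") -> str:
    """Return the class to use for subsequent rule lookups under `word_break`."""
    if word_break == "break-all" and lb_class in BREAK_ALL_AS_ID:
        # AL/NU/SA characters are treated like ideographs (ID).
        return "ID"
    return lb_class


print([resolve_class(c, "break-all") for c in ["AL", "NU", "SP", "SA"]])
```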
-
Per [SA: Complex-Context Dependent](https://www.unicode.org/reports/tr14/#SA) in UAX14, finding line breaking opportunities in South East Asian languages requires morphological analysis.
Currently, ht…
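Full morphological analysis is beyond a short example. A common practical substitute for `SA` text, however, is dictionary-based segmentation. Below is a minimal greedy longest-match sketch. The function name and the two-word Thai dictionary are hypothetical, and real implementations use large dictionaries and smarter matching:

```python
from __future__ import annotations


def longest_match_breaks(text: str, dictionary: set[str]) -> list[int]:
    """Return break offsets (the end of each segment) using greedy longest match.

    Characters not covered by any dictionary word become single-character segments.
    """
    max_len = max((len(w) for w in dictionary), default=1)
    breaks = []
    i = 0
    while i < len(text):
        end = i + 1  # fallback: a single character
        # Try the longest candidate first so longer words win over their prefixes.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                end = j
                break
        breaks.append(end)
        i = end
    return breaks


words = {"สวัสดี", "ครับ"}  # toy dictionary: "hello" and a polite particle
text = "สวัสดี" + "ครับ"
print(longest_match_breaks(text, words))
```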
-
I changed `RuleBreakPropertyTable` to a `ZeroVec` in #1652 to make it serializable. However, per discussion in https://github.com/unicode-org/icu4x/issues/1638#issuecomment-1055069699, we need to think abou…
-
Per [discussion in #717](https://github.com/unicode-org/icu4x/pull/717#discussion_r634743139), we should load `UAX14_RULE_TABLE` through [`DataProvider`](https://github.com/unicode-org/icu4x/blob/main…
-
Currently, UAX 14 data is built using a Python script:
https://github.com/unicode-org/icu4x/blob/main/experimental/segmenter/tools/generate_properties.py
The Python script reads data from the fo…
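The sketch below illustrates this kind of preprocessing by parsing lines in the format of the UCD file `LineBreak.txt`: a code point or range, a semicolon, a class, and an optional `#` comment. Which files the script actually reads is elided above. The input format and the helper name here are assumptions for illustration:

```python
from __future__ import annotations


def parse_line_break_data(lines) -> dict[int, str]:
    """Map code points to UAX #14 classes from LineBreak.txt-style lines.

    Hypothetical helper; accepts both "0041;AL" and "0041 ; AL" spacing.
    """
    classes = {}
    for raw in lines:
        data = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not data:
            continue
        cps, lb_class = (field.strip() for field in data.split(";", 1))
        if ".." in cps:
            start, end = (int(part, 16) for part in cps.split(".."))
        else:
            start = end = int(cps, 16)
        for cp in range(start, end + 1):
            classes[cp] = lb_class
    return classes


sample = [
    "# LineBreak.txt-style excerpt (illustrative)",
    "000A;LF  # Cc  <control-000A>",
    "0030..0039 ; NU  # Nd  [10] DIGIT ZERO..DIGIT NINE",
    "",
]
table = parse_line_break_data(sample)
print(table[0x0A], table[0x35], len(table))
```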
-
Currently, the line breaker uses the `unicode_width` crate to query character widths in order to implement the following part of the line break strictness rules in the [CSS Text spec](https://drafts.csswg.org/css-text-3/#line…
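Suppose the relevant part of the spec is the `line-break: loose` rule about `PO` (suffix) and `PR` (prefix) characters whose East Asian Width (UAX #11) is `F`. In that case the check needs the East Asian Width property itself rather than a display width, because a width of 2 alone does not distinguish Fullwidth (`F`) from Wide (`W`). Python's `unicodedata.east_asian_width` exposes that property directly. In the minimal sketch below, the helper names are hypothetical and the caller supplies the UAX #14 class:

```python
import unicodedata


def is_fullwidth(ch: str) -> bool:
    """True if ch has East Asian Width F (Fullwidth), per UAX #11."""
    return unicodedata.east_asian_width(ch) == "F"


def is_fullwidth_affix(ch: str, lb_class: str) -> bool:
    """Hypothetical check: is ch a PO or PR character with East Asian Width F?

    Any surrounding-context conditions the CSS rule attaches are left to the caller.
    """
    return lb_class in ("PO", "PR") and is_fullwidth(ch)


print(unicodedata.east_asian_width("\uFF05"))  # FULLWIDTH PERCENT SIGN
print(unicodedata.east_asian_width("%"), unicodedata.east_asian_width("\u4E2D"))
```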
-
Currently, the segmenter uses [generate_properties.py](https://github.com/unicode-org/icu4x/blob/main/experimental/segmenter/tools/generate_properties.py) to generate `rule_table.rs` and `lb_define.rs`, a…
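The code-generation step itself can be as simple as writing Rust source text from Python. The minimal sketch below assumes a hypothetical layout in which each line-break class becomes a `u8` constant. The actual contents of `lb_define.rs` may differ:

```python
from __future__ import annotations


def emit_class_constants(class_names: list[str]) -> str:
    """Render one `pub const NAME: u8 = index;` line per class (hypothetical layout)."""
    lines = ["// Generated file; do not edit by hand."]
    for index, name in enumerate(class_names):
        lines.append(f"pub const {name}: u8 = {index};")
    return "\n".join(lines) + "\n"


print(emit_class_constants(["BK", "CR", "LF", "CM"]), end="")
```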
-
After discussions with the ICU4X team and experts from ICU, Markus suggested that we look more closely at implementing the rule-based break iterator using ICU4C's approach. [Quote from his …
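The general idea behind a table-driven break iterator is that rules are compiled ahead of time into a state table, which is then walked at runtime while the iterator remembers the last accepting position. The toy sketch below shows that idea. Its categories, states, and single rule ("keep CR LF together, break everywhere else") are hypothetical and far simpler than ICU4C's actual compiled tables:

```python
from __future__ import annotations

# Hypothetical character categories and states for a toy rule set.
CR, LF, OTHER = 0, 1, 2
STOP, START, AFTER_CR, DONE = 0, 1, 2, 3

# STATE_TABLE[state][category] -> next state; STOP ends the current match.
STATE_TABLE = {
    START: [AFTER_CR, DONE, DONE],
    AFTER_CR: [STOP, DONE, STOP],
    DONE: [STOP, STOP, STOP],
}
ACCEPTING = {AFTER_CR, DONE}


def category(ch: str) -> int:
    return CR if ch == "\r" else LF if ch == "\n" else OTHER


def next_break(text: str, pos: int) -> int | None:
    """Return the next break after pos (longest accepted match), or None at the end."""
    if pos >= len(text):
        return None
    state, i, last_accept = START, pos, pos + 1  # always advance at least one char
    while i < len(text):
        state = STATE_TABLE[state][category(text[i])]
        if state == STOP:
            break
        i += 1
        if state in ACCEPTING:
            last_accept = i
    return last_accept


def all_breaks(text: str) -> list[int]:
    breaks, pos = [], 0
    while (b := next_break(text, pos)) is not None:
        breaks.append(b)
        pos = b
    return breaks


print(all_breaks("a\r\nb"))
```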
-
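ECMAScript defines `^` and `$`, when the multiline (`m`) flag is set, to match at positions adjacent to the following _LineTerminator_ characters:
- `<LF>` (U+000A LINE FEED)
- `<CR>` (U+000D CARRIAGE RETURN)
- `<LS>` (U+2028 LINE SEPARATOR)
- `<PS>` (U+2029 PARAGRAPH SEPARATOR)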
However, [UAX14](https://www.unicode.org/reports/tr14…
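For comparison, UAX #14 assigns the following characters to its mandatory-break classes: `BK` (U+000B, U+000C, U+2028, U+2029), `CR`, `LF`, and `NL` (U+0085). These form a strict superset of ECMAScript's _LineTerminator_ set. The small sketch below makes the difference explicit and models multiline `^`. The sets are hand-transcribed, so double-check them against ECMA-262 and the current `LineBreak.txt`. The helper name is hypothetical:

```python
# Hand-transcribed sets; verify against ECMA-262 and the UCD's LineBreak.txt.
ECMA_LINE_TERMINATORS = {"\n", "\r", "\u2028", "\u2029"}  # LF, CR, LS, PS

UAX14_MANDATORY_BREAK_CHARS = {
    "\x0b", "\x0c", "\u2028", "\u2029",  # BK
    "\r",                                # CR
    "\n",                                # LF
    "\x85",                              # NL (NEXT LINE)
}


def matches_multiline_caret(text: str, i: int) -> bool:
    """Model ECMAScript `^` with the m flag: start of input or right after a LineTerminator."""
    return i == 0 or text[i - 1] in ECMA_LINE_TERMINATORS


extra = UAX14_MANDATORY_BREAK_CHARS - ECMA_LINE_TERMINATORS
print(sorted(f"U+{ord(c):04X}" for c in extra))
```

So a regex split on `^` would miss mandatory breaks at VT, FF, and NEL that a UAX #14 line breaker must honor.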
-
BPO | [24665](https://bugs.python.org/issue24665)
--- | :---
Nosy | @birkenfeld, @terryjreedy, @vstinner, @ezio-melotti, @bitdancer, @methane, @fgallaire, @serhiy-storchaka, @yan12125, @JulienPalard, …