Text does not wrap if next word begins with a period

rparrett commented 4 months ago

Testing with 74c12ade4466b08b68d7703ec9a0081da39d9eac.

Reported in https://github.com/bevyengine/bevy/issues/12098.

What I did

Lay out the text "test .test" in a space where only one of the two words fits on a line.

What I expected

Two lines

What I got

Output

```
"t" @ -60,12
"e" @ -54,12
"s" @ -46,12
"t" @ -39,12
" " @ -34,12
"." @ -30,12
"t" @ -26,12
"e" @ -20,12
"s" @ -12,12
"t" @ -5,12
```

Removing the period results in:

Output

```
"t" @ 0,12
"e" @ 5,12
"s" @ 13,12
"t" @ 20,12
" " @ 26,12
"t" @ 0,28
"e" @ 5,28
"s" @ 13,28
"t" @ 20,28
```

Repro

```rust
use glyph_brush_layout::{ab_glyph::*, *};

fn main() {
    let dejavu = FontRef::try_from_slice(include_bytes!("../../fonts/DejaVuSans.ttf")).unwrap();
    let fonts = &[dejavu];
    // The second word starts with a period; this is what triggers the issue.
    let text = "test .test";

    let glyphs = Layout::default().calculate_glyphs(
        fonts,
        &SectionGeometry {
            screen_position: (0.0, 0.0),
            bounds: (50.0, 100.),
        },
        &[SectionText {
            text,
            scale: PxScale::from(15.5),
            font_id: FontId(0),
        }],
    );

    for glyph in glyphs {
        // Slicing a single byte is fine here because the text is ASCII.
        let character = &text[glyph.byte_index..glyph.byte_index + 1];
        println!(
            "{:?} @ {},{}",
            character,
            glyph.glyph.position.x.round(),
            glyph.glyph.position.y.round(),
        );
    }
}
```

## Discussion

It's totally possible I'm just not understanding all the nuances of text wrapping again. I appreciate any insight you're able to provide.

alexheretic commented 4 months ago

I'm surprised by this too tbh.

The default line breaking logic is provided by the xi-unicode crate, and it is simply telling us that "test .test" has no line break opportunities in it other than the end of the text.

```rust
// [dependencies]
// xi-unicode = "0.3"

fn main() {
    println!("==> test test");
    for line_break in xi_unicode::LineBreakIterator::new("test test") {
        println!("{line_break:?}");
    }

    println!("==> test .test");
    for line_break in xi_unicode::LineBreakIterator::new("test .test") {
        println!("{line_break:?}");
    }
}
```

Output:

```
==> test test
(5, false)
(9, false)
==> test .test
(10, false)
```
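
To make those numbers concrete: each tuple appears to carry the byte offset of a break opportunity, plus a flag that, as far as I understand xi-unicode, marks a hard (mandatory) break. Below is a minimal, dependency-free sketch of how such offsets carve the text into segments that a layout cannot split further. `split_at_breaks` is a hypothetical helper for illustration only; it is not part of xi-unicode or glyph-brush.

```rust
/// Splits `text` into the segments delimited by break offsets.
/// `breaks` holds ascending byte offsets, like the ones printed above.
/// Hypothetical helper for illustration; not part of any crate.
fn split_at_breaks<'a>(text: &'a str, breaks: &[usize]) -> Vec<&'a str> {
    let mut segments = Vec::new();
    let mut start = 0;
    for &end in breaks {
        // Each segment runs from the previous break up to this one.
        segments.push(&text[start..end]);
        start = end;
    }
    segments
}

fn main() {
    // Offsets copied from the output above.
    println!("{:?}", split_at_breaks("test test", &[5, 9]));
    println!("{:?}", split_at_breaks("test .test", &[10]));
}
```

With only one offset, "test .test" is a single segment, so the layout has nowhere to wrap it even when it overflows the bounds.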

Perhaps you should raise this issue upstream.

rparrett commented 4 months ago

Thanks for the pointer. I'll look into whether this is a bug on their end or expected behavior according to https://unicode.org/reports/tr14/, which they are following.

rparrett commented 4 months ago

I think it's likely that this is expected under UAX14. It would seem to be covered by LB13, with "." being classified as an infix separator (IS), which prevents a break before it even when it is preceded by spaces. The standard is not super readable though, so I could be mistaken.
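
Here is a rough sketch of that reading of the rule, in case it helps. It is a toy model and not an implementation of UAX14: it assumes a break is normally allowed after a run of spaces, that this is suppressed when the next character is an infix separator, and that the end of the text always counts as a break. `simplified_breaks` and `is_infix_separator` are names made up for this example, and the character list is only a few of the ASCII characters that I understand to be in the IS class.

```rust
/// Partial, assumed list of IS (infix separator) characters; the real
/// class in UAX14 contains more than these.
fn is_infix_separator(c: char) -> bool {
    matches!(c, '.' | ',' | ':' | ';')
}

/// Returns byte offsets of break opportunities under the toy model:
/// break after a run of spaces unless the next character is IS.
/// Hypothetical helper; real line breakers apply many more rules.
fn simplified_breaks(text: &str) -> Vec<usize> {
    let mut breaks = Vec::new();
    let mut prev_was_space = false;
    for (i, c) in text.char_indices() {
        // A non-space character right after spaces starts a new segment,
        // except when the "no break before IS" rule takes precedence.
        if prev_was_space && c != ' ' && !is_infix_separator(c) {
            breaks.push(i);
        }
        prev_was_space = c == ' ';
    }
    if !text.is_empty() {
        // The end of the text is always reported as a break.
        breaks.push(text.len());
    }
    breaks
}

fn main() {
    println!("{:?}", simplified_breaks("test test"));
    println!("{:?}", simplified_breaks("test .test"));
}
```

Under these assumptions the toy model reproduces the offsets reported by xi-unicode for both strings, which is at least consistent with the behavior being intentional.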

Out of curiosity I checked unicode-linebreak (used by cosmic-text and lapce), and it behaves the same way, so maybe I am not mistaken.

```rust
// [dependencies]
// unicode-linebreak = "0.1.5"

fn main() {
    println!("==> test test");
    for line_break in unicode_linebreak::linebreaks("test test") {
        println!("{line_break:?}");
    }

    println!("==> test .test");
    for line_break in unicode_linebreak::linebreaks("test .test") {
        println!("{line_break:?}");
    }
}
```

```
==> test test
(5, Allowed)
(9, Mandatory)
==> test .test
(10, Mandatory)
```

I also checked icu::segmenter (even with LineBreakStrictness::Loose) and https://github.com/foliojs/linebreak just in case.

Some relevant/helpful text from a Go implementation:

> The goal of matching user perceptions cannot always be met exactly because the text alone does not always contain enough information to unambiguously decide boundaries. For example, the period (U+002E FULL STOP) is used ambiguously, sometimes for end‐of‐sentence purposes, sometimes for abbreviations, and sometimes for numbers.

None of the browsers I can test exhibit this behavior, though. I believe browsers are also doing some fancy stuff, like switching break mode based on container width.

Thanks for the help.

alexheretic / glyph-brush