Certain characters prevent later text from rendering

willplatt commented 1 year ago

Expected Behavior

Text should be rendered regardless of preceding characters in the paragraph.

Actual Behavior

Certain characters (such as த or ம from the Tamil alphabet), when followed by a large number of characters in the same paragraph with the same style classes, stop the characters after them from rendering (either immediately or a short distance later). Rendering resumes at the next change in style classes or at the end of the paragraph.

For example, in the demo below, த is followed by 32,001 other characters in the same paragraph, but the paragraph only renders as த. When the number of characters following த is reduced to 32,000, then the paragraph renders correctly.

The exact number of characters at which the bug occurs depends on the amount of whitespace in the paragraph after the த. Inserting ASCII characters at the start of the paragraph doesn't appear to change anything, nor does changing the segment's style class or splitting the text into multiple segments with the same style class.

Reproducible Demo

Run Launcher.main():

import javafx.application.Application;
import javafx.scene.Scene;
import javafx.stage.Stage;
import org.fxmisc.richtext.StyleClassedTextArea;

public class TextCutOffBug extends Application {

    public static void main(String[] args) {
        TextCutOffBug.launch(args);
    }

    @Override
    public void start(Stage stage) {
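        StyleClassedTextArea textArea = new StyleClassedTextArea();
        textArea.setWrapText(true);
        // One Tamil character followed by 32,001 more characters in the same
        // paragraph, then a second paragraph containing just "b".
        String text = "த" + "a".repeat(32001) + "\nb";
        // A single append with one (empty) style class keeps the whole text in one style segment.
        textArea.append(text, "");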
        Scene scene = new Scene(textArea, 500, 300);
        stage.setScene(scene);
        stage.show();
    }

    public static class Launcher {
        public static void main(String[] args) {
            TextCutOffBug.main(args);
        }
    }
}

Environment info:

RichTextFX Version: 0.11.0 and 0.11.1
Operating System: Windows 10
Java version: 17
JavaFX version: 17.0.1

Current Workarounds

The bug only appears to occur when there are over 32,000 consecutive characters with the same style classes (and in the same paragraph). So every 32,000 characters, add/remove a meaningless style class, like so:

String text = "த" + "a".repeat(80_000) + "\nb";
textArea.append(text.substring(0, 32000), "");
textArea.append(text.substring(32000, 64000), "bug-workaround");
textArea.append(text.substring(64000), "");
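
For text of arbitrary length, the same workaround can be written as a loop. The following is only a sketch: the class name StyleRunSplitter and the method split are hypothetical helpers, and 32,000 is the threshold observed in this report rather than a documented limit (it also seems to vary with whitespace, as noted above). The splitter additionally avoids cutting a surrogate pair in half, which the hard-coded substring calls would not guard against.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper for the workaround above: cuts text into runs of at
 * most maxRun chars so each run can be appended with a different style class.
 *
 * Possible usage with the textArea from the demo (append(String, String) as used above):
 *
 *   List<String> runs = StyleRunSplitter.split(text, StyleRunSplitter.MAX_RUN);
 *   for (int i = 0; i < runs.size(); i++) {
 *       // Alternate between no style class and a meaningless one.
 *       textArea.append(runs.get(i), i % 2 == 0 ? "" : "bug-workaround");
 *   }
 */
public class StyleRunSplitter {

    /** Longest run that rendered correctly in this report; an observed value, not a documented limit. */
    static final int MAX_RUN = 32_000;

    static List<String> split(String text, int maxRun) {
        if (maxRun < 2) {
            throw new IllegalArgumentException("maxRun must be at least 2");
        }
        List<String> runs = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = Math.min(start + maxRun, text.length());
            // Do not end a run between the two halves of a surrogate pair.
            if (end < text.length() && Character.isHighSurrogate(text.charAt(end - 1))) {
                end--;
            }
            runs.add(text.substring(start, end));
            start = end;
        }
        return runs;
    }

    public static void main(String[] args) {
        String text = "\u0BA4" + "a".repeat(80_000) + "\nb";
        for (String run : split(text, MAX_RUN)) {
            System.out.println(run.length());
        }
    }
}
```

Because a paragraph break also ends a run, splitting per paragraph first would produce fewer artificial style changes, but the simple version above is enough to reproduce the three-append workaround.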

Jugen commented 1 year ago

This happens with TextArea as well, so I think it's a limitation of Text nodes and probably has something to do with the number of Unicode characters allowed.

I'm guessing that with just plain characters, Text can handle 64K characters but if there's a single Unicode character then it halves to 32K.

Each style is split into its own Text node, so that's why the workaround works when you add a style at 32K.

willplatt commented 1 year ago

Ah, I didn't notice it was a bug in TextArea as well. I also tested with a Text node and the behaviour is the same.

I'm not sure what you mean by "Unicode characters", but other characters with higher code points (such as 文) don't trigger the bug. No matter how many characters are in the segment, I can't get the bug to occur unless த or one of the other specific characters I know of is used. All of the above is true with Text and TextArea as well.
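
To make the code point comparison concrete, here is a small sketch that prints the code point, UTF-16 length and Unicode script of the characters mentioned in this thread. The class name CodePointCheck and the method describe are made up for illustration. It only uses standard java.lang.Character calls and says nothing about what JavaFX does internally.

```java
import java.util.Locale;

public class CodePointCheck {

    /** Describes the first code point of s: its value, UTF-16 length and Unicode script. */
    static String describe(String s) {
        int cp = s.codePointAt(0);
        return String.format(Locale.ROOT, "U+%04X chars=%d script=%s",
                cp, Character.charCount(cp), Character.UnicodeScript.of(cp));
    }

    public static void main(String[] args) {
        String[] samples = {
            "\u0BA4", // Tamil letter TA, triggers the bug in this report
            "\u0BAE", // Tamil letter MA, triggers the bug in this report
            "\u6587", // CJK ideograph, higher code point, does not trigger it
            "a"       // plain ASCII
        };
        for (String s : samples) {
            System.out.println(describe(s));
        }
    }
}
```

All four characters occupy a single UTF-16 char, and the CJK character has the higher code point, so neither encoding length nor code point value alone separates the characters that trigger the bug from those that don't. The script differs (TAMIL versus HAN or LATIN), though this thread does not establish that the script is the cause.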

willplatt commented 1 year ago

I have now reported the bug to Oracle's bug database. I also discovered that "த" + "文".repeat(32_001) + "b" is rendered as தb, unlike "த" + "a".repeat(32_001) + "b", which is rendered as த.

Jugen commented 1 year ago

Great :-)

willplatt commented 1 year ago

The bug report has been evaluated now, so here's the link for anyone who wants to follow its progress: https://bugs.java.com/bugdatabase/view_bug?bug_id=JDK-8315653
