Computing multi-line and formatted text layout for non-DOM scenarios

sushraja-msft commented 3 years ago

Hello WICG,

Microsoft has put together an explainer for a method that leverages the UA's ability to compute line breaking of formatted text runs in scenarios where the DOM is not directly usable or available. The API takes advantage of the UA's layout engine to address the many subtle complexities that make line breaking of formatted text hard to implement in JavaScript, such as properly handling international text, bidi, and text shaping (see the explainer for more detail).

We would like the WICG community to join us in reviewing this proposal, and we would like to move it into incubation soon, as it is generating interest from web developers and from some partners who originally suggested the idea.

The proposal currently targets Canvas text layout scenarios, but we anticipate generalizing some of its concepts so that they harmonize with, and can potentially be shared by, the Houdini Layout API and other platform areas where this capability could be useful in the future.

nhelfman commented 3 years ago

Microsoft Office Online finds this API proposal extremely helpful and beneficial for many of our use cases. Implementing text rendering features such as line breaking, bidi text, and rich text formatting on top of the current Canvas2D API is very complex. This is especially true for complex scripts that do not have simple line-breaking rules (e.g. Thai). Supporting special typographic features that require a text shaping engine (e.g. glyph substitution, combined glyph clustering) also carries a high implementation cost when done as part of a JS rendering engine. In addition, there is an opportunity for rendering performance gains when user agent vendors implement these capabilities at the native level.

We would like to see this proposal mature and progress to a fully supported standard as fast as possible.

travisleithead commented 3 years ago

Also, worth pointing out some of the support for this idea over on this other issue: https://github.com/w3c/css-houdini-drafts/issues/990

bonmotbot commented 3 years ago

GSuite is also interested in this proposal. We think there could be benefits here in terms of both performance and capabilities for web apps. Here are a few considerations we'd like to see addressed as part of this.

Performance

Currently, using measureText and fillText together results in duplicate work, since both methods take a string and shape it into glyphs. A promising feature of this proposal is that it introduces an intermediate object to capture the text shaping work. measureFormattedText creates a CanvasFormattedTextLine which can then be used for both measuring and drawing. As a result, fillFormattedTextLine should be faster than fillText, and caching CanvasFormattedTextLines can be a way to speed up drawing.

Given those potential benefits of the CanvasFormattedTextLine object, it would be nice for this API to efficiently handle simple cases as well as multi-line and multi-format. As an option, maybe measureFormattedText could also take a string and return a line based on the current canvas context:

let line = ctx.measureFormattedText('hello world');
ctx.fillFormattedTextLine(line, x, y);

Regardless of specific API, some kind of drop-in replacement for measureText, that has similar latency but gives back a CanvasFormattedTextLine would be very useful.
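The caching idea mentioned above could look roughly like the following. This is only a sketch: `createLineCache` and the `measure` callback are hypothetical names, and `measure` stands in for whatever call ends up producing a reusable line object (for example, the proposed `measureFormattedText`, whose final shape is not settled here).

```javascript
// Hypothetical helper: memoizes line objects so shaping happens once per
// (font, text) pair. `measure` is an assumed callback, not a real canvas API.
function createLineCache(measure) {
  const cache = new Map();
  return function getLine(font, text) {
    // Newline separates the two parts; assumes font strings contain no newline.
    const key = `${font}\n${text}`;
    let line = cache.get(key);
    if (line === undefined) {
      line = measure(font, text); // the expensive shaping step
      cache.set(key, line);
    }
    return line;
  };
}
```

A caller would wrap its measuring function once and then request lines through the returned `getLine`, so repeated draws of the same text reuse the same object.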

Cursor Positioning

For editing use cases, it's necessary to know the x position of a cursor within a text line. This is particularly useful for cases with ligatures and kerning, where sizing individual characters isn't sufficient. Here's a sketch of what this might look like for rendering a collaborator cursor in Google Docs:

ctx.font = '30px Raleway';
let line = ctx.measureFormattedText('efficiently');

// draw text
ctx.fillFormattedTextLine(line, x, y);

// draw collaborator cursor
ctx.fillStyle = 'limegreen'; // collaborator color
let caretX = x + line.getCaretX(/* charIndex */ 3);
let caretY = y - 30;
ctx.fillRect(caretX, caretY, 1, 30);
ctx.fillRect(caretX - 2, caretY - 2, 5, 5);

Importantly, the returned x position takes the ffi glyph into account and does the appropriate thing. This isn't something web developers can calculate with measureText alone, but it is a calculation that the browser already does today in other contexts.

Workers / OffscreenCanvas

Hopefully, these APIs would also apply to OffscreenCanvas and be usable from Workers. It would also be useful to pass CanvasFormattedTextLine between Workers. This would allow sizing on one Worker, but rendering on another, for example.

nhelfman commented 3 years ago

As an option, maybe measureFormattedText could also take a string and return a line based on the current canvas context:
let line = ctx.measureFormattedText('hello world');
ctx.fillFormattedTextLine(line, x, y);
Regardless of specific API, some kind of drop-in replacement for measureText, that has similar latency but gives back a CanvasFormattedTextLine would be very useful.

Agree with this idea. It could simplify adoption of the new API.

Cursor Positioning

For editing use cases, it's necessary to know the x position of a cursor within a text line. This is particularly useful for cases with ligatures and kerning, where sizing individual characters isn't sufficient. Here's a sketch of what this might look like for rendering a collaborator cursor in Google Docs:
...
// draw collaborator cursor
ctx.fillStyle = 'limegreen'; // collaborator color
let caretX = x + line.getCaretX(/* charIndex */ 3);
...
Importantly, the returned x position takes the ffi glyph into account and does the appropriate thing. This isn't something web developers can calculate with measureText alone, but it is a calculation that the browser already does today in other contexts.

In addition to caret positioning for ligatures such as the ffi glyph, grapheme clusters should be taken into account. Would it be more appropriate to define the index as something more generic, such as a graphemeIndex, rather than a charIndex? A character index into a string can be inaccurate in some cases. For example, the string "👨‍👨‍👧‍👧" has a length of 11 characters, yet it is a single grapheme, so I would expect the only possible caret positions to be 0 (before the emoji) and 1 (after the emoji).
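The difference between the two counts can be checked with the standard `Intl.Segmenter` API, assuming a runtime that supports it. The helper name `countGraphemes` is made up for this illustration, and the emoji is written with escapes so the code units are visible.

```javascript
// Family emoji: four people joined by three zero-width joiners (U+200D).
const family = '\u{1F468}\u200D\u{1F468}\u200D\u{1F467}\u200D\u{1F467}';

// Hypothetical helper: counts user-perceived characters (grapheme clusters).
function countGraphemes(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  return [...segmenter.segment(text)].length;
}

console.log(family.length);          // 11 UTF-16 code units
console.log(countGraphemes(family)); // 1 grapheme cluster
```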

bonmotbot commented 3 years ago

I think indexing by char is right because the string is the underlying representation, and the cursor is typically going to be positioned based on that. For example, if you have the string let text = 'happy 👨‍👨‍👧‍👧 family' and you want to get the cursor position before the m, a natural way to do that would be line.getCaretX(text.indexOf('m')). If the indexing was grapheme based, you'd have to map the string index to grapheme index somehow.
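To make the mapping cost concrete, here is a minimal sketch of the conversion a caller would need if indexing were grapheme based. `charIndexToGraphemeIndex` is a hypothetical helper built on the standard `Intl.Segmenter`; it is not part of the proposal.

```javascript
// Hypothetical helper: maps a UTF-16 string index to the index of the
// grapheme cluster that contains it.
function charIndexToGraphemeIndex(text, charIndex) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  let graphemeIndex = 0;
  for (const { index, segment } of segmenter.segment(text)) {
    // The cluster covers code units [index, index + segment.length).
    if (charIndex < index + segment.length) return graphemeIndex;
    graphemeIndex++;
  }
  return graphemeIndex; // charIndex is at or past the end of the string
}

const sample = 'happy \u{1F468}\u200D\u{1F468}\u200D\u{1F467}\u200D\u{1F467} family';
const charIndex = sample.indexOf('m');
console.log(charIndex);                                   // 20
console.log(charIndexToGraphemeIndex(sample, charIndex)); // 10
```

With char-based indexing the first number can be passed straight through; with grapheme-based indexing every call site needs a walk like this one.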

It's true that not every char index will have an appropriate caret coordinate, but I think that's ok. One way to go is to have every char in a cluster return the same x position. Another is to return some sentinel value (null, NaN) for all but the first char in a cluster.
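The two options can be sketched at the index level, leaving out the actual x lookup since that would come from the proposed API. All function names below are hypothetical, and cluster boundaries are approximated with the standard `Intl.Segmenter`; a real implementation would use the UA's own shaping clusters.

```javascript
// Start index (in UTF-16 code units) of every grapheme cluster in `text`.
function clusterStarts(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  return [...segmenter.segment(text)].map(({ index }) => index);
}

// Option 1: every char in a cluster resolves to the cluster's start,
// so all of them would yield the same x position.
function snapToClusterStart(text, charIndex) {
  let start = 0;
  for (const boundary of clusterStarts(text)) {
    if (boundary > charIndex) break;
    start = boundary;
  }
  return start;
}

// Option 2: only cluster starts (and the end of the string) are valid;
// anything inside a cluster yields a sentinel.
function indexOrSentinel(text, charIndex) {
  if (charIndex === text.length) return charIndex;
  return clusterStarts(text).includes(charIndex) ? charIndex : NaN;
}
```

For the string 'a' + family emoji + 'b', an index in the middle of the emoji snaps back to 1 under the first option and produces NaN under the second.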

I would compare this to the Range.setStart / Range.setEnd API, where indices are char based, and you can create a range that splits a grapheme boundary if you want, but selections snap to boundaries when rendered.

travisleithead commented 3 years ago

Love this discussion and show of support. As a next step, let's get this work started as a new incubation here in WICG to continue to advance the proposal.

New Repo created: https://github.com/WICG/canvas-formatted-text
