WICG / canvas-formatted-text


Language and direction metadata #36

Open xfq opened 2 years ago

xfq commented 2 years ago

I just read the three documents of this incubation experiment. Leveraging the power of the CSS layout engine sounds like a useful way to style text in Canvas.

I wonder if there is a way to associate language and direction metadata with FormattedText and/or FormattedTextRun? See string-meta for more information.

travisleithead commented 2 years ago

PR https://github.com/WICG/canvas-formatted-text/pull/39 includes lang metadata as recommended for string annotations in the link you provided. For 'dir', I'm confident that relying on CSS direction will fill that need without a separate value for 'dir'.

travisleithead commented 2 years ago

#39 has landed. I think we are covered from the dir/lang side of things.

aphillips commented 2 years ago

In reviewing the above thread, I find the text here, which says:

No additional work is needed from web developers to support bidi text. At format time, bidi analysis is done on the input text which creates internal bidi runs if necessary.

This is not correct: the bidi algorithm needs help from content authors in order to produce the correct results.

I do agree that CSS direction can be used as the attribute for associating the direction with a FormattedText paragraph and as the bearer of directional metadata within a paragraph. However, the document is bereft of examples and the statement about bidi is misleading. Every paragraph has a "base paragraph direction" necessary for computing directional runs and this should be called out. Bidi analysis proceeds from this base direction. Detection via "first strong" is often wrong.
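The "first strong" heuristic Addison mentions can be sketched in a few lines of Python. This is a simplified illustration of rules P2 and P3 of the Unicode Bidirectional Algorithm (UAX #9), not the code of any particular engine. The function name `first_strong_direction` is hypothetical, and the sketch ignores the part of P2 that skips text inside isolates. It shows how a Hebrew sentence that begins with a Latin-script name gets misdetected as left-to-right:

```python
import unicodedata


def first_strong_direction(text: str, default: str = "ltr") -> str:
    """Guess a base direction from the first strong character.

    Simplified sketch of UAX #9 rules P2-P3: the first character whose
    bidi class is L gives 'ltr'; R or AL gives 'rtl'. Weak and neutral
    characters (digits, spaces, punctuation) are skipped. Unlike full P2,
    this does not skip text between an isolate initiator and its PDI.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return default  # no strong character at all


# A Hebrew sentence ("W3C is a standards organization") that starts
# with a Latin-script name: the heuristic picks the wrong direction.
print(first_strong_direction("W3C הוא ארגון תקינה"))  # ltr
print(first_strong_direction("שלום עולם"))             # rtl
```

This is why the base paragraph direction generally needs to be supplied explicitly (for example, through CSS `direction`) rather than guessed from the content.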

In addition, runs of text within a paragraph often need to be "spanned" with a direction in order to get the right results. This doesn't appear to be accounted for in FormattedText. We have some examples here.

I'll also paste a couple of screenshots to exemplify the need for direction in-line. These are using HTML to mark up the text, but you can imagine how FormattedText would need similar "spanning" within the text.

Here's the "badly styled" paragraph (no added direction):

[Screenshot: paragraph rendered with no added direction markup (image not preserved)]

Here's the fixed version:

[Screenshot: the same paragraph with direction markup added (image not preserved)]
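One plain-text way to "span" a run with a direction, analogous to `<bdi>` or `<span dir="rtl">` in HTML, is to wrap it in the Unicode directional isolate characters (U+2066 to U+2069). The Python sketch below uses that approach purely for illustration: it is not part of the FormattedText proposal, and the helper name `isolate` is hypothetical.

```python
# Unicode directional isolate controls (UAX #9).
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE (direction taken from the content)
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE (closes any of the above)

_OPENERS = {"ltr": LRI, "rtl": RLI, "auto": FSI}


def isolate(text: str, direction: str = "auto") -> str:
    """Wrap text so the bidi algorithm treats it as one directional unit,
    roughly like <bdi> or <span dir="..."> in HTML."""
    if direction not in _OPENERS:
        raise ValueError(f"direction must be 'ltr', 'rtl' or 'auto', got {direction!r}")
    return _OPENERS[direction] + text + PDI


# An RTL user name embedded in an LTR sentence.
name = "שלום"
label = f"{isolate(name)} - 3 reviews"
```

Without the isolate, the neutral " - " and the digit that follow an RTL name in an LTR sentence can be pulled into the name's right-to-left run, so the number is displayed on the wrong side of the name.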

xfq commented 2 years ago

In addition to Addison's comments above, the W3C Internationalization WG found some of the terminology in the documents here to be inaccurate. What would be the best way to engage with you? Would you prefer PRs or issues?

travisleithead commented 2 years ago

@aphillips thanks so much for the review and your comments. I'm looking forward to making these explainers much better as a result. It sounds like we should add an example showing why the bidi algorithm sometimes needs help from authors, and emphasizing CSS direction (and maybe the unicode-bidi property?) as the main tools for providing that help.

I think the spanning you describe is fully possible with this proposal; that is, I should be able to translate your example above into Formatted Text input in roughly the same way, and it should produce the same result, since it is ultimately processed by the same layout/rendering pipeline.

I would like to know more about how the "base paragraph direction" is established. In HTML Canvas, for example, when a JavaScript string is rendered to the canvas with fillText(), how is this base paragraph direction chosen? Is it inherited into the canvas from elsewhere in the DOM? For the HTML parser, how does it establish it? Does it ultimately derive from language or network hints? I'm very curious. This seems related to #49 as another default we need to think about.

@xfq I would welcome any help you can offer on improving the terminology. PRs will be the fastest way to suggest improvements. Looking forward to any help you can provide.

fantasai commented 8 months ago

For the HTML parser, how does it establish it?

See

The directionality of an element, as established by HTML, gets mapped to the direction property in CSS, which, when set on a block container, sets the base direction for the inline formatting context it contains.
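A simplified Python model of the HTML rules referred to above: an explicit `dir="ltr"` or `dir="rtl"` wins; `dir="auto"` uses the first strong character and falls back to `ltr`; otherwise the element inherits its parent's directionality, with the root defaulting to `ltr`. The `Element` class and `directionality` function are hypothetical stand-ins, and the model ignores details such as `<bdi>`, form controls, and skipping descendants that carry their own `dir`. The result is what would then be mapped to the CSS `direction` property.

```python
import unicodedata
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    """Hypothetical, minimal stand-in for a DOM element."""
    text: str = ""
    dir: Optional[str] = None  # "ltr", "rtl", "auto", or None if the attribute is absent
    parent: Optional["Element"] = None


def _first_strong(text: str) -> Optional[str]:
    """Direction of the first strong (L, R, AL) character, or None if there is none."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return None


def directionality(el: Element) -> str:
    """Simplified HTML directionality; the result maps to CSS 'direction'."""
    if el.dir in ("ltr", "rtl"):
        return el.dir                           # explicit dir attribute wins
    if el.dir == "auto":
        return _first_strong(el.text) or "ltr"  # content-based, ltr fallback
    if el.parent is None:
        return "ltr"                            # root element without a valid dir
    return directionality(el.parent)            # missing/invalid dir: inherit


page = Element(dir="rtl")
para = Element(text="W3C הוא ארגון תקינה", parent=page)
print(directionality(para))  # rtl: inherited from the page

para.dir = "auto"
print(directionality(para))  # ltr: the first strong character is "W"
```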

Wrt specifying direction, btw, I think it would be better to have a dir property analogous to HTML's dir attribute, and parallel to the lang property, directly on the text object, rather than relying on direction and unicode-bidi in the styles. There are several reasons for this: