MicrosoftDocs / typography-issues

Need documentation on how text is assigned to shaping engines and segmented #265

Open NorbertLindenberg opened 4 years ago

NorbertLindenberg commented 4 years ago

This issue affects work with all shaping engines, and the Universal Shaping Engine (USE) even more so: there doesn't seem to be any specification for how text is assigned to shaping engines and broken into runs and clusters.

Let's assume an OpenType renderer is asked to render text that mixes the Javanese, Sundanese, Latin, and Arabic scripts with ASCII digits, spaces, punctuation, dotted circles, no-break spaces, ZWS, ZWJ, and ZWNJ. The font supports all the characters used in the text (this avoids the even harder issue of font fallback, which is probably outside the scope of the OpenType specification).

How should the renderer go about breaking the text into runs for the different shaping engines, and the runs into clusters? Which Unicode properties are taken into consideration? Does language information play any role? Where do the characters go that aren't specific to any script or language? Can clusters contain characters from different scripts, e.g., can Javanese marks be attached to Sundanese bases?

If there is a specification covering this, please link to it from all shaping engine descriptions. If not, it needs to be created.
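
To make the questions above concrete, here is a minimal sketch of one possible run-segmentation heuristic, written in Python. It is not taken from any specification or from any existing shaping engine. It assumes that characters with Script=Common or Script=Inherited simply join the run of the preceding character, and that leading neutral characters join the first real script that follows. The `SCRIPT_RANGES` table and the names `script_of` and `itemize` are hypothetical; the table is hand-written, covers only the scripts mentioned here, and treats whole blocks as one script, which real Unicode script data does not do.

```python
# Hypothetical, heavily simplified script table: (first, last, ISO 15924 tag).
# Real data comes from the Unicode Script property, which is finer-grained
# than whole blocks (e.g. some code points in the Arabic block are Common).
SCRIPT_RANGES = [
    (0x0041, 0x005A, "Latn"),  # A-Z
    (0x0061, 0x007A, "Latn"),  # a-z
    (0x0600, 0x06FF, "Arab"),  # Arabic block (simplification)
    (0x1B80, 0x1BBF, "Sund"),  # Sundanese block
    (0xA980, 0xA9DF, "Java"),  # Javanese block (simplification)
    (0x200C, 0x200D, "Zinh"),  # ZWNJ, ZWJ: Script=Inherited
]

NEUTRAL = {"Zyyy", "Zinh"}  # Common and Inherited


def script_of(ch):
    """Return the script tag for one character; default to Common."""
    cp = ord(ch)
    for first, last, tag in SCRIPT_RANGES:
        if first <= cp <= last:
            return tag
    # Digits, spaces, punctuation, U+25CC, U+00A0, U+200B all land here.
    return "Zyyy"


def itemize(text):
    """Split text into (script, substring) runs.

    Assumption: neutral characters attach to the preceding run; a run that
    starts with neutrals adopts the first real script that follows.
    """
    runs = []  # each entry is [script_or_None, substring]
    for ch in text:
        script = script_of(ch)
        neutral = script in NEUTRAL
        if not runs:
            runs.append([None if neutral else script, ch])
        elif neutral or runs[-1][0] in (None, script):
            if runs[-1][0] is None and not neutral:
                runs[-1][0] = script  # resolve leading neutrals
            runs[-1][1] += ch
        else:
            runs.append([script, ch])
    # A run that never met a real script stays Common.
    return [(script or "Zyyy", chunk) for script, chunk in runs]


if __name__ == "__main__":
    sample = "abc 123 \u0628\u0629 \ua984\u200d\u1b83"
    for script, chunk in itemize(sample):
        print(script, [f"U+{ord(c):04X}" for c in chunk])
```

Under this heuristic a ZWJ between a Javanese letter and a Sundanese letter ends up in the Javanese run, and the Sundanese letter starts a new run, so no run mixes the two scripts. Whether that is the right answer, and how each run is then divided into clusters, is exactly what a specification would need to settle.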


NorbertLindenberg commented 4 years ago

A related investigation is here: https://github.com/OpenType/opentype-layout/blob/master/docs/script_segmentation.md