This issue affects work with all shaping engines, but the Universal Shaping Engine (USE) most of all: there doesn't seem to be any specification for how text is assigned to shaping engines and broken into runs and clusters.
Let's assume an OpenType renderer is asked to render text that uses the Javanese, Sundanese, Latin, and Arabic scripts, along with ASCII digits, spaces, and punctuation, dotted circles, no-break spaces, ZWSP, ZWJ, and ZWNJ. The font supports all the characters in the text (this sidesteps the even harder issue of font fallback, which is probably outside the scope of the OpenType specification).
How should the renderer go about breaking the text into runs for the different shaping engines, and the runs into clusters? Which Unicode properties are taken into consideration? Does language information play any role? Where do the characters go that aren't specific to any script or language? Can clusters contain characters from different scripts, e.g., can Javanese marks be attached to Sundanese bases?
If there is a specification covering this, please link to it from all shaping engine descriptions. If not, it needs to be created.
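To make the question concrete, here is a minimal sketch of one heuristic a renderer might plausibly use: split the text by script, and let characters that belong to no particular script join the run that precedes them. This is only an assumption for illustration, not something taken from any specification. The `SCRIPT_RANGES` table is hypothetical and heavily simplified (a real implementation would use the full Unicode Script property data), and the names `script_of` and `itemize` are made up for this example.

```python
from typing import List, Tuple

# Hypothetical, heavily simplified script table keyed by code point range.
# Tags are ISO 15924 codes. A real renderer would use the full Unicode
# Script property data instead of whole-block approximations like these.
SCRIPT_RANGES = [
    (0x0041, 0x005A, "Latn"),  # A-Z
    (0x0061, 0x007A, "Latn"),  # a-z
    (0x0600, 0x06FF, "Arab"),  # Arabic block (approximation)
    (0x1B80, 0x1BBF, "Sund"),  # Sundanese block
    (0xA980, 0xA9DF, "Java"),  # Javanese block
]

NEUTRAL = "Zyyy"  # stands in for every character not listed above


def script_of(ch: str) -> str:
    """Return the script tag for one character, or NEUTRAL if unlisted."""
    cp = ord(ch)
    for lo, hi, tag in SCRIPT_RANGES:
        if lo <= cp <= hi:
            return tag
    return NEUTRAL


def itemize(text: str) -> List[Tuple[str, str]]:
    """Split text into (script, substring) runs.

    Assumed heuristic: neutral characters (digits, spaces, punctuation,
    dotted circle, NBSP, ZWSP, ZWJ, ZWNJ) extend the preceding run; a
    leading neutral run adopts the script of the first real character.
    """
    runs: List[List[str]] = []  # each entry is [script, characters]
    for ch in text:
        sc = script_of(ch)
        if runs and (sc == NEUTRAL or sc == runs[-1][0]):
            runs[-1][1] += ch          # same script, or neutral: extend
        elif runs and runs[-1][0] == NEUTRAL:
            runs[-1][0] = sc           # leading neutrals adopt this script
            runs[-1][1] += ch
        else:
            runs.append([sc, ch])      # script change: start a new run
    return [(sc, s) for sc, s in runs]
```

Under this assumed heuristic, `itemize("ab 12 \u0628\u0629")` puts the digits and spaces into the Latin run, and a Javanese letter followed by a Sundanese vowel sign (`"\uA98F\u1BA4"`) ends up in two separate runs, so the mark could never attach to the base. Whether either outcome is what a renderer is supposed to do is exactly what a specification would need to settle.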
Document Details
⚠ Do not edit this section. It is required for docs.microsoft.com ➟ GitHub issue linking.
ID: 194a6d3c-4137-46e9-3a4b-44b990200986
Version Independent ID: a0c8e788-5228-aa28-670e-3ba1ac3faecd