MicrosoftDocs / typography-issues

Creative Commons Attribution 4.0 International

OpenType needs metrics for Hebrew, Thai, etc. #240

Open fantasai opened 5 years ago

fantasai commented 5 years ago

https://drafts.csswg.org/css-inline-3/#aligning-initial-letter

When laying out initial drop letters, we need to align the top edges of Hebrew text. This edge corresponds neither to the cap height nor the ex-height, nor the ideographic ink face baselines, so it needs its own metric.

(Also, likewise for every other script whose top edge does not coincide with an existing metric.)

behdad commented 5 years ago

But then, one needs these for every script... For example, Indic scripts also need the hanging baseline. I don't see how adding this for Hebrew is a good idea.

dauwhe commented 5 years ago

Would the same baseline work for Indic and Hebrew? We've struggled to get answers from the community.

Our larger point is that there is much we can't do on the web today, because even if some of these baselines exist in theory, we don't find them in actual fonts.

behdad commented 5 years ago

> Our larger point is that there is much we can't do on the web today, because even if some of these baselines exist in theory, we don't find them in actual fonts.

I think we should get HarfBuzz to expose BASE-table baselines. Then hook it up to browsers, then tell font designers to populate their fonts with baselines...

behdad commented 5 years ago

cc @jfkthame @drott @ebraminio.

fantasai commented 5 years ago

@behdad I think that's a good way to do this. But aren't the metrics in the base tables pre-defined? There isn't one for Hebrew, and while it might sync up with Thai, which has a similar pattern of ascenders, I doubt it will sync up with Devanagari, whose top corresponds more closely to the Latin cap height than the ex height. Neither Hebrew nor Thai have baseline metrics in OpenType afaik.

behdad commented 5 years ago

> @behdad I think that's a good way to do this. But aren't the metrics in the base tables pre-defined? There isn't one for Hebrew, and while it might sync up with Thai, which has a similar pattern of ascenders, I doubt it will sync up with Devanagari, whose top corresponds more closely to the Latin cap height than the ex height. Neither Hebrew nor Thai have baseline metrics in OpenType afaik.

There are ideas regarding how to use this that are not in the spec (yet). Apple was particularly interested in implementing...

Basically, OpenType fonts expose certain metrics (ascent, descent, xheight, capheight, etc). These values can vary, in a variable font, by way of the MVAR table. To implement that, MVAR assigns a four-byte tag to each of those metrics. There is consensus that if script-/language-specific variations to those values are needed, we should encode them in the BASE table, which already has facilities for script/language data, using the same four-byte tags from MVAR, which the BASE table already allows.

So, basically, automatically extending all MVAR tags to also be BASE tags.

behdad commented 5 years ago

Then you can simply query script-/language-specific value for "cap height".
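
As a rough illustration of the lookup described above, here is a minimal Python sketch. It does not parse a real font: the dictionaries, the numeric values, and the `get_metric` helper are all hypothetical. The only things taken from OpenType are the MVAR value tags `xhgt` (x-height) and `cpht` (cap height) and the script tags. Falling back to the font-wide value when a script has no override is an assumption, not something settled in this thread.

```python
from typing import Optional

# Font-wide metrics keyed by MVAR value tag, in font units (hypothetical values).
FONT_METRICS = {"xhgt": 500, "cpht": 700}

# Hypothetical per-script overrides, standing in for what a BASE-style
# table could store if MVAR tags were also allowed there.
SCRIPT_METRICS = {
    "hebr": {"xhgt": 560, "cpht": 560},
    "thai": {"xhgt": 540},
}


def get_metric(tag: str, script: Optional[str] = None) -> int:
    """Return the script-specific value for `tag` if one exists,
    otherwise the font-wide value (assumed fallback)."""
    if script is not None:
        override = SCRIPT_METRICS.get(script, {}).get(tag)
        if override is not None:
            return override
    return FONT_METRICS[tag]
```

With this shape, asking for `get_metric("cpht", "hebr")` yields the Hebrew-specific value, while a script without an override, such as `"latn"` here, gets the font-wide one.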

fantasai commented 5 years ago

@behdad I'm not sure I understand. Let's suppose I have a font that supports Latin, CJK, Hebrew, Devanagari, and Thai. Are you saying that the "ex height" metric will not correspond to the Latin ex height if I am typesetting Japanese or Hebrew text, even if it happens to contain Latin text? What does an ex-height even mean for these writing systems?

behdad commented 5 years ago

> @behdad I'm not sure I understand. Let's suppose I have a font that supports Latin, CJK, Hebrew, Devanagari, and Thai. Are you saying that the "ex height" metric will not correspond to the Latin ex height if I am typesetting Japanese or Hebrew text, even if it happens to contain Latin text? What does an ex-height even mean for these writing systems?

I'm saying that we have a proposed way to have language-/script-specific value for any metric in the font. Both cap height and x-height are meaningless in many scripts, and yet referenced by CSS. We might as well just let the font modify it to affect where it is used in CSS. That's what I'm suggesting.

fantasai commented 5 years ago

@behdad That doesn't answer my question.

When we tell the font we're typesetting Hebrew, are you suggesting that it should tell us that the ex height = cap height = height of Hebrew glyphs, and no metric is available to represent the heights of Latin glyphs?

Or are you saying that we make up a definition for "ex" height and "cap" height that would be generic enough to apply to all writing systems, and each one needs to decide what specifically that means, and the font would let us look up the "ex" height and "cap" height of each script independently so that, for example, if we are in a multi-script document trying to find the height of a Hebrew glyph we can ask for the "ex" height of Hebrew, and when we are trying to find the height of a lowercase Latin glyph we can ask for the "ex" height of Latin, and when we are trying to find the height of a Hangul glyph we can ask for the "ex" height of Hangul?

behdad commented 5 years ago

@fantasai sorry I didn't fully understand that part of your question before.

> When we tell the font we're typesetting Hebrew, are you suggesting that it should tell us that the ex height = cap height = height of Hebrew glyphs, and no metric is available to represent the heights of Latin glyphs?

Yes and no? If you ask the font for Hebrew metrics, you get metrics relevant to Hebrew glyphs in the font. If you ask the same font for Latin metrics (assuming it has both encoded in the font), then you get metrics relevant to the Latin glyphs in the font.

> Or are you saying that we make up a definition for "ex" height and "cap" height that would be generic enough to apply to all writing systems, and each one needs to decide what specifically that means, and the font would let us look up the "ex" height and "cap" height of each script independently so that, for example, if we are in a multi-script document trying to find the height of a Hebrew glyph we can ask for the "ex" height of Hebrew, and when we are trying to find the height of a lowercase Latin glyph we can ask for the "ex" height of Latin, and when we are trying to find the height of a Hangul glyph we can ask for the "ex" height of Hangul?

Yes. Correct.

fantasai commented 5 years ago

@behdad So I think we'd need more than just 'ex' and 'cap'. And we need to be clear that these metrics aren't equivalent to the existing ex and cap heights. Maybe 'low', 'base', 'top', 'high'? (Or some better words.) To scan through a handful of scripts...

For Latin that'd be the bottom of the g, alphabetic baseline, ex-height, cap-height. For Hebrew it'd be the bottom of nun/khaf, baseline, top of mem, top of lamed. For Tamil the base and top would match pa, and the high and low would maybe match to the top/bottom of the circular sweeps. For Thai the base and top would follow น, and the high would be the top of โ... I'm unsure about what the low would be. For Arabic, the base would be what beh sits on, the top how far it rises, the low the bottom of jim, and the high the height of lam. It would be... a bit meaningless on particularly calligraphic fonts, but these should be reasonable in more print-like typefaces. For Bengali, the baseline would be the bottom of ka, the top its top, the low/high would map to the top/bottom of vowel combinations. For CJK we'd collapse low/base to match the ideographic face bottom, assign the ideographic face top to top/high.

Four metrics would get the main ink lines one cares to align objects to. Most scripts really only need two, since the bits floating above the base and the top aren't strong enough visually to be aligning things to. And maybe we could collapse them down to just base/top. Or make them optional for some scripts if that's possible? But Latin and similar systems definitely need the four.

I'm also a bit concerned that this creates way too many metrics. Like, if we need to look up the ex-height of Coptic, and it's not there, presumably we look for the Latin ex-height as a fallback. If we don't define something, there's going to be an ad-hoc fallback system where a bunch of scripts don't have their metrics defined because they rely on a particular fallback... it'd be good to either build in some unification, or define canonical fallbacks so there's some agreement on what to look at.
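
To make the four-line idea and the fallback concern concrete, here is a small Python sketch. Everything in it is hypothetical: the `ScriptBounds` record, the numbers, and the `FALLBACK` table are illustrations of what "canonical fallbacks" could look like, not an existing OpenType structure. Only the script tags (`latn`, `hebr`, `hani`, `copt`) are real OpenType tags.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ScriptBounds:
    """The four ink lines discussed above, in font units."""
    low: int
    base: int
    top: int
    high: int


# Hypothetical per-script values.
BOUNDS: Dict[str, ScriptBounds] = {
    "latn": ScriptBounds(low=-200, base=0, top=500, high=700),
    "hebr": ScriptBounds(low=-180, base=0, top=560, high=720),
    # CJK collapses low/base and top/high onto the ideographic face edges.
    "hani": ScriptBounds(low=-120, base=-120, top=880, high=880),
}

# Hypothetical canonical fallbacks: a script without its own entry defers here.
FALLBACK: Dict[str, str] = {"copt": "latn"}


def bounds_for(script: str) -> Optional[ScriptBounds]:
    """Follow the fallback chain until a script with defined bounds is found.
    Returns None if the chain ends (or loops) without finding one."""
    seen = set()
    while script not in seen:
        if script in BOUNDS:
            return BOUNDS[script]
        seen.add(script)
        nxt = FALLBACK.get(script)
        if nxt is None:
            return None
        script = nxt
    return None
```

Here `bounds_for("copt")` resolves to the Latin entry through the agreed table, and a script with neither an entry nor a fallback returns `None` rather than silently borrowing some other script's values.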

Also it's worth pointing out the distinction between baselines (used to align text to each other) and theoretical glyph bounds. CJK's ideo baseline and idtp don't have the same usages as the character face top/bottom. I'm mainly looking for the theoretical glyph bounds here, which for the alphabetic baseline is the same as a baseline, but this isn't always the case.

behdad commented 5 years ago

Can we start from enumerating what's needed? "ex" as a length unit doesn't need customization. You brought up drop-cap alignment. What other metrics does CSS currently reference?

fantasai commented 5 years ago

@behdad You're the one who suggested making equivalents of ex and cap for every script. :) I'm just trying to figure out how that would work.

Afaik the metrics we need here are:

  1. The top and bottom lines to match for drop caps.
  2. If there are other values needed for visual alignment or positioning, whatever metrics those might be. These would be used to control the visual distance between the text and other objects. This is a lower priority, since #1 probably handles most use cases.
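
To show why item 1 matters, here is a minimal Python sketch of how a drop initial could be sized once the font supplies a per-script top line. The formula follows my reading of the drop-initial sizing in the css-inline-3 draft linked at the top of this issue, with the script's top metric substituted for cap height; that substitution, the function name, and its parameters are assumptions for illustration.

```python
def drop_initial_font_size(lines: int, line_height: float,
                           body_top: float, initial_top_ratio: float) -> float:
    """Font size at which the initial letter spans `lines` lines.

    body_top:          height of the body text's top line above its
                       baseline, in px (e.g. top of mem for Hebrew)
    initial_top_ratio: the same metric in the initial letter's font,
                       per unit of font size
    """
    if lines < 1 or line_height <= 0 or initial_top_ratio <= 0:
        raise ValueError("lines must be >= 1; line_height and ratio must be positive")
    # Distance from the top line of line 1 down to the baseline of line N.
    span = (lines - 1) * line_height + body_top
    return span / initial_top_ratio
```

Without a usable top metric for the script, `body_top` and `initial_top_ratio` have to be guessed, typically from Latin cap height, which is the misalignment this issue is about.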

Other metrics CSS needs include:

  1. Anything else typically used for alignment of disparate typefaces or type sizes within a single line. I would expect these to be in the base tables already, so probably not part of this issue.
  2. Superscript and subscript offsets and size factors; I believe these already exist in OpenType, they're just unreliable in fonts.
  3. Underline position/thickness; these already exist also. Overline and strike-through position might be nice to add. But again, not part of this issue.

behdad commented 5 years ago

> Afaik the metrics we need here are:
>
>   1. The top and bottom lines to match for drop caps.
>   2. If there are other values needed for visual alignment or positioning, whatever metrics those might be. These would be used to control the visual distance between the text and other objects. This is a lower priority, since #1 probably handles most use cases.

Right. This is what I meant.

> Other metrics CSS needs include:
>
>   1. Anything else typically used for alignment of disparate typefaces or type sizes within a single line. I would expect these to be in the base tables already, so probably not part of this issue.

Correct.

>   3. Underline position/thickness; these already exist also. Overline and strike-through position might be nice to add. But again, not part of this issue.

Strike-through exists. Overline doesn't.