MicrosoftDocs / typography-issues

Creative Commons Attribution 4.0 International

Deprecate OMPL and use Unicode Bidi properties #602

Open behdad opened 4 years ago

behdad commented 4 years ago

Currently in OpenType:

https://docs.microsoft.com/en-us/typography/opentype/spec/ttochap1#left-to-right-and-right-to-left-text

there's a document called the OMPL:

https://docs.microsoft.com/en-us/typography/opentype/spec/ompl

which is a copy of BidiMirroring.txt from Unicode 5.1. OpenType only recognizes mirroring pairs from that list. I suggest we remove it and reword OpenType to recognize mirroring pairs from the latest version of Unicode supported by the implementation.

I can write down my arguments in favor if there is interest.
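For readers unfamiliar with the data file under discussion, here is a minimal sketch of reading mirroring pairs from text in the BidiMirroring.txt layout (`code point; mirrored code point # comment`). The function name `parse_bidi_mirroring` and the `SAMPLE` string are illustrative assumptions, not part of any spec or library; a real engine would load the full file for whichever Unicode version it supports.

```python
def parse_bidi_mirroring(text):
    """Return {code point: mirrored code point} from BidiMirroring.txt-style text."""
    pairs = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not line:
            continue  # blank line or comment-only line
        source, target = (field.strip() for field in line.split(";"))
        pairs[int(source, 16)] = int(target, 16)
    return pairs


# A handful of entries in the file's layout, used here only as sample input.
SAMPLE = """\
# sample entries
0028; 0029 # LEFT PARENTHESIS
0029; 0028 # RIGHT PARENTHESIS
003C; 003E # LESS-THAN SIGN
003E; 003C # GREATER-THAN SIGN
"""

pairs = parse_bidi_mirroring(SAMPLE)
print(chr(pairs[ord("(")]))
```

Under the proposal, the only thing that changes is which version of the file feeds a table like `pairs`: the frozen 5.1 copy (OMPL) or the version the implementation already supports.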

PeterCon commented 4 years ago

That was added in OT 1.6 based on consensus among several parties, including (IIRC) at least Adobe, Microsoft and Tiro Typeworks, on how to best ensure forward compatibility for fonts if additions were made to BidiMirroring.txt in future Unicode versions. That can certainly be revisited, but would need broad consensus.

Marking for future consideration.

tiroj commented 4 years ago

Background:

The purpose of OMPL was to draw a line between mirroring that would be applied at the character string level using BidiMirroring.txt, and mirroring that would be applied at the glyph run level using the rtlm or ltrm features. It was discussed with Unicode folk at the time, and the idea was that directional forms of glyphs would move to the font space, and that Unicode wouldn't add anything to BidiMirroring.txt that would involve mirroring at the character string level, since there would be the risk that this would then get reversed by glyph substitutions in rtlm and ltrm.

And then Unicode went and added new things to BidiMirroring.txt, so I guess we should have asked for a formal stability commitment, or given up on the whole idea of moving mirroring to the font space.

I'd like to hear your arguments, Behdad. This has implications not only for whether to retain OMPL, but also how the rtlm and ltrm features are defined.

behdad commented 4 years ago

That was added in OT 1.6 based on consensus among several parties, including (IIRC) at least Adobe, Microsoft and Tiro Typeworks,

I was in the phone meeting where this was decided, and I didn't agree with the decision. So it wasn't consensus.

on how to best ensure forward compatibility for fonts if additions were made to BidiMirroring.txt in future Unicode versions. That can certainly be revisited, but would need broad consensus.

The decision was made based on dubious claims.

Basically this is what that decision made of the situation: characters are now of three kinds: 1) non-mirroring, 2) mirroring, with an entry in BidiMirroring.txt as of some version of Unicode <= 5.1, 3) mirroring, added to BidiMirroring.txt as of some version of Unicode > 5.1.

The problem is that characters in category 3, which are rare and few and less tested, will NOT render correctly in RTL contexts if the font designer didn't add rtlm lookups for them. The problem is, such rtlm lookups are NOT needed for characters in category 2. So, rtlm is quite unknown to most font designers. As such, characters in category 3 are highly likely to be broken in new fonts.

Now the argument then was a lot of handwaving and still is today. Here's my counter-argument to all "forward compatibility" concerns: HarfBuzz has been doing what I'm suggesting, forever, with no issues.
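The three kinds described above can be sketched as a small classifier. This is only an illustration: `classify`, `FROZEN` and `CURRENT` are hypothetical names, and the RIGHT ANGLE pair is used as a stand-in for a post-5.1 addition that should be checked against the actual data files. Characters with no pair entry at all are treated as kind 1 here.

```python
def classify(cp, frozen_pairs, current_pairs):
    """Return 1, 2 or 3 for a code point, following the three kinds above."""
    if cp in frozen_pairs:
        return 2  # mirrored by the engine at character level under OMPL
    if cp in current_pairs:
        return 3  # mirrored only if the font supplies an rtlm lookup
    return 1      # not mirrored at all


# Stand-in tables; real ones would come from the OMPL and from BidiMirroring.txt.
FROZEN = {0x0028: 0x0029, 0x0029: 0x0028}
CURRENT = {**FROZEN, 0x221F: 0x2BFE, 0x2BFE: 0x221F}  # assumed later addition

for cp in (0x0041, 0x0028, 0x221F):
    print(f"U+{cp:04X} -> kind {classify(cp, FROZEN, CURRENT)}")
```

If the engine used `CURRENT` for character-level mirroring, kind 3 would disappear and fonts would not need rtlm for those characters.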

behdad commented 4 years ago

Background:

The purpose of OMPL was to draw a line between mirroring that would be applied at the character string level using BidiMirroring.txt, and mirroring that would be applied at the glyph run level using the rtlm or ltrm features.

One would think you know I know my bidi...

It was discussed with Unicode folk at the time,

At the time this happened, the people best known to be active on bidi issues at Unicode were Mark Davis, Roozbeh Pournader, and myself. So when you say "was discussed with Unicode folk at the time", it was probably not discussed with the right Unicode folks.

and the idea was that directional forms of glyphs would move to the font space, and that Unicode wouldn't add anything to BidiMirroring.txt that would involve mirroring at the character string level,

That's nonsense. Unicode never makes such commitments unless doing so formally.

since there would be the risk that this would then get reversed by glyph substitutions in rtlm and ltrm.

Nonsense. If character-level mirroring happens, then rtlm won't be applied to that character's glyph.

And then Unicode went and added new things to BidiMirroring.txt, so I guess we should have asked for a formal stability commitment, or given up on the whole idea of moving mirroring to the font space.

Or give up on getting your head into bidi and shaping engine design and stick with things you know.

I'd like to hear your arguments, Behdad. This has implications not only for whether to retain OMPL, but also how the rtlm and ltrm features are defined.

Not really. As I said, this is what HarfBuzz has been doing forever.

tiroj commented 4 years ago

If character-level mirroring happens, then rtlm won't be applied to that character's glyph.

Where is that defined and at what level is it implemented?

[Not trying to be difficult, just wanting to understand what the model is going forward if we ditch OMPL and its assumptions and actually standardise — like a written spec — how this is to be handled.]

PeterConstable commented 4 years ago

Reminder:

Be friendly and patient... Be respectful... Understand Disagreements... resolve differences constructively.

Microsoft Open Source Code of Conduct

behdad commented 4 years ago

If character-level mirroring happens, then rtlm won't be applied to that character's glyph.

Where is that defined and at what level is it implemented?

In the rtlm feature description:

"""Function: This feature applies mirrored forms appropriate for right-to-left text other than for those characters that would be covered by the character-level mirroring step performed by an OpenType layout engine. (For right-to-left glyph alternates, see 'rtla'.)"""

That's the difference between rtlm (automatic, only applied if char-level mirroring did not happen) and rtla (always on in RTL context).

tiroj commented 4 years ago

Thanks. It seems to me that the rtlm feature is ambiguous in this respect, because you are reading it as a kind of filter on which characters to apply the feature to, and I've always read it in light of that meeting and the existence of OMPL, i.e. that rtlm should be used for mirroring characters not in OMPL, the characters in OMPL being the ones that would be covered by character-level mirroring.

I think it would be a good idea for the rtlm and ltrm features to be more carefully and fully defined in how they relate to BidiMirroring.txt in the same way as we redefined the init, medi, etc. features in terms of ArabicShaping.txt.

What you are doing in HarfBuzz does sound like a good approach, as well as reflecting a legitimate reading of the feature description as presently written. But 'what HarfBuzz has been doing forever' is not in itself standardisation, and the feature description should be less ambiguous and less open to other readings.

behdad commented 4 years ago

Thanks. It seems to me that the rtlm feature is ambiguous in this respect, because you are reading it as a kind of filter on which characters to apply the feature to, and I've always read it in light of that meeting and the existence of OMPL, i.e. that rtlm should be used for mirroring characters not in OMPL, the characters in OMPL being the ones that would be covered by character-level mirroring.

Not sure it can be read the way you are reading it. In all shaping engines, character-level mirroring precedes glyph-level lookups. It is clear that rtlm is designed to mirror glyphs for characters that need mirroring and are not mirrored at character level. I don't see how else it can be interpreted.

Now what we are discussing is whether the set of characters that get character-level mirroring should be frozen in time, or move forward with Unicode. That's the only thing this issue is about.

I think it would be a good idea for the rtlm and ltrm features to be more carefully and fully defined in how they relate to BidiMirroring.txt in the same way as we redefined the init, medi, etc. features in terms of ArabicShaping.txt.

They are carefully defined. It's just that the BidiMirroring.txt was frozen in time. I'm proposing that we unfreeze it and roll it forward like we do with ALL other Unicode data files.

Actually I'm glad you mentioned init etc. Imagine if we froze the ArabicShaping.txt used for init, medi, etc. to the one from Unicode 5.1 and said that for any newer characters, the font should do contextual analysis instead of relying on the shaping engine for that... That's the analogous situation here.

What you are doing in HarfBuzz does sound like a good approach, as well as reflecting a legitimate reading of the feature description as presently written. But 'what HarfBuzz has been doing forever' is not in itself standardisation,

I wasn't implying that it is. I was saying that I've been doing this for 10+ years with no issues.

and the feature description should be less ambiguous and less open to other readings.

tiroj commented 4 years ago

I agree with the idea of unfreezing the version of BidiMirroring.txt, but if this doesn't involve revision of the rtlm and ltrm feature descriptions themselves, then it definitely does require revision of the Advanced Typographic Extensions spec referenced in those feature descriptions. I support the idea of changing that text—ironically, one of the few parts of OTL that actually has an implementation spec with an algorithm description—to reflect what HarfBuzz does, i.e. apply Unicode mirroring as defined in BidiMirroring.txt, then apply rtlm/ltrm features only to glyphs that have not already been mirrored in the previous step, then apply rtla/ltra features to the whole run. Is that right?

behdad commented 4 years ago

I agree with the idea of unfreezing the version of BidiMirroring.txt, but if this doesn't involve revision of the rtlm and ltrm feature descriptions themselves, then it definitely does require revision of the Advanced Typographic Extensions spec referenced in those feature descriptions.

Right. The only update needed is to replace OMPL with the BidiMirroring.txt of the latest Unicode version supported by the engine.

i.e. apply Unicode mirroring as defined in BidiMirroring.txt, then apply rtlm/ltrm features only to glyphs that have not already been mirrored in the previous step, then apply rtla/ltra features to the whole run. Is that right?

ltrm is also applied to everything, since if the run is LTR there's no character-level mirroring, so everything gets ltrm enabled. The only other thing I'd point out about your statement is that there's no "then apply...". The features are applied together, so the lookup order allows the font designer to choose the order in which things are changed.

But yes. That's what I'm proposing. Thanks for bearing with me John. I apologize for my impatience.

tiroj commented 4 years ago

The only other thing I point out with your statement is that there's no "then apply...".

You mean there's no 'then apply' in terms of the OTL features, right? The Unicode mirroring still has to be first and then apply the OTL feature lookups, because the former filters which glyphs get processed by the latter. Just wanting to make sure I've got this totally clear.

Thanks for bearing with me Behdad. I apologise for my deliberate slowness. :)

behdad commented 4 years ago

You mean there's no 'then apply' in terms of the OTL features, right? The Unicode mirroring still has to be first and then apply the OTL feature lookups, because the former filters which glyphs get processed by the latter. Just wanting to make sure I've got this totally clear.

Correct. All I meant was that rtlm/ltrm and rtla/ltra are applied in tandem, NOT one after the other.
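The model agreed on in the last few comments can be sketched as follows. This is a simplified illustration, not HarfBuzz code: `plan_run` and `mirror_pairs` are hypothetical names, the run is treated as a plain string, and the features are returned as a set per character precisely because they are enabled together rather than one after the other (the font's lookup order, not modeled here, decides the sequence of substitutions).

```python
def plan_run(text, direction, mirror_pairs):
    """Return a list of (output character, enabled features) for one run.

    mirror_pairs maps code points to their character-level mirrors, e.g. a
    table built from BidiMirroring.txt of the supported Unicode version.
    """
    plan = []
    for ch in text:
        if direction == "ltr":
            # No character-level mirroring in LTR runs, so ltrm is on everywhere.
            plan.append((ch, {"ltrm", "ltra"}))
            continue
        mirrored = mirror_pairs.get(ord(ch))
        if mirrored is not None:
            # Mirrored at character level: rtlm is withheld, rtla stays on.
            plan.append((chr(mirrored), {"rtla"}))
        else:
            plan.append((ch, {"rtlm", "rtla"}))
    return plan


pairs = {0x28: 0x29, 0x29: 0x28}
for item in plan_run("(a", "rtl", pairs):
    print(item)
```

The character-level step acts only as a filter on which glyphs get rtlm; it does not impose any ordering between rtlm and rtla lookups.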

khaledhosny commented 4 years ago

That's the difference between rtlm (automatic, only applied if char-level mirroring did not happen) vs rtla always on in RTL context.

I have to say I finally understood why we need both rtlm and rtla!

I just created a test font with an rtlm feature that mirrors parentheses, and both HarfBuzz and Uniscribe do not double mirror them (correct) while CoreText double mirrors (broken). So I agree that the spec needs to be more explicit here.
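The double-mirroring failure described above can be reproduced in a toy model. Everything here is an assumption for illustration: `CHAR_PAIRS` stands in for the engine's character-level table, `FONT_RTLM` for the test font's rtlm substitutions, and `shape_rtl` ignores reordering and works on characters rather than glyph IDs. It does not claim to reflect how any particular engine is implemented.

```python
CHAR_PAIRS = {"(": ")", ")": "("}  # engine's character-level mirroring table
FONT_RTLM = {"(": ")", ")": "("}   # hypothetical test font: rtlm swaps parentheses


def shape_rtl(text, gate_rtlm):
    """Apply character-level mirroring, then rtlm, to an RTL run.

    gate_rtlm=True withholds rtlm from characters the engine already mirrored.
    """
    out = []
    for ch in text:
        mirrored_by_engine = ch in CHAR_PAIRS
        if mirrored_by_engine:
            ch = CHAR_PAIRS[ch]
        if not (gate_rtlm and mirrored_by_engine):
            ch = FONT_RTLM.get(ch, ch)  # rtlm substitution, if the font has one
        out.append(ch)
    return "".join(out)


print(shape_rtl("(a)", gate_rtlm=True))   # )a(  mirrored once
print(shape_rtl("(a)", gate_rtlm=False))  # (a)  mirrored twice, back to the input
```

With the gate in place a font can safely carry rtlm lookups for characters the engine already handles; without it, the two steps cancel out.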

behdad commented 4 years ago

while CoreText double mirrors (broken)

Ouch. Can you report it through Apple's system or email @nedley?

behdad commented 4 years ago

@khaledhosny while testing, can you please test whether Uniscribe applies rtlm at all? I remember for many years they didn't...