Closed madig closed 8 months ago
I currently lack any knowledge on this topic. I would appreciate it if others (perhaps @simoncozens, @davelab6, @tphinney, or anyone else who may know about this stuff) could give some feedback here on @madig's question.
Checking that it is “correct” is a big ask, but we could at least get close!
I was going to say something similar to Thomas. For point two, the "(list of glyphs)" might just be all mark glyphs; that's the simplest. And "(list of writing systems)" is all Indic scripts, all USE scripts, Khmer, Myanmar, and Hangul (because tone marks).
I wouldn't bother with point three. If you have all the anchors, you're probably going to put them in the right places, and if the end-user is seeing a dotted circle everything's gone wrong anyway.
"all Indic scripts, all USE scripts, Khmer, Myanmar, and Hangul (because tone marks)."
We'll need a fontbakery @condition that detects these.
if the font supports one or more of (list of writing systems), check for anchors present in (list of glyphs). All those anchors should ALSO be present in dotted circle, FAIL if not
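A minimal sketch of that rule on plain Python data, not the actual fontbakery check. The script-tag set is only an illustrative subset (the real list would cover all Indic and USE scripts), and the function name and the anchor dictionaries are hypothetical inputs that a real check would have to extract from the font.

```python
# Illustrative subset of OpenType script tags; NOT the full list discussed above.
SCRIPTS_NEEDING_DOTTED_CIRCLE = {"deva", "dev2", "khmr", "mymr", "mym2", "hang"}


def check_dotted_circle_anchors(script_tags, mark_anchors, dotted_circle_anchors):
    """Return (status, missing_anchors).

    script_tags:           set of script tags the font supports
    mark_anchors:          dict of mark glyph name -> set of anchor names it attaches with
    dotted_circle_anchors: set of anchor names available on the dotted circle
    (All three are hypothetical inputs for this sketch.)
    """
    # The rule only applies to fonts supporting one of the listed writing systems.
    if not (set(script_tags) & SCRIPTS_NEEDING_DOTTED_CIRCLE):
        return "SKIP", set()
    # Every anchor used by any mark must also exist on the dotted circle.
    needed = set().union(*mark_anchors.values())
    missing = needed - set(dotted_circle_anchors)
    return ("FAIL" if missing else "PASS"), missing
```

For example, a Khmer font whose marks use `top` and `bottomright` but whose dotted circle only has `top` would report `bottomright` as missing.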
I can start prototyping a check for this that will run only on fonts that have dottedCircle.
Later we can improve it to check for the other things.
I'll give it the straightforward check-id com.google.fonts/check/dotted_circle.
And I'll place it on the universal profile.
do we have sample font files for this?
This will be very helpful; many Noto issues are about missing U+25CC. FWIW, ufo2ft has lists of INDIC_SCRIPTS and USE_SCRIPTS.
> And "(list of writing systems)" is all Indic scripts, all USE scripts, Khmer, Myanmar, and Hangul (because tone marks).
Since U+25CC is often used with marks for descriptive/pedagogical purposes (e.g. in The Unicode Standard itself), shouldn't "(list of writing systems)" be all scripts that encode combining marks?
And another one: https://github.com/googlefonts/noto-fonts/issues/2248
> Since U+25CC is often used with marks for descriptive/pedagogical purposes (e.g. in The Unicode Standard itself), shouldn't "(list of writing systems)" be all scripts that encode combining marks?

Or, perhaps more practical: skip the whole notion of testing for specific scripts / writing systems and, instead, require 25CC to be present and have appropriate anchors for any font that includes combining marks.
So, for example, a pure Latin-1 font wouldn't require anchors on 25CC, but a Latin font that includes marks (e.g., things from 0300 block) would.
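A rough sketch of how "includes combining marks" could be decided from a font's encoded codepoints, using only the standard library. It assumes "combining mark" means Unicode general category Mn, Mc, or Me; the function names are hypothetical, and the input set stands in for a font's cmap.

```python
import unicodedata

DOTTED_CIRCLE = 0x25CC


def combining_marks(codepoints):
    """Return the subset of codepoints whose general category is a mark (Mn, Mc, Me)."""
    return {cp for cp in codepoints
            if unicodedata.category(chr(cp)) in ("Mn", "Mc", "Me")}


def needs_dotted_circle(codepoints):
    """Under the proposal above: any font encoding combining marks needs U+25CC."""
    return bool(combining_marks(codepoints))


def dotted_circle_missing(codepoints):
    """True if the font encodes marks but does not encode U+25CC."""
    return needs_dotted_circle(codepoints) and DOTTED_CIRCLE not in codepoints
```

With this definition a pure Latin-1 codepoint set yields no marks, while adding U+0301 from the 0300 block makes the dotted circle required.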
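There's some justification for this. And it makes things easier. But I'm not convinced that the 25CC glyph itself should be required for all scripts. Could we agree on:
- 25CC must be present for scripts where the shaper is going to try inserting one.
- If it is present, all marks should be able to attach to it.

?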
> But I'm not convinced that the 25CC glyph itself should be required for all scripts. Could we agree on:
> - 25CC must be present for scripts where the shaper is going to try inserting one.
> - If it is present, all marks should be able to attach to it.
Question: Are there scripts where shapers might insert 25CC in contexts unrelated to combining marks?
At any rate, we (SIL) currently require 25CC in any of our fonts that include combining marks but at this point we don't have a way to test whether appropriate anchors are present, so your test requirements as stated will be helpful to us.
> Question: Are there scripts where shapers might insert 25CC in contexts unrelated to combining marks?
Nope. I went through the Harfbuzz source when compiling the list above.
> so your test requirements as stated will be helpful to us.
Excellent. And I think it would work for Noto too.
@felipesanches Do you want to / are you working on this? I'm happy to implement it.
HarfBuzz will insert U+25CC for any combining mark at the beginning of text (provided that HB_BUFFER_FLAG_BOT is set). So I’d be inclined to simply make it required everywhere. It also provides a nice way to show standalone combining marks, which only works if the marks and the circle come from the same font.
> @felipesanches Do you want to / are you working on this? I'm happy to implement it.
feel free to do it ;-)
The basic initial check was implemented by @simoncozens and has now been reviewed and merged by me.
Please open a followup issue for any of the additional behavior that may be proposed for this check.
Had an interesting discussion about this today that included @RosaWagner @vv-monsalve @felipesanches @m4rc1e ... I believe the agreement was as follows:
1) For fonts that meet the criteria of needing the dotted circle because they do one or more of the following:
Then Fontbakery should FAIL if EITHER (a) dotted circle is not present, OR (b) if it is present but the needed combining diacritics for those particular scripts do not attach to the dotted circle
2) If a font does not require dotted circle because of (1) requirements, but has one anyway, then Fontbakery should check that all combining diacritics attach to the dotted circle. (This will catch situations like IPA fonts that have a dotted circle.) If they do not, then WARN
3) Perhaps the IPA character set should include a requirement for dotted circle; but this question is independent of the above.
4) Perhaps Fontmake could have an option to automatically add a dotted circle for fonts that meet (1) above, but do not have one.
OK. For reference, the current check does:
If there is no dotted circle, FAIL if the font is a complex shaper font else WARN. If there is a dotted circle, FAIL if there are unattached marks.
You want:
If the font is a complex shaper font, FAIL if there is no dotted circle or if there are unattached marks. If not, WARN if there are unattached marks.
So all the code is there and it is just a matter of shuffling the if conditions around. :-)
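To make the difference concrete, here is a small sketch of both decision tables as plain functions. The three boolean inputs and the function names are hypothetical, and returning "PASS" in the remaining cases is an assumption; it also assumes a non-complex font without a dotted circle has nothing to attach marks to and therefore passes under the proposed behaviour.

```python
def current_behaviour(is_complex, has_dotted_circle, has_unattached_marks):
    """The check as described under 'the current check does'."""
    if not has_dotted_circle:
        return "FAIL" if is_complex else "WARN"
    return "FAIL" if has_unattached_marks else "PASS"


def proposed_behaviour(is_complex, has_dotted_circle, has_unattached_marks):
    """The check as described under 'You want'."""
    if is_complex:
        # Complex shaper fonts must have the glyph AND attach every mark to it.
        if not has_dotted_circle or has_unattached_marks:
            return "FAIL"
        return "PASS"
    # Other fonts: only complain (softly) about a dotted circle that exists
    # but cannot take all marks.
    if has_dotted_circle and has_unattached_marks:
        return "WARN"
    return "PASS"
```

The two tables differ only for non-complex fonts: a missing dotted circle goes from WARN to PASS, and unattached marks go from FAIL to WARN.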
> Perhaps Fontmake could have an option to automatically add a dotted circle for fonts that meet (1) above, but do not have one.
Yeah done. :-) https://github.com/googlefonts/ufo2ft/pull/593
I'm a bit confused why Hangeul was included with this check as a FAIL as modern Hangeul use does not include any diacritics nor complex shaping.
Hangul technically does use complex shaping, ~but you're right that it doesn't need a dotted circle~. And the Hangul complex shaper does insert dotted circles.
Old Hangeul uses complex shaping and has diacritic marks that require dotted circles. Contemporary Hangeul does not.
In the complex shaper file you sent, it specifically mentions ljmo, vjmo, and tjmo, which are not included in modern Hangeul fonts.
I think the check would be more precise if it made sure that it is looking at an Old Hangeul font versus a modern one.
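One possible heuristic for that distinction, sketched on plain sets. It assumes an Old Hangeul font can be recognised either by the presence of any of the ljmo/vjmo/tjmo features or by encoding the Hangul tone marks U+302E and U+302F; both signals and the function name are assumptions of this sketch, not an agreed rule.

```python
JAMO_FEATURES = {"ljmo", "vjmo", "tjmo"}
HANGUL_TONE_MARKS = {0x302E, 0x302F}  # HANGUL SINGLE / DOUBLE DOT TONE MARK


def looks_like_old_hangul(feature_tags, codepoints):
    """Heuristic: True if the font has jamo shaping features or Hangul tone marks.

    feature_tags: set of OpenType feature tags found in the font
    codepoints:   set of codepoints the font encodes
    (Both are hypothetical inputs a real check would read from the font.)
    """
    has_jamo_features = bool(JAMO_FEATURES & set(feature_tags))
    has_tone_marks = bool(HANGUL_TONE_MARKS & set(codepoints))
    return has_jamo_features or has_tone_marks
```

A font with only precomposed syllables and no jamo features would then be treated as modern Hangeul and could be exempted from the FAIL.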
Random thoughts
A secret project recently made me think of the dottedCircle (U+25CC) glyph. A quick search of the OpenType specification turns up some potentially clarifying results, saying that Windows' USE uses it for displaying defective clusters: https://docs.microsoft.com/en-us/typography/script-development/use#defective-clusters. I was wondering if anyone has thought of a checklist to make sure this glyph works correctly for all the scripts that seem to make use of it, like Lao and Khmer? Would this be something one can automate in fontbakery? One particular issue is that in a UFO/fontmake-based workflow, the anchors in your dottedCircle have to match what you need for all the scripts you support, i.e. if you're missing a bottomright anchor, the circle will work only for marks that attach to the other available anchors. Not sure if this needs a per-codepoint list of anchors and where they should attach?
Also, anyone know of any other uses?