Closed davelab6 closed 1 year ago
All it requires is a simple tool which adds a "3.10" cmap subtable which maps glyph IDs to the Supplementary PUA-A (sequentially, by adding 0xF0000 to the glyph ID). Because the codepoints will be in the same order as the glyph IDs, you can use the space-saving cmap format 12, which only defines the start and end of each contiguous mapping range. So the added size overhead is small.
@behdad Is there any way to define that glyph is unencoded using fontTools?
@hash3g you can find the glyph IDs in the GlyphOrder table, e.g. https://github.com/hash3g/yesevaone/blob/master/YesevaOne-Regular.ttf.GlyphOrder.ttx and you can find glyphs that are encoded in the cmap table. I guess you need to make a set of the glyph names and a set of the encoded glyph names and compare them to get the set of unencoded glyphs.
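The set comparison described above could be sketched roughly like this. It is only an illustration on plain Python data: `unencoded_glyphs` is a hypothetical helper and the glyph names are made up. With fontTools, the inputs would presumably come from `font.getGlyphOrder()` and `font["cmap"].getBestCmap()`.

```python
def unencoded_glyphs(glyph_order, cmap):
    """Return the glyph names that no codepoint maps to, in glyph order."""
    encoded = set(cmap.values())
    return [name for name in glyph_order if name not in encoded]


# Made-up sample data. With fontTools these would come from something like:
#   font = fontTools.ttLib.TTFont(path)
#   glyph_order = font.getGlyphOrder()
#   cmap = font["cmap"].getBestCmap()   # {codepoint: glyph name}
glyph_order = [".notdef", "A", "a", "a.sc", "f_i"]
cmap = {0x41: "A", 0x61: "a"}

print(unencoded_glyphs(glyph_order, cmap))
```

Keeping the result as a list in glyph order (rather than a bare set) makes it easier to relate the names back to glyph IDs.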
I would advocate that it would be much simpler (and more storage-effective) if you just encode ALL glyphs as U+F0000 + GID.
This has the advantage that cmap subtable format 12 uses an efficient storage for continuous code-to-GID ranges. With my method, you'll only create one such range, so it'll only add a few bytes to the size, and will be very fast.
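To make the storage argument concrete, here is a small sketch of how a codepoint-to-GID mapping collapses into format 12 sequential groups. `format12_groups` is a hypothetical helper written for this illustration, not a fontTools function, and the glyph count is an assumption. The byte sizes follow the format 12 layout (a 16-byte subtable header plus 12 bytes per group).

```python
def format12_groups(mapping):
    """Collapse {codepoint: glyph ID} into (startChar, endChar, startGID) runs."""
    groups = []
    for code, gid in sorted(mapping.items()):
        if groups:
            start, end, start_gid = groups[-1]
            # Extend the current run if both codepoint and GID are consecutive
            if code == end + 1 and gid == start_gid + (end - start) + 1:
                groups[-1] = (start, code, start_gid)
                continue
        groups.append((code, code, gid))
    return groups


HEADER_BYTES = 16  # format, reserved, length, language, numGroups
GROUP_BYTES = 12   # startCharCode, endCharCode, startGlyphID

num_glyphs = 1000  # assumed glyph count, for illustration only
spua = {0xF0000 + gid: gid for gid in range(num_glyphs)}
groups = format12_groups(spua)

print(len(groups))                               # number of groups needed
print(HEADER_BYTES + GROUP_BYTES * len(groups))  # subtable size in bytes
```

Because U+F0000 + GID is strictly sequential, the whole SPUA mapping fits in a single group no matter how many glyphs the font has. A subtable that also carries the regular Unicode mappings would need additional groups for those.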
This approach has an additional benefit: as a user of such a font, I am not forced to use the F0000+ codes to address the glyphs that are properly encoded (i.e. via Unicode).
I can still use the proper Unicodes. But if I do so, the browser/app will always perform the Unicode processing and default OpenType Layout shaping for complex scripts. So I won't really have the guarantee that the glyph I'm seeing is actually the glyph assigned to the Unicode codepoint in the font's cmap. It will be for most Unicodes but for some codepoints, the "Unicode+OTL magic" will kick in.
But if I address even the "properly" encoded glyphs using the U+F0000+ codepoints, I will have a WYSIWYG guarantee. Even more: with harfbuzz.js, I can run a JS port of HarfBuzz in the browser, take the output GIDs, add 0xF0000 to them and have my own explicit custom OTL processing if I need to. So I'm completely in control and independent of any "browser magic".
Here is my code that does exactly what I described above.
#! /usr/bin/python
# -*- coding: utf-8 -*-
#
# pyftaddspuaabygids.py
# Map all glyphs to the Supplementary PUA-A plane (U+F0000..U+FFFFF)
# by 0xF0000 + glyphID
#
# Copyright (c) 2014 by Adam Twardoch
#
# Licensed to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import copy
import sys

import fontTools.ttLib


def addSPUAByGlyphIDsMappingToCMAP(ttx):
    cmap = ttx["cmap"]
    # Check if a UCS-2 cmap exists
    ucs2cmap = None
    for platformID, platEncID in ((3, 1), (0, 3), (3, 0)):
        ucs2cmap = cmap.getcmap(platformID, platEncID)
        if ucs2cmap:
            break
    # Create a UCS-4 cmap and copy the contents of the UCS-2 cmap,
    # unless a UCS-4 cmap already exists
    ucs4cmap = cmap.getcmap(3, 10)
    if not ucs4cmap:
        cmapModule = fontTools.ttLib.getTableModule('cmap')
        ucs4cmap = cmapModule.cmap_format_12(12)
        ucs4cmap.platformID = 3
        ucs4cmap.platEncID = 10
        ucs4cmap.language = 0
        # Start from an empty mapping if the font has no UCS-2 cmap
        ucs4cmap.cmap = copy.deepcopy(ucs2cmap.cmap) if ucs2cmap else {}
        cmap.tables.append(ucs4cmap)
    # Map all glyphs to UCS-4 cmap Supplementary PUA-A codepoints
    # by 0xF0000 + glyphID
    ucs4cmap = cmap.getcmap(3, 10)
    for glyphID, glyphName in enumerate(ttx.getGlyphOrder()):
        ucs4cmap.cmap[0xF0000 + glyphID] = glyphName


def usage():
    print("Map all glyphs to the Supplementary PUA-A plane (U+F0000..U+FFFFF) by 0xF0000 + glyphID")
    print("python %s inputfile[.otf|.ttf] outputfile[.otf|.ttf]" % sys.argv[0])


if len(sys.argv) == 3:
    inpath = sys.argv[1]
    outpath = sys.argv[2]
    ttx = fontTools.ttLib.TTFont(inpath, 0)
    addSPUAByGlyphIDsMappingToCMAP(ttx)
    ttx.save(outpath)
    ttx.close()
else:
    usage()
I categorically reject this and think it's a bad idea. Nowhere in this report do I see any reasoning for why this is needed or is a good idea.
Ah, yes. We talked with Dave about this. Sorry it didn't become clear.
The idea is not to do this for production-ready fonts but for the purpose of development, to be used within the context of document-driven type design and similar such applications.
In a way, think of it as the "debug" mode of building fonts. Such debug mode might include other options that generate some redundant data (such as, well, glyph names! :) ) which is useful while designing but when building fonts in "release" mode, this stuff should not be included.
Ok, sure. Yeah, that would be useful.
@behdad could you explain more about why you think this is a bad idea..? You think that if all Google Fonts have this feature, that it will increase the use of PUA characters and documents tightly bound to particular fonts in general usage?
@davelab6 for the same reasons that non-Unicode encodings are bad. This is even worse, this is full custom encoding, meaning any text encoded in those is illegible to any text processing use.
@davelab6 please check that test and fix is applied
https://github.com/googlefonts/fontbakery-cli/commit/5acd915d47e9385ef529be646906790411bd731d
@behdad I am skeptical that this would find any general usage.
It is a secondary method that is not for text processing, but for debugging: it is supplementing, not replacing, the Unicode encodings and OTL tables.
Part of Document Driven Type Design is having good examples to refer to; specifically for the re-implementation of http://fuelproject.org/utrrs/index (which is the result of a 24-hour overnight sprint, but the concept is valid and needed.)
Since we don't have OTL processing in <canvas>, I figure this secondary encoding would be the best way to get that done. And the good examples will be in the production Fonts API.
@hash3g for now, can you make this optional in the same way as fontcrunch is optional, via bakery.yml and set up page?
Per TypeThursday's Laura Worthington article we should consider this, perhaps only for display fonts, if it's become important for casual users of desktop fonts.
What's this article you are referring to?
Oh it's not out yet. Stay tuned.
ok. I hope you don't want to revive the idea of using PUA codes in released fonts...
Oh it's not out yet. Stay tuned.
lol. Ping us when it is. That said, people have had bad ideas forever; doesn't mean we should support them. I'm more willing to implement a HarfBuzz tool to render arbitrary glyphs than to add a hack in fonttools.
The issue is that a lot of text environments that users are using do not support OTL; for text typefaces this isn't really an issue, but for display types then users really do want that particular glyph from that particular font, and for it to work everywhere. When composing a 3 word text, the scenarios where PUA text doesn't make sense are less important.
@anthrotype @behdad the article is at https://medium.com/type-thursday/casual-users-and-the-font-market-a1c5c2f19149
I used to be anti-PUA for a long time but now I'm more inclined to support it, because the software makers have failed to support OpenType Layout for the last 15 years.
However, I'm more inclined to use the Supplementary PUA plane because it creates another "soft" hurdle.
I'm not much helped by PUA-less fonts if OpenType gets supported widely 20 years after I'm dead :)
BTW, instead of PUA, I'd actually be happier to produce "feature-frozen TTCs" that at least expose the most important 1:1 GSUB font features (small caps, stylistic sets, oldstyle numerals) via the cmap as supplementary font menu items (Source Han does this). It's a cleaner solution than PUA, with little overhead, and it seems that most modern OSes and apps do support TTCs. CFF-based TTCs have much less backwards compatibility than TT-based TTCs, though.
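The idea of exposing a 1:1 GSUB feature through the cmap could be sketched as below. This is an illustration on plain dictionaries only, not how pyfeatfreeze or any shipping tool is implemented; `freeze_feature` and the glyph names are hypothetical.

```python
def freeze_feature(cmap, substitutions):
    """Return a new cmap in which each glyph is replaced by its 1:1 substitute.

    cmap:          {codepoint: glyph name}
    substitutions: {glyph name: substitute glyph name}, e.g. from a single
                   substitution lookup of a feature such as small caps
    """
    return {code: substitutions.get(glyph, glyph) for code, glyph in cmap.items()}


# Made-up sample data
cmap = {0x61: "a", 0x62: "b", 0x31: "one"}
smcp = {"a": "a.sc", "b": "b.sc"}

frozen = freeze_feature(cmap, smcp)
print(frozen)
```

Each frozen cmap would then belong to its own font in the collection, so the user picks the small-caps variant from the font menu while still typing ordinary Unicode text.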
The problem with TTCs though is that hardly any commercial font distributor is equipped to carry/sell them.
Of course I understand why Behdad categorically (LOL) rejects this. It essentially undermines the last ten years of his work. :) With HarfBuzz, Behdad actually has made OpenType Layout much more attainable for "normal" software developers, and has turned HB from a "hobby clone" project into one that is on par with the reference implementation or, at times, surpasses it.
The actual problem is actually the primitive UX behind font selection and glyph input. So perhaps PUA is really just a silly little fix that offers comparatively little benefit, while the downsides with overusing it are potentially large.
I can see the case for pyfeatfreeze based TTFs... but will TTCs work for users of https://en.wikipedia.org/wiki/Creative_Writer or https://en.wikipedia.org/wiki/3D_Movie_Maker or whatever it is kids use these days? And what about OTFs?
Microsoft has been shipping Cambria only as TTC since 2007, and numerous Chinese fonts well before that. All their APIs support TT-based TTCs well. Only a small fraction of very obscure apps that do their own font handling might have a problem. Cambria is a major font, default in many documents. I've never heard of any problems related to this, not in printing or in any noteworthy apps. Windows APIs handle this completely transparently. CFF-based TTC support is newer on Windows, not sure how new.
OS X handles TTCs (both TT and CFF) since at least 10.7, maybe older. The OS X upgrade rate is much faster given their upgrades were always cheap or, recently, free. Most Apple system fonts are TTC, some CFF-based, so Apple must be confident their APIs handle it well.
By CFF-based TTC, I mean OTF-based. They have the same extension, .ttc.
I agree, although the upgrade rate is slowed because of hardware requirements; there are some minority of users stuck on 10.6 old hardware (and I believe some are stuck out of choice thanks to allegiance to old 'trusty' FL5 versions... :)
I wonder what Laura thinks.
@kenlunde might know what repercussions of shipping CFF-based TTCs are, given Adobe's decision to ship Source Han as CFF-based TTCs but also in several split forms.
On 8 February 2016 at 15:00, Adam Twardoch notifications@github.com wrote:
However, I'm more inclined to use the Supplementary PUA plane because it creates another "soft" hurdle.
What does this look like for the newbie customer described in the interview?
Ps. If Lato had small caps or another big 1:1 feature, I would have shipped it as featfrozen TTCs from day 1. But the number of OT glyphs in Lato is tiny so I didn't bother. If we ever add small caps, I'll make the official release as TTCs.
@twardoch: The only downside to shipping OTCs is that Windows doesn't support them. Adobe apps, CS6 and higher, installed on Windows support OTCs, but the font resource needs to be installed into the app's private font folder. OS X started to support OTCs from Version 10.8, if memory serves. It's a superb delivery format for Source Han Sans.
Laura is a somewhat special case. She makes calligraphic fonts where only 30% or less of the glyphs are encoded. People pay for her fonts, then cannot use them in some contexts.
The only opensource font of that kind that comes to my mind is Pecita, where users might want to insert glyphs individually. SPUA should not make it difficult to achieve the double-click-on-glyph-palette glyph insertion. But it'd be easier to track the SPUA character codes in documents in case someone wants to cleanse them later.
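Tracking or cleansing SPUA codes in a document later on could look something like the following sketch. The helper names `find_spua` and `strip_spua` are hypothetical, and the sample string is made up.

```python
import re

# Supplementary Private Use Area-A: U+F0000..U+FFFFD
SPUA_A = re.compile("[\U000F0000-\U000FFFFD]")


def find_spua(text):
    """Return (index, codepoint) for every SPUA-A character in text."""
    return [(m.start(), ord(m.group())) for m in SPUA_A.finditer(text)]


def strip_spua(text, replacement=""):
    """Remove (or replace) every SPUA-A character in text."""
    return SPUA_A.sub(replacement, text)


sample = "Hello \U000F0012world"  # made-up text with one SPUA-A character
print(find_spua(sample))
print(strip_spua(sample))
```

Since the whole plane-15 range is reserved for private use, a match here can only be a font-specific code, never regular text.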
The featfrozen TTC method, on the other hand, would give millions of LibreOffice/OpenOffice users access to full-blown small caps (also Microsoft Office!), and to other major features (oldstyle figures, significant stylistic sets).
I personally consider featfrozen TTCs not a hack, because they don't "break" anything. The user just changes the font in the menu, like going to an italic.
The reason I contributed my SPUA code was mostly to make special font versions for use in webfont specimens, or glyphmaps, where you really want to "look", display non-printing glyphs, or override script-specific shaping. That's what I do myself and how I use it. I never shipped an end-user font like this.
Suggestion: the OFL version of the Noto fonts should not use SPUA, but the Apache 2 versions might, for all glyphs ;)
Ah, so Windows still does not support CFF-based TTCs. Bugger :( (Thanks, Ken!)
Since this is about naive users of crappy Windows apps (you know, the ones they use to make T shirt designs and Blurb self-publishing books and so on) then the lack of Windows support makes this a non-starter. However, perhaps OTC isn't so important.
Of course I understand why Behdad categorically (LOL) rejects this. It essentially undermines the last ten years of his work. :)
That's not the reason. It's "this is font-dependent text encoding" and hence BAD. But I see your point.
it's become important for casual users of desktop fonts.
How about making existing apps (e.g. LibreOffice) support OT features?
This isn't about libre software, it is about proprietary software that will never improve.
Way to give up Dave. Make it better than proprietary as Stallman said and ppl will use it.
On 10 February 2016 at 04:09, Adrien Tétar notifications@github.com wrote:
Way to give up Dave. Make it better than proprietary as Stallman said and ppl will use it
Now who is wearing the GNU T shirt? :)
It's "this is font-dependent text encoding" and hence BAD
This reminds me of XHTML vs HTML5. I understand why it is offensive; it is a definitive example of a dirty hack.
But an insistence on technical purity keeps users from getting their jobs done and getting on with their lives.
And, this is only and merely a fallback for when OT isn't available.
Now who is wearing the GNU T shirt? :)
I'm talking to you with your own words. Look, existing font editors didn't work for me so I made one.
You seem to still be missing my point: New software doesn't matter. This is about entrenched existing proprietary software, which can not be patched, or switched away from.
Make it better than proprietary as Stallman said and ppl will use it
Stallman doesn't say this; he says even if a libre software alternative is worse, it ought to be used because restricted software is an injustice, and categorically worse than inconvenience of a poor quality program; and in fact he says that if people are switching to libre software because it is more powerful and convenient (say, as they do with VLC) without understanding the difference in justice, they have missed his point.
I am concerned with liberating typography for all people, not only those who disagree with him and prefer convenient restricted software to inconvenient libre software - such as a simple layout application that doesn't handle OpenType but allows them to mock up T shirt designs and is integrated with a T shirt printing firm's production and mailing operations, versus Inkscape or Scribus or whatever that requires them to learn those applications' UIs, theory about Unicode and OpenType feature processing, and negotiating with the T shirt vendor to accept their EPS.
Conceptually, I'm with Dave insofar as fonts in the SFNT container are unusually broad in terms of OS and app coverage and lifespan. Also, people are more likely to switch fonts than apps because apps are, well, more specific to their needs.
Also: text encoding is not an "easy" thing. There are no perfect solutions. Since 2001 I kept asking the industry how OT features should be marked up in "rich text" and the answer came in 2011 with CSS font-feature-settings. That's because it wasn't obvious at all. And it still isn't perfect (in order to get just one different glyph, you need to surround the character with an HTML span and apply a super-long CSS property, but this will likely segment the text run so that the isolated character stops interacting with the rest of the line OT-wise).
It's shit. :)
how OT features should be marked up
This is a great point, I've added it to the wikimedia page, https://meta.wikimedia.org/w/index.php?title=Future_Global_Font_Format_Requirements&type=revision&diff=15336305&oldid=14437375
Can you tell us more about supplementary PUA?
Supplementary Private Use Area-A is 65534 codepoints from U+F0000 to U+FFFFD, which means that practically all fonts can be encoded (as they can hold max. 64K glyphs). My idea to use it was that each glyph ID gets assigned a codepoint F0000 + GID, so something like "harfbuzz.js" could do all the GSUB+GPOS processing even if a browser doesn't support some shaping. Such harfbuzz.js would emit a series of GIDs, and in order to display them, F0000 is added to each resulting GID and then the font's cmap gets requested for such codepoint, which in turn lets the browser actually display these glyphs. Since the F0000+ codepoints in the font correspond exactly to the GID order, this can be stored very efficiently in the font's cmap format 12, adding only a few bytes to the font regardless of its size.
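The last step of that pipeline, turning shaper output into displayable text, amounts to simple arithmetic. Here is a minimal sketch, assuming the shaper's output is already available as a list of glyph IDs; the helper names and the sample GIDs are made up for illustration.

```python
SPUA_BASE = 0xF0000  # first codepoint of Supplementary PUA-A
SPUA_LAST = 0xFFFFD  # last codepoint of Supplementary PUA-A


def gids_to_spua(gids):
    """Turn a sequence of glyph IDs into a string of U+F0000 + GID characters."""
    chars = []
    for gid in gids:
        code = SPUA_BASE + gid
        if gid < 0 or code > SPUA_LAST:
            raise ValueError("glyph ID %r does not fit in SPUA-A" % gid)
        chars.append(chr(code))
    return "".join(chars)


def spua_to_gids(text):
    """Inverse mapping: recover the glyph IDs from an SPUA-A string."""
    return [ord(ch) - SPUA_BASE for ch in text]


shaped = [36, 1204, 7]  # made-up GIDs, standing in for a shaper's output
text = gids_to_spua(shaped)
print([hex(ord(ch)) for ch in text])
print(spua_to_gids(text) == shaped)
```

The resulting string only renders as intended in a font that carries the F0000 + GID mapping; in any other font the same codepoints mean nothing, or something else.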
The BMP PUA (E000-F8FF) only allows for 6400 codes so it wouldn't be enough for some fonts. Plus some fonts have a legitimate "corporate" usage of the BMP PUA (SIL, MUFI), while SPUA is really obviously "last resort".
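The capacity figures quoted above can be checked with a few lines of arithmetic. The `capacity` and `fits` helpers and the glyph count are hypothetical, used only for this illustration.

```python
BMP_PUA = (0xE000, 0xF8FF)    # Private Use Area in the Basic Multilingual Plane
SPUA_A = (0xF0000, 0xFFFFD)   # Supplementary Private Use Area-A (plane 15)


def capacity(block):
    """Number of codepoints in an inclusive (first, last) range."""
    first, last = block
    return last - first + 1


def fits(num_glyphs, block):
    """True if every glyph can get its own codepoint in the block."""
    return num_glyphs <= capacity(block)


print(capacity(BMP_PUA), capacity(SPUA_A))

num_glyphs = 7000  # assumed glyph count of a large font
print(fits(num_glyphs, BMP_PUA), fits(num_glyphs, SPUA_A))
```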
Ps. In my method, ALL glyphs are encoded in the SPUA, not just the "unencoded" ones. But if the SPUA codes are used, the OS/app shaper does not apply any shaping since it knows nothing of the script. This allows me to do my own external shaping (e.g. via said harfbuzz.js or another JS shaper). So in essence, this is a simple poor man's API for glyph access in apps that can only talk to fonts via the cmap. In a decent web implementation, those SPUA codes could live in some shadow DOM or something, while proper selectable/searchable text would live "on top".
...or via CSS generated content. People dealing with icon fonts (Bootstrap using Font Awesome etc.) have been using PUA in an elegant way that does not expose garbage to search. But it's still fonts rather than SVG, so it's fast, cacheable, works everywhere etc.
It's essentially a debate about treating users as smart vs. stupid. I prefer to treat users as reasonably smart. Users obviously prefer to just type their text on the keyboard, and that'll use normal Unicode. They will resort to some glyph palette insertion or some fancy PUA codes only if they're really "desperate", i.e. when they really have no other choice.
They will resort to some glyph palette insertion or some fancy PUA codes only if they're really "desperate", i.e. when they really have no other choice.
Right, that is why Laura uses the BMP PUA, and I agree that this would be better for her (and generally.)
One remaining question for me is if this should be done for all fonts or only OT-intense display fonts.
I don't have an opinion on that. However, I want to add one thing: in early OpenType days, Adobe used a portion of the BMP PUA as a "corporate use area", where they standardized certain codes for things like small caps, oldstyle numerals or certain ligatures. So, an oldstyle "3" or a smallcap "A" always had a certain code regardless of the Adobe font used. Now that was a bad idea because this practice created an illusion that these codes had some claim of universality, or longtime relevance. So they stopped using PUA after a few years.
But with purely font-specific encoding, I don't see this as a problem. If you have a series of SPUA codepoints assigned to correspond to GIDs in a specific font, then everyone agrees that no machine "knows", or is expected to know, what any of these codepoints "mean". As long as all sides agree that no presumptions can be made, I think it's fine.
In Adobe's case U+F761 was semi-standardized as "smallcap A", and all their early OTFs used U+F761 as small-cap A, so the danger was that some apps might start expecting that U+F761 just "means smallcap A". But with the purely GID-oriented SPUA, U+F0761 will mean something else with every single font. So it really is "private", and substituting fonts will yield unexpected results.
Which is fine because users will most likely not expect any stability of this encoding and will use it mostly as an input mechanism for specific glyphs in very specific situations, often with the goal being print, or laser cut, or automated engraving. Most of these laser cut or engraving apps have no OT features UI and never will.
So SPUA entry may be the only way for the user to get work done. If the user could find a better method, they'd already be using it.
As long as all sides agree that no presumptions can be made, I think it's fine.
I worry that glyphs/fontmake might create a predictable map from common Unicodes to GID ordering...
I was chatting with @twardoch today about how glyph names are the 'primary key' for fonts, because in any contemporary font you have so many unencoded glyphs, accessed with OpenType Layout logic... But unencoded glyphs are tricky to precisely call, because OTL logic is per-font. I mentioned that I might like to use the Unicode Private Use Area to encode otherwise-unencoded glyphs.
Adam kindly mentioned he already thought about this, and he concluded that the Private Use Plane A (Unicode Plane 15) is ideal for this, as it spans U+F0000..U+FFFFD, so you can use a value of F0000 + hex(GID) to cleanly, logically, encode all unencoded glyphs.
Let's do it!