Conversion T1 → OTF: how to assign Unicode code points?

tkacvins commented 2 years ago

I am working on converting the Computer Modern fonts from Type 1 PFA format to OpenType/CFF format. One of my testers reported that /dotlessj is getting an incorrect Unicode code point. FontForge reports:

The glyph named dotlessj is mapped to U+F6BE.
  But its name indicates it should be mapped to U+0237.

The PFA encoding array is a custom encoding, so it ends up as a supplemental encoding.

The encoding array is:

/Encoding 256 array
 0 1 255 { 1 index exch /.notdef put} for
dup 0 /Gamma put
dup 1 /Delta put
dup 2 /Theta put
dup 3 /Lambda put
dup 4 /Xi put
dup 5 /Pi put
dup 6 /Sigma put
dup 7 /Upsilon put
dup 8 /Phi put
dup 9 /Psi put
dup 10 /Omega put
dup 11 /ff put
dup 12 /fi put
dup 13 /fl put
dup 14 /ffi put
dup 15 /ffl put
dup 16 /dotlessi put
dup 17 /dotlessj put
dup 18 /grave put
dup 19 /acute put
dup 20 /caron put
dup 21 /breve put
dup 22 /macron put
dup 23 /ring put
dup 24 /cedilla put
dup 25 /germandbls put
dup 26 /ae put
dup 27 /oe put
dup 28 /oslash put
dup 29 /AE put
dup 30 /OE put
dup 31 /Oslash put
dup 32 /suppress put
dup 33 /exclam put
dup 34 /quotedblright put
dup 35 /numbersign put
dup 36 /dollar put
dup 37 /percent put
dup 38 /ampersand put
dup 39 /quoteright put
dup 40 /parenleft put
dup 41 /parenright put
dup 42 /asterisk put
dup 43 /plus put
dup 44 /comma put
dup 45 /hyphen put
dup 46 /period put
dup 47 /slash put
dup 48 /zero put
dup 49 /one put
dup 50 /two put
dup 51 /three put
dup 52 /four put
dup 53 /five put
dup 54 /six put
dup 55 /seven put
dup 56 /eight put
dup 57 /nine put
dup 58 /colon put
dup 59 /semicolon put
dup 60 /exclamdown put
dup 61 /equal put
dup 62 /questiondown put
dup 63 /question put
dup 64 /at put
dup 65 /A put
dup 66 /B put
dup 67 /C put
dup 68 /D put
dup 69 /E put
dup 70 /F put
dup 71 /G put
dup 72 /H put
dup 73 /I put
dup 74 /J put
dup 75 /K put
dup 76 /L put
dup 77 /M put
dup 78 /N put
dup 79 /O put
dup 80 /P put
dup 81 /Q put
dup 82 /R put
dup 83 /S put
dup 84 /T put
dup 85 /U put
dup 86 /V put
dup 87 /W put
dup 88 /X put
dup 89 /Y put
dup 90 /Z put
dup 91 /bracketleft put
dup 92 /quotedblleft put
dup 93 /bracketright put
dup 94 /circumflex put
dup 95 /dotaccent put
dup 96 /quoteleft put
dup 97 /a put
dup 98 /b put
dup 99 /c put
dup 100 /d put
dup 101 /e put
dup 102 /f put
dup 103 /g put
dup 104 /h put
dup 105 /i put
dup 106 /j put
dup 107 /k put
dup 108 /l put
dup 109 /m put
dup 110 /n put
dup 111 /o put
dup 112 /p put
dup 113 /q put
dup 114 /r put
dup 115 /s put
dup 116 /t put
dup 117 /u put
dup 118 /v put
dup 119 /w put
dup 120 /x put
dup 121 /y put
dup 122 /z put
dup 123 /endash put
dup 124 /emdash put
dup 125 /hungarumlaut put
dup 126 /tilde put
dup 127 /dieresis put
dup 128 /suppress put
dup 160 /space put
dup 161 /Gamma put
dup 162 /Delta put
dup 163 /Theta put
dup 164 /Lambda put
dup 165 /Xi put
dup 166 /Pi put
dup 167 /Sigma put
dup 168 /Upsilon put
dup 169 /Phi put
dup 170 /Psi put
dup 171 /sfthyphen put
dup 172 /nbspace put
dup 173 /Omega put
dup 174 /ff put
dup 175 /fi put
dup 176 /fl put
dup 177 /ffi put
dup 178 /ffl put
dup 179 /dotlessi put
dup 180 /dotlessj put
dup 181 /grave put
dup 182 /acute put
dup 183 /caron put
dup 184 /breve put
dup 185 /macron put
dup 186 /ring put
dup 187 /cedilla put
dup 188 /germandbls put
dup 189 /ae put
dup 190 /oe put
dup 191 /oslash put
dup 192 /AE put
dup 193 /OE put
dup 194 /Oslash put
dup 195 /suppress put
dup 196 /dieresis put
readonly def

/dotlessj has code point 180, so I don't know how that is getting translated to U+F6BE. Is there some special option to force the Unicode mapping I want?

Thanks,

Tom

tkacvins commented 2 years ago

These are the input file and output file from makeotf -f cmb10.pfa -o cmb10.otf -S

cmb10.zip

frankrolf commented 2 years ago

I think this conversion only really works by coincidence. The fact that A is at position 65 (which translates to U+0041) will produce a range of usable characters – but many other code points will be off (for example everything before the exclamation mark).
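
To make the "coincidence" concrete, here is a minimal Python sketch that compares a few slots from the encoding array in the first comment with the code points those glyph names conventionally carry. Both tables are hand-picked excerpts for illustration (not the full encoding and not the full AGLFN), and the helper name `slots_matching_unicode` is made up for this example.

```python
# Slots copied from the /Encoding array in the first comment (excerpt only).
ENCODING_EXCERPT = {
    0: "Gamma",
    11: "ff",
    17: "dotlessj",
    33: "exclam",
    34: "quotedblright",
    65: "A",
    97: "a",
}

# Conventional code points for those names (hand-picked excerpt, assumption:
# dotlessj uses the U+0237 value that FontForge suggested above).
NAME_TO_UNICODE = {
    "Gamma": 0x0393,
    "ff": 0xFB00,
    "dotlessj": 0x0237,
    "exclam": 0x0021,
    "quotedblright": 0x201D,
    "A": 0x0041,
    "a": 0x0061,
}


def slots_matching_unicode(encoding, name_to_unicode):
    """Return the slots whose index happens to equal the glyph's code point."""
    return sorted(
        slot
        for slot, name in encoding.items()
        if name_to_unicode.get(name) == slot
    )


matching = slots_matching_unicode(ENCODING_EXCERPT, NAME_TO_UNICODE)
for slot, name in sorted(ENCODING_EXCERPT.items()):
    status = "matches" if slot in matching else "differs"
    print(f"slot {slot:3d} /{name:<14} U+{NAME_TO_UNICODE[name]:04X}  {status}")
```

In this excerpt only slots 33, 65 and 97 line up with their code points; the Greek capitals, the ligature, dotlessj and the curly quote do not.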

Please consider using a GlyphOrderAndAliasDB file to map glyph names to code points.

Also, release mode (-r) is recommended to apply said GlyphOrderAndAliasDB file, and switch on subroutinization.

tkacvins commented 2 years ago

Hmmm, I wonder if it would be better if AFDKO handled custom encodings in a smarter fashion. Don't get me wrong, I like the product, but it is in the Type 1 and CFF specs to have custom encodings, so I am left wondering why AFDKO doesn't handle what is in the specifications in a better fashion. Besides, the fonts I am working with all have custom encodings, each different, some of which contain math glyphs (added to Unicode many years ago), etc. so it would be painful to make a GOADB for each font.

frankrolf commented 2 years ago

Okay, to clear up any possible misconceptions:

Unlike T1 fonts (PFA), OTFs don’t rely on a particular order of glyphs. Apart from the .notdef at position 0, glyphs can be in whatever position. Also, the order of characters prescribed in Unicode has no bearing on OTF fonts.

Consequently, it is essential that your project have a GlyphOrderAndAliasDB. I do not really follow why creating a GlyphOrderAndAliasDB is “painful” (not much more painful than making fonts to begin with ;-) )

All that said, you might be in luck, because the glyph names mostly seem to be contained in the aglfn: these are glyph names carried over from Type1 fonts, which have an inherent character/code point prescribed, for example A or bracketleft. In the early days of OTF and Unicode, the aglfn was extended with new names, but it is an approach that simply doesn’t scale. Hence, the list has been kept as-is (without expansion). Here it is: https://github.com/adobe-type-tools/agl-aglfn/blob/master/aglfn.txt

Consequently, your GlyphOrderAndAliasDB could look something like this (a tab-separated two column list):

.notdef	.notdef
Gamma	Gamma
Delta	Delta
Theta	Theta
Lambda	Lambda
Xi	Xi
Pi	Pi
Sigma	Sigma
Upsilon	Upsilon
Phi	Phi
Psi	Psi
Omega	Omega
...

See a modern example for a GlyphOrderAndAliasDB here: https://github.com/adobe-fonts/source-serif/blob/main/Roman/GlyphOrderAndAliasDB
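
A two-column file like the one sketched above does not have to be typed by hand for every font. The following is a minimal Python sketch, assuming the encoding array has been saved as plain text: it pulls glyph names out of `dup N /name put` lines, drops repeats, and emits tab-separated lines. The function names are hypothetical. The optional third column (a `uniXXXX` override) reflects my understanding of the makeotf documentation, so treat it as an assumption and check it against your AFDKO version; the override shown is simply the U+0237 value FontForge suggested for dotlessj.

```python
import re

# Matches PostScript encoding entries such as "dup 17 /dotlessj put".
ENTRY_RE = re.compile(r"dup\s+(\d+)\s+/(\S+)\s+put")


def glyph_names_in_order(encoding_text):
    """Return glyph names in first-seen order, without repeats."""
    names = []
    for _code, name in ENTRY_RE.findall(encoding_text):
        if name not in names:
            names.append(name)
    return names


def build_goadb(encoding_text, overrides=None):
    """Build GOADB text: final name, working name, optional Unicode override."""
    overrides = overrides or {}
    lines = [".notdef\t.notdef"]  # .notdef always comes first
    for name in glyph_names_in_order(encoding_text):
        fields = [name, name]
        if name in overrides:
            # Assumed third-column format: "uni" + four hex digits.
            fields.append(overrides[name])
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


# Short excerpt of the encoding array from the first comment.
sample = """
dup 0 /Gamma put
dup 17 /dotlessj put
dup 161 /Gamma put
dup 180 /dotlessj put
"""
goadb_text = build_goadb(sample, overrides={"dotlessj": "uni0237"})
print(goadb_text, end="")
```

Because each font has its own custom encoding, the same script could be run once per PFA, with a shared `overrides` table for names that need an explicit code point.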

Even in this simple scenario, some questions are still open:

a bunch of glyph names are specified twice in your glyph list. Which one counts?
which Delta, Omega, mu are you looking for? Greek? Physics? Both?
what is suppress?
ff fi fl ffi ffl all exist, and for most of these there are (deprecated) Unicode code points. In general, ligatures are not supposed to represent their own code point, but be a combination of contained characters. fi and fl usually get code points applied (they are easily accessible from the macOS keyboard). Applying a code point to the remaining ones would only make sense if you expect your users to apply your font to text which contains these code points.
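
For the first open question, it may help to list exactly which names are encoded more than once before deciding which slot counts. This is a minimal Python sketch under the same assumption as before (the encoding array is available as text); the function names are illustrative only, and the sample is a short excerpt of the array from the first comment.

```python
import re
from collections import defaultdict

# Matches PostScript encoding entries such as "dup 17 /dotlessj put".
ENTRY_RE = re.compile(r"dup\s+(\d+)\s+/(\S+)\s+put")


def slots_by_name(encoding_text):
    """Map each glyph name to the list of slots it is assigned to."""
    slots = defaultdict(list)
    for code, name in ENTRY_RE.findall(encoding_text):
        slots[name].append(int(code))
    return dict(slots)


def duplicated_names(encoding_text):
    """Return only the names that occupy more than one slot."""
    return {
        name: codes
        for name, codes in slots_by_name(encoding_text).items()
        if len(codes) > 1
    }


sample = """
dup 17 /dotlessj put
dup 32 /suppress put
dup 65 /A put
dup 128 /suppress put
dup 180 /dotlessj put
dup 195 /suppress put
"""
for name, codes in sorted(duplicated_names(sample).items()):
    print(f"/{name} is encoded at slots {codes}")
```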

I hope this clears up some of your questions. I’ll update the title of this issue to reflect the nature of our conversation better.

tkacvins commented 2 years ago

Hi Frank,

At one point in time, when I last worked on the Computer Modern fonts (I've since left the company where I did this work), there were no duplicates in the Type 1 font. I will touch base with the current maintainers of the Type 1 fonts to see what is going on with that.

The reason for the ligatures (ff, ffi, etc...) is that when the Computer Modern fonts were first designed, Type 1 fonts did not exist, and the machinery for compositing glyphs did not exist. These glyphs remain in place for compatibility reasons.

The Greek letters are meant for mathematics. There was a supplement to the Unicode specification to add math glyphs. I know the person responsible for the supplement, and I will touch base with her to find out whether Unicode support for the OTF flavor of the fonts should use the Greek or the Math/Physics code points.
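
To show what the Greek versus Math/Physics choice looks like in practice, here is a small Python sketch that uses the standard `unicodedata` module to look up candidate code points by their Unicode character names. The `CANDIDATES` table is an illustration only: the bold mathematical variants were picked arbitrarily (cmb10 is a bold face), and nothing here says which mapping the fonts should actually use.

```python
import unicodedata

# Candidate Unicode characters per glyph name (illustrative, not a decision).
CANDIDATES = {
    "Gamma": [
        "GREEK CAPITAL LETTER GAMMA",
        "MATHEMATICAL BOLD CAPITAL GAMMA",
    ],
    "Delta": [
        "GREEK CAPITAL LETTER DELTA",
        "INCREMENT",
        "MATHEMATICAL BOLD CAPITAL DELTA",
    ],
    "Omega": [
        "GREEK CAPITAL LETTER OMEGA",
        "OHM SIGN",
        "MATHEMATICAL BOLD CAPITAL OMEGA",
    ],
}


def candidate_code_points(glyph_name):
    """Return (code point, Unicode name) pairs for one glyph name."""
    return [
        (ord(unicodedata.lookup(char_name)), char_name)
        for char_name in CANDIDATES[glyph_name]
    ]


for glyph_name in CANDIDATES:
    for code_point, char_name in candidate_code_points(glyph_name):
        print(f"/{glyph_name:<6} U+{code_point:04X}  {char_name}")
```

Whichever column is chosen, the resulting values could go into the GlyphOrderAndAliasDB as explicit overrides.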

The pain comment was just me being a little bit whiny. It is not more difficult (very likely less so) than actually designing the glyphs and hinting them.

Finally, thanks for the reference to AGLFN and an example of a GOADB. This is going to be an interesting/fun project.

Tom

adobe-type-tools / afdko

Conversion T1 → OTF: how to assign Unicode code points? #1432