dpk opened this issue 8 years ago
Coming here from https://github.com/edwardtufte/et-book/issues/22: I really badly miss my ć…
Here’s an initial assessment (based on running PyFontaine on the Roman font and reading through the first, like, 5% of its extremely voluminous output) of which characters would be easiest to work on, and which would give the most benefit in terms of number of languages (but alas, not number of speakers) supported. Where PyFontaine only picked up a capital or small letter as missing, it’s safe to say it would be a good idea to have both if we don’t already.
Edit: Superseded (but see the HTML comments if you’re interested), see below.
We can use `<component />` elements in the UFO glyph outline to build accented characters out of a base glyph plus an accent glyph, as follows:
```xml
<?xml version='1.0' encoding='UTF-8'?>
<glyph name="Adieresis" format="2">
  <advance width="414"/>
  <outline>
    <component base="A"/>
    <component base="dieresis" xOffset="180" yOffset="222"/>
  </outline>
</glyph>
```
There’s already an Adieresis character in the fonts, of course; this is just an example. But I used it to test where the diacritics should go relative to the base character. The values of 180 and 222 for `xOffset` and `yOffset` make the dieresis appear exactly where it sits above the A in the Roman font’s original Adieresis character (tested by layering a blue version of this character directly over a red version of the original Adieresis). Further, when I change `dieresis` to `macron`, `caron`, `tilde`, etc., the accent mark appears in the right place, horizontally centred over the base character (because the accent glyphs all have the same width).
For the lower-case adieresis, building a + dieresis with xOffset and yOffset both 0 (the default) matches the original, but this is almost certainly not true for all base characters.
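To check which offsets an existing composite glyph uses, including the implicit 0 defaults, a small reader can pull them out of a `.glif` file. This is a minimal sketch using only the Python standard library; `read_components` is a hypothetical helper, and it ignores the scale attributes a component may also carry.

```python
import xml.etree.ElementTree as ET

ADIERESIS = """<?xml version='1.0' encoding='UTF-8'?>
<glyph name="Adieresis" format="2">
  <advance width="414"/>
  <outline>
    <component base="A"/>
    <component base="dieresis" xOffset="180" yOffset="222"/>
  </outline>
</glyph>
"""


def read_components(glif_xml):
    """Return (base, xOffset, yOffset) for each <component> in a .glif string."""
    # Parse from bytes so the XML encoding declaration is honoured.
    root = ET.fromstring(glif_xml.encode("utf-8"))
    result = []
    for comp in root.iter("component"):
        # A missing offset attribute means 0, as with a + dieresis.
        x = float(comp.get("xOffset", 0))
        y = float(comp.get("yOffset", 0))
        result.append((comp.get("base"), x, y))
    return result


print(read_components(ADIERESIS))
# [('A', 0.0, 0.0), ('dieresis', 180.0, 222.0)]
```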
With the above in mind, here are the correct `xOffset` and `yOffset` values for characters built from the following bases plus accents above the character in question. They were calculated using `dieresis` as the combining character except where noted (so they may be wrong for other accents; this will eventually be checked and noted):
Edit: Superseded, see doc/accent-positions.md
I wish I had a quicker way of finding them out …
I guess the benefit is that, once I have the offset values for all fonts, I can write a script that will generate the `.glif` files for any combination on demand automatically …
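A minimal sketch of such a generator, assuming the offsets are already known. `make_composite_glif` is a hypothetical helper, not part of any font library. It takes the advance width as an argument, whereas a real script would probably copy it from the base glyph, and it does not write the file into the UFO or register it in `glyphs/contents.plist`.

```python
import xml.etree.ElementTree as ET


def make_composite_glif(name, base, accent, x_offset, y_offset, advance_width):
    """Return a format-2 .glif document that builds `name` from two components.

    Hypothetical helper: the advance width is passed in rather than read
    from the base glyph's own .glif file.
    """
    glyph = ET.Element("glyph", name=name, format="2")
    ET.SubElement(glyph, "advance", width=str(advance_width))
    outline = ET.SubElement(glyph, "outline")
    ET.SubElement(outline, "component", base=base)
    accent_attrs = {"base": accent}
    # GLIF treats a missing offset as 0, so only write non-zero ones.
    if x_offset:
        accent_attrs["xOffset"] = str(x_offset)
    if y_offset:
        accent_attrs["yOffset"] = str(y_offset)
    ET.SubElement(outline, "component", accent_attrs)
    ET.indent(glyph)  # pretty-printing; needs Python 3.9+
    body = ET.tostring(glyph, encoding="unicode")
    return "<?xml version='1.0' encoding='UTF-8'?>\n" + body + "\n"


# Reproduces the Adieresis test glyph from earlier in this thread.
print(make_composite_glif("Adieresis", "A", "dieresis", 180, 222, 414))
```

Looping this over a table of (base, accent, offsets) entries is what would make “any combination on demand” possible.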
I also need to find good placements for the accent marks over characters like æ, œ, r, and w: some languages need accented versions of these, but the original ET Book fonts have none. There are also no accented small-cap s or z characters.
The lowercase values likely work for some accents under the letter as well: one can, at least, build an extremely passable s-cedilla out of s + cedilla with the values given above. c + cedilla (offset 33, -10) doesn’t quite match ccedilla, but offset 33, 0 for c + acute looks reasonable.
a + ogonek looks okay-ish with the ogonek at an offset of roughly 155, -10. (I’m assuming -10 as the `yOffset` for all accents, like the cedilla, that hang below the character and are attached to it.) I suspect we won’t get better than okay-ish unless someone comes in and designs an original aogonek character. Also, as I don’t read any languages that use the ogonek, I’m not really qualified to judge how good it looks in practice; I’m just comparing it at large scale to a couple of serif fonts I have.
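The rule of thumb above (reuse a lowercase base’s horizontal offset, and drop an attached accent to `yOffset` -10) could be encoded as a lookup like the following. This is a sketch: the measured values are the ones quoted in this thread, the names are made up, and the fallback is only as good as the assumption. For example, it gives 33, -10 for c + cedilla, which doesn’t quite match ccedilla.

```python
# Offsets (xOffset, yOffset) quoted in this thread for the Roman font.
MEASURED = {
    ("A", "dieresis"): (180, 222),
    ("a", "dieresis"): (0, 0),
    ("c", "acute"): (33, 0),
    ("a", "ogonek"): (155, -10),
}

# Assumption: accents that hang below and attach to the base sit at yOffset -10.
BELOW_ACCENTS = {"cedilla", "ogonek"}
BELOW_Y_OFFSET = -10


def accent_offset(base, accent):
    """Return (xOffset, yOffset) for base + accent, or None if unknown."""
    if (base, accent) in MEASURED:
        return MEASURED[(base, accent)]
    if accent in BELOW_ACCENTS and base.islower():
        # Reuse the horizontal offset measured for an accent above this base.
        for (b, a), (x, _y) in MEASURED.items():
            if b == base and a not in BELOW_ACCENTS:
                return (x, BELOW_Y_OFFSET)
    return None


print(accent_offset("c", "cedilla"))  # (33, -10)
```

Measured entries always win, which is why a + ogonek keeps its hand-tuned 155 rather than inheriting the 0 from a + dieresis.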
As a goal for what languages to support, it would be nice to support all the Latin-scripted languages of the European Union (that is, all of them except Greek and Bulgarian).
Okay, after some moderately successful bodging of characters in FontForge today, I think I’m ready to upgrade what I hope can be achieved from ‘Latin-scripted languages of the European Union’ to also include ‘Latin-scripted languages promoted by the European Charter for Regional or Minority Languages’. Here’s a quick overview of which characters are needed, according to PyFontaine.
Support for any individual character is likely to come to the Roman font first, then to the bold weights, then to italic, and only maybe to Display Italic. (I haven’t decided yet whether I’ll even keep maintaining Display Italic.)
Already fully supported.
Will not be supported — uses Cyrillic script.
Č č Ć ć Đ đ
Č č Ď ď Ě ě Ň ň Ř ř Ť ť Ů ů
IJ ij
It should be okay to specify these characters by OpenType positioning, I hope.
Will not be supported — uses Greek script.
Ő ő Ű ű
Ā ā Č č Ē ē Ģ ģ Ī ī Ķ ķ ļ Ņ ņ Ū ū
Ą ą Č č Ė ė Ę ę Į į Ū ū Ų ų
Ċ ċ Ġ ġ Ħ ħ Ż ż
Ą ą Ć ć Ę ę Ń ń Ś ś Ź ź Ż ż
Ă ă Ș ș Ț ț
Č č Ď ď Ĺ ĺ Ľ ľ Ň ň Ŕ ŕ Ť ť
Č č
All those not mentioned should either already be fully supported, or non-Latin, or (rarely) should be automatically covered when other, related languages are covered.
Will not be supported — uses Syriac script.
Will not be supported — uses Arabic script.
Will not be supported — uses Armenian script.
Will not be supported — uses Cyrillic script.
See Croatian.
Also Valencian.
Ŀ ŀ
See Turkish.
Most of these characters are pretty tricky. Maybe give up and don’t support this one.
Ė ė Ƣ ƣ Ꞑ ꞑ Ɵ ɵ Ś ś Ş ş Ь ь Ž ž Ź ź Ƶ ƶ
Ą ą Ã ã Ń ń Ż ż
Ş ş
Probably already supported.
Will not be supported — uses Cyrillic script.
Č č
Will not be supported — uses Cyrillic script.
See Slovakian.
Probably supported when we support all the other characters.
Č č Đ đ Ǧ ǧ Ǥ ǥ Ǩ ǩ Ŋ ŋ Ŧ ŧ Ʒ ǯ Ǯ ʒ ʹ
Ć ć Č č Ě ě Ń ń Ř ř
Ć ć Č č Ě ě Ń ń Ŕ ŕ Ś ś Ź ź
Looks complicated due to multiple competing orthographies with unclear legal statuses.
Ğ ğ İ Ş ş
Will not be supported — uses Cyrillic script.
Ŵ ŵ Ẁ ẁ Ẃ ẃ Ẅ ẅ Ŷ ŷ Ỳ ỳ
See Kurdish.
Will not be supported — uses Hebrew script.
Character | Languages needing it |
---|---|
č | 10 |
Č | 10 |
ń | 4 |
Ń | 4 |
ć | 4 |
Ć | 4 |
ż | 3 |
Ż | 3 |
ě | 3 |
Ě | 3 |
ą | 3 |
Ą | 3 |
ź | 2 |
Ź | 2 |
ū | 2 |
Ū | 2 |
ť | 2 |
Ť | 2 |
ş | 2 |
Ş | 2 |
ś | 2 |
Ś | 2 |
ř | 2 |
Ř | 2 |
ŕ | 2 |
Ŕ | 2 |
ň | 2 |
Ň | 2 |
ę | 2 |
Ę | 2 |
đ | 2 |
Đ | 2 |
ď | 2 |
Ď | 2 |
ỳ | 1 |
Ỳ | 1 |
ẅ | 1 |
Ẅ | 1 |
ẃ | 1 |
Ẃ | 1 |
ẁ | 1 |
Ẁ | 1 |
ʹ | 1 |
ʒ | 1 |
ț | 1 |
Ț | 1 |
ș | 1 |
Ș | 1 |
ǯ | 1 |
Ǯ | 1 |
ǩ | 1 |
Ǩ | 1 |
ǧ | 1 |
Ǧ | 1 |
ǥ | 1 |
Ǥ | 1 |
Ʒ | 1 |
ŷ | 1 |
Ŷ | 1 |
ŵ | 1 |
Ŵ | 1 |
ų | 1 |
Ų | 1 |
ű | 1 |
Ű | 1 |
ů | 1 |
Ů | 1 |
ŧ | 1 |
Ŧ | 1 |
ő | 1 |
Ő | 1 |
ŋ | 1 |
Ŋ | 1 |
ņ | 1 |
Ņ | 1 |
ŀ | 1 |
Ŀ | 1 |
ľ | 1 |
Ľ | 1 |
ļ | 1 |
ĺ | 1 |
Ĺ | 1 |
ķ | 1 |
Ķ | 1 |
ij | 1 |
IJ | 1 |
İ | 1 |
į | 1 |
Į | 1 |
ī | 1 |
Ī | 1 |
ħ | 1 |
Ħ | 1 |
ģ | 1 |
Ģ | 1 |
ġ | 1 |
Ġ | 1 |
ğ | 1 |
Ğ | 1 |
ė | 1 |
Ė | 1 |
ē | 1 |
Ē | 1 |
ċ | 1 |
Ċ | 1 |
ă | 1 |
Ă | 1 |
ā | 1 |
Ā | 1 |
ã | 1 |
à | 1 |
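A tally like the table above can be produced by counting, for each missing character, how many language lists it appears in. Here is a sketch using three of the lists. Croatian and Turkish are named later in this thread; labelling the Č č Ď ď … list as Czech is my assumption. The sort key reproduces the table’s order, which appears to be by count, then by descending code point.

```python
import unicodedata
from collections import Counter

# Excerpt of the per-language "missing" lists above (a small subset).
MISSING = {
    "Croatian": "Č č Ć ć Đ đ",
    "Czech": "Č č Ď ď Ě ě Ň ň Ř ř Ť ť Ů ů",
    "Turkish": "Ğ ğ İ Ş ş",
}


def tally(missing):
    """Return (character, number of languages) pairs, most-needed first."""
    counts = Counter()
    for chars in missing.values():
        # NFC keeps each accented letter as one code point, so ord() works.
        letters = unicodedata.normalize("NFC", chars).split()
        counts.update(set(letters))  # count each language at most once
    # Ties are broken by descending code point, matching the table's order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], -ord(kv[0])))


def as_markdown(rows):
    """Render the tally in the same two-column layout as the table."""
    lines = ["Char. | Langs |", "---|---|"]
    lines += [f"{ch} | {n} |" for ch, n in rows]
    return "\n".join(lines)


print(as_markdown(tally(MISSING)))
```

Feeding in every Latin-script list should give the full table, with č and Č at the top.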
Croatian
Č č Ć ć Đ đ
you're missing
Š š Ž ž
(I think Slovene should have the same set, but I'm not familiar with it. Even though it's a South Slavic language, to my ears it sounds like a West Slavic language.)
Turkish
Ğ ğ İ Ş ş
you're missing
Ö ö Ü ü
this throws your count off
All those characters are already in the font — I’m only counting ones that aren’t there already. Thanks for double checking!
Use FontForge's “build accented glyph” feature to fill in gaps in the Unicode repertoire. There should be enough character+glyph combinations possible for most European languages …