adobe-fonts / source-serif

Typeface for setting text in many sizes, weights, and languages. Designed to complement Source Sans.
https://adobe-fonts.github.io/source-serif
SIL Open Font License 1.1

Request: Ŋ ŋ #64

Open AshtarBalynestjar opened 5 years ago

AshtarBalynestjar commented 5 years ago

I’d like to request adding support for the letter eng. I am aware that the letter is used in IPA and is part of the AL-5 character set, so it is already in the pipeline. However, by adding it alone, you will support several languages such as Ganda (6.5 million users), Wolof (5.2 million users) and Lakota (2100 users; would require using combining diacritics for ǧ and ȟ), and be on the way to supporting languages like Northern Sami (26 thousand users; ŧ missing), Dinka (1.3 million users; ɛ ɔ ɣ missing) and Fula (29 million users; ɓ ɗ ƴ missing, as well as ɲ in some orthographies).
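The coverage argument above can be sketched as a small set computation. This is only an illustration: the per-language sets are taken from the figures in this comment (lowercase letters only), Lakota is left out because it would also depend on combining diacritics, and the optional ɲ for Fula is ignored. The names `MISSING_BEFORE` and `still_missing` are made up for this sketch.

```python
# Letters each language still lacks, as listed in the request above.
# Assumption: only lowercase letters are tracked; capitals follow the same pattern.
MISSING_BEFORE = {
    "Ganda": {"ŋ"},
    "Wolof": {"ŋ"},
    "Northern Sami": {"ŋ", "ŧ"},
    "Dinka": {"ŋ", "ɛ", "ɔ", "ɣ"},
    "Fula": {"ŋ", "ɓ", "ɗ", "ƴ"},
}


def still_missing(added, missing_before=MISSING_BEFORE):
    """Return, per language, the letters that remain unsupported once `added` ships."""
    added = set(added)
    return {lang: letters - added for lang, letters in missing_before.items()}


remaining = still_missing({"ŋ"})

# Languages with nothing left to add are fully covered by eng alone.
fully_supported = sorted(lang for lang, rest in remaining.items() if not rest)
print("Fully supported:", ", ".join(fully_supported))

for lang, rest in sorted(remaining.items()):
    if rest:
        print(f"{lang} still needs: {' '.join(sorted(rest))}")
```

With only ŋ added, the sketch reports Ganda and Wolof as fully covered, while Northern Sami, Dinka and Fula each keep the extra letters named above.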

The relevant codepoints are:

Ŋ U+014A LATIN CAPITAL LETTER ENG
ŋ U+014B LATIN SMALL LETTER ENG
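For anyone who wants to double-check the code points, a minimal sketch using Python's standard `unicodedata` module is shown here. The helper name `describe` is made up for this example.

```python
import unicodedata


def describe(ch):
    """Format a character as 'U+XXXX <char> <Unicode name>'."""
    return f"U+{ord(ch):04X} {ch} {unicodedata.name(ch)}"


CAPITAL_ENG = "\u014A"
SMALL_ENG = "\u014B"

for ch in (CAPITAL_ENG, SMALL_ENG):
    print(describe(ch))

# The two code points are a casing pair, so case conversion maps one to the other.
print(SMALL_ENG.upper() == CAPITAL_ENG, CAPITAL_ENG.lower() == SMALL_ENG)
```

Note that the casing pair is the same regardless of which glyph shape a font draws for the capital; the N-form and the enlarged-n form both sit at U+014A.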

The capital Eng has two main alternative glyphs: a capital N with a descender (preferred by Sami users), and an enlarged version of the lowercase eng (preferred by African users). Given all of this, the reference glyph should be the enlarged lowercase form, but most typefaces on Google Fonts seem to have it default to the capital-N form. I’m not sure if this is simply because they do not intend to support African languages, or whether there is something I’m missing, but Source Sans Pro seems to be alone among the most downloaded typefaces in defaulting to the enlarged lowercase form. In any case, both glyphs should be available, either through the locl feature or stylistic sets.

frankrolf commented 5 years ago

Thank you for this detailed request. There currently is no immediate goal to extend Source Serif to AL-5 or add all of IPA, but adding individual glyphs (especially with a strong use case) is not out of the question.

In fact, Ŋ and ŋ already exist in the Roman masters, because they were needed for Source Han Serif. However, they are not fully “wired up” – there are no variants such as small caps, there is no localized alternate, they are not in any kerning pairs, etc.

If you feel like testing the current engs, you can add two lines to the GlyphOrderAndAliasDB file found in the Roman subdirectory:

Eng Eng
eng eng

(the divider is a single tab character) If you rebuild the fonts after that via ./build.sh, the Eng/eng should be available.
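Because the tab divider is easy to lose when copying and pasting, the edit can also be scripted. The following is a rough sketch, not part of the repository: the function name `add_glyph_entries` is invented, the demo runs against a throwaway file with placeholder contents rather than the real `GlyphOrderAndAliasDB`, and it assumes that appending the lines at the end of the file is acceptable.

```python
import tempfile
from pathlib import Path


def add_glyph_entries(db_path, glyph_names):
    """Append 'name<TAB>name' lines for glyphs not yet listed; return the names added."""
    path = Path(db_path)
    text = path.read_text(encoding="utf-8")
    # The first tab-separated column of each non-empty line is the glyph name.
    listed = {line.split("\t")[0] for line in text.splitlines() if line.strip()}
    new = [name for name in glyph_names if name not in listed]
    if new:
        if text and not text.endswith("\n"):
            text += "\n"
        text += "".join(f"{name}\t{name}\n" for name in new)
        path.write_text(text, encoding="utf-8")
    return new


# Demo on a temporary stand-in file (placeholder contents, not the real database).
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "GlyphOrderAndAliasDB"
    demo.write_text("N\tN\nn\tn\n", encoding="utf-8")
    print(add_glyph_entries(demo, ["Eng", "eng"]))  # names that were appended
    print(add_glyph_entries(demo, ["Eng", "eng"]))  # nothing left to add
    print(repr(demo.read_text(encoding="utf-8")))
```

To try it on a checkout, point `add_glyph_entries` at the file in the Roman subdirectory and then rebuild with ./build.sh as described above.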

I cannot comment on why the Sami Eng variant seems to be the standard (despite the presumably lower number of users). I think it might be because it’s easier to derive from capital N and J. I also don’t know why @pauldhunt chose to make the African variant the default (which is a deviation from every Adobe font before Source Sans) – which form do you think should be the standard?

AshtarBalynestjar commented 5 years ago

> I also don’t know why @pauldhunt chose to make the African variant the default (which is a deviation from every Adobe font before Source Sans) – which form do you think should be the standard?

If the only addition to the current character set is eng, the only Sami language that will be fully supported would be Lule Sami (with less than 2000 speakers). However, it will be enough to support at least Wolof (5.2 million), Mandinka (1.3 million) and Ganda (6.5 million), so in my opinion the African form should be the default.

That said, it is not entirely unreasonable to ask for the Sami glyph as an alternate.

frankrolf commented 5 years ago

Thanks! Since Source Sans is already taking that route, I will implement the African variant (cap-size lowercase n form) as a default, and add the N-shaped variant via language NSM in the locl feature.
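As an illustration of the plan above, here is a sketch that generates the kind of locl rule being described, written as feature-file text. Everything specific in it is an assumption: the alternate glyph name `Eng.a` is hypothetical, the helper `locl_feature` is invented for this example, and the output has not been checked against the actual Source Serif feature files or build.

```python
def locl_feature(substitutions, languages, script="latn"):
    """Build feature-file text that swaps glyphs for the given language system tags."""
    lines = ["feature locl {", f"    script {script};"]
    for tag in languages:
        lines.append(f"    language {tag};")
        for default, alternate in substitutions.items():
            lines.append(f"        sub {default} by {alternate};")
    lines.append("} locl;")
    return "\n".join(lines)


# Default Eng is the enlarged-n form; "Eng.a" stands in for the N-shaped alternate.
feature_text = locl_feature({"Eng": "Eng.a"}, ["NSM"])
print(feature_text)
```

Passing more tags in `languages` would extend the same substitution to other language systems, which is where the research burden discussed later in this thread comes in.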

AshtarBalynestjar commented 5 years ago

Would it be a good idea to have the N-variant under ss03 as well, mirroring the way you’ve handled Bulgarian and Serbian Cyrillic?

frankrolf commented 5 years ago

I will think about it. Usually I am not in favor of using up a stylistic set for a single glyph variant, but I agree there should be a secondary way of accessing regional alternates. (ss03 is already taken in the Italics)

FloraCanou commented 5 years ago

> If the only addition to the current character set is eng, the only Sami language that will be fully supported would be Lule Sami (with less than 2000 speakers). However, it will be enough to support at least Wolof (5.2 million), Mandinka (1.3 million) and Ganda (6.5 million), so in my opinion the African form should be the default.

I don't see the number of speakers to be a valid reason to pick one as default over the other.

frankrolf commented 5 years ago

@FloraCanou Why is that? Which other reasons would you suggest are valid?

FloraCanou commented 5 years ago

Unicode takes the N-with-descender form as default. I suggest following Unicode.

frankrolf commented 5 years ago

The document you linked states “glyph may also have appearance of large form of the small letter”. This example demonstrates that “following Unicode” often is ambiguous – therefore, (IMO) other practical factors (such as the number of speakers) may also be considered.

AshtarBalynestjar commented 5 years ago

> I will think about it. Usually I am not in favor of using up a stylistic set for a single glyph variant, but I agree there should be a secondary way of accessing regional alternates. (ss03 is already taken in the Italics)

I think I’ve found it: the features cv01 through cv99. Here's what Microsoft Typography has to say about them:

The function of these features is similar to the function of the Stylistic Alternates feature ('salt') and the Stylistic Set features (see 'ss01' – 'ss20'). Whereas the Stylistic Set features assume recurring stylistic variations that apply to a broad set of Unicode characters, these features are intended for scenarios in which particular characters have variations not applicable to a broad set of characters. The Stylistic Alternates feature provides access to glyph variants, but does not allow an application to control these on a character-by-character basis; the Character Variant features provide the greater granularity of control.

The function of these features is also related to that of the Localized Forms ('locl') feature, in that particular variations for a character may be preferred for particular languages. In practice, though, it may not be feasible to associate particular glyph variants with particular language systems for all the relevant languages; for example, the requirements of particular languages may not be known when a font is being developed.

The distinction between these features and the Stylistic set features is most easily understood in terms of variations applying to a single character versus variations applying across a range of characters. In practice, if a variation applies to a character in a bicameral script, then the casing-pair character may have the same variation. Also, Unicode includes pre-composed characters for certain base + mark combinations, hence a single abstract character may be incorporated into a number of Unicode characters. Therefore, a variation for a particular abstract character may be applicable to several related Unicode characters. The Character Variant features can be used for sets of related characters in these cases. The key distinction between such use and the intended use for Stylistic Set features is that a Character Variant feature should apply only to one character or a set of characters closely related in this way, while Stylistic Set features are intended for broader sets of characters.

AshtarBalynestjar commented 4 years ago

Update:

The new Kazakh Latin alphabet presented by the Baitursynov Institute of Linguistics uses the letter eng, with the capital being in the N-form. Now that there is an actual compromise to be made between Kazakh and, say, Wolof and Ganda, the preferred default form isn’t so obvious. However, because Kazakhstan has a population of 18.2 million, of which over 76% are Internet users and at least 9.9 million of which actively use the Kazakh language, the use case for eng is even stronger.

moyogo commented 3 years ago

Most of the languages (Ganda, Wolof, Mandinka-Bambara-Dyoula, Dinka, or Lakota) mentioned in the original post use the n-form uppercase more often than the N-form uppercase. Many minor West African languages that do not have OpenType language system tags also mostly use the n-form.

The Sami languages that use Ŋ prefer the N-form.

It may be better to do as Source Sans does, with the n-form as the default and the N-form as a locl feature for Sami languages.

andjc commented 3 years ago

Actually, using the locl feature is complicated, and the amount of research needed to implement it properly would be extensive. Beyond that, many languages use Eng: one variant is used extensively in Africa (the n-form with a descender, and in at least one case the n-form without a descender). The N-form is used in Northern Europe, Northern Australia, North America, and other locations.

It is possible to use cvNN features. There is no single source of information identifying all the languages that use Eng, so locl will be incomplete and can't be the only method of accessing those variant glyphs.

frankrolf commented 3 years ago

Thanks for your thoughts, @andjc! I agree that locl tagging is a spotty way of implementing character alternates, and this concern has come up before (with Bulgarian alternates, for example). The Ŋ now added to Source Serif (with the optical size extension) is the N-like form – simply because it was already drawn for Source Han Serif. What I would like to see is a focus on extending Source Serif toward African languages – perhaps the proper way of doing this would be to create per-language (or language-group) forks. Not making any promises, just thinking out loud. I think there was some interest in the past, and I think this work can start soon.

andjc commented 3 years ago

@frankrolf, you will need a strategy to handle glyph variation; for widespread African language support you will need a cohesive approach to variant glyphs. The SIL repertoire is the most extensive, but I still find occasional gaps in their coverage.