Add CJK developer font gallery

chrissimpkins commented 8 years ago

Goal

Create a separate gallery of free fonts that support CJK character sets and are licensed for redistribution in this repository

Need

Feedback from developers who use CJK fonts on

[x] source code text specimen(s) that provide a flavor of the CJK glyphs and provide the necessary visual information to compare these fonts across character sets used in bodies of source code by developers who use CJK fonts
[x] an automated (ideally) rendering approach for the text specimens to support/facilitate contributions to the gallery via pull requests
[x] appropriate fonts for inclusion in the gallery (currently in discussion and under review)
[x] review licenses for included fonts to confirm permission to redistribute fonts
Status
[x] Create new images
[x] add fonts to repository
[x] merge into master branch
[x] link licenses
Accept Pull Requests

Yes, definitely

Please Contribute

Any and all feedback is warmly welcomed and highly encouraged. Please use this thread to discuss further.

This idea developed out of discussions in #55. This will be used as the working thread for this topic.

Current Development Gallery Page

This gallery has been released and is now available at https://github.com/chrissimpkins/codeface/blob/master/CJK.md

~~The development gallery page is located in the cjk branch here https://github.com/chrissimpkins/codeface/blob/cjk/CJK.md~~

Current Image Suggestions from Thread Discussion

The following is a current list of suggested image types for each font based upon the discussion to date in the thread:

Source code example that includes CJK glyphs in comments +/- source strings (example: https://github.com/adobe-fonts/source-han-code-jp/blob/master/resources/img-View.png)
Test pattern specimen that demonstrates ASCII + CJK glyph vertical/horizontal metrics in the fonts (example: https://github.com/adobe-fonts/source-han-code-jp/blob/master/resources/img-AA.png)
Character set coverage with the text specimens that @be5invis included in the post https://github.com/chrissimpkins/codeface/issues/114#issuecomment-165317393
Alignment test specimen as suggested by @be5invis in https://github.com/chrissimpkins/codeface/issues/114#issuecomment-165670113
Current Typeface Suggestions from Thread Discussion
Nanum Gothic Coding = Korean (SIL Open Font License, Version 1.1) [no ideographs]
D2 Coding = Korean (SIL Open Font License, Version 1.1)
Iosevka Inziu = CJK via separate fonts: J = Japanese, SC = Simplified Chinese, TC = Traditional Chinese [not sure for what language the "CL" fonts are intended; no Korean support; no license information except for a "Copyright (c) 2014-2015 M+ Fonts Project, (c) 2014-2015 Belleve Invis" copyright string]
M+ 1MN / 1M / 2M = Japanese (M+ font license)
Source Code Han JP = Japanese (SIL Open Font License, Version 1.1)
Source Han Sans = Japanese, Korean, Simplified Chinese, and Traditional Chinese (SIL OFL?)
Ricty = Japanese (SIL Open Font License, Version 1.1 & IPA Font License Agreement v1.0) Not licensed for redistribution per developer
Ricty Diminished https://github.com/yascentur/RictyDiminished
MigMix = Japanese (IPA Font License Agreement v1.0) http://mix-mplus-ipa.osdn.jp/download.html#migu1m
Migu = Japanese (IPA Font License Agreement v1.0) http://mix-mplus-ipa.osdn.jp/download.html#migu1m

chrissimpkins commented 8 years ago

@be5invis :+1: Will dig into the other specimen files this week. Not sure why they are rendering in all italics. Must be an issue with Pygments

chrissimpkins commented 8 years ago

And will remove the Adobe specific text in the vertical/horizontal metrics file as @hatchzo recommended in https://github.com/chrissimpkins/codeface/issues/114#issuecomment-167179147

kenlunde commented 8 years ago

As an aside, if Source Han Code JP were to have full CJK support, the image below (made using Source Han Sans) shows language-specific differences for zh-hant and zh-hans when the characters share the same code point (unified):

cjk-example

The last character in the zh-hant and zh-hans lines, 合, is highlighted because it is frequently used in Japanese, and its form in Source Han Sans is different in a subtle way.

kenlunde commented 8 years ago

(Never mind. I figured out the reason, which is due to the Japanese region-specific subset fonts including glyphs for these characters. I will need to explore as to why.)

chrissimpkins commented 8 years ago

@kenlunde This is a level of complexity across these languages that I do not understand. In your opinion, is this an appropriate way to represent the glyph coverage for these typefaces?

kenlunde commented 8 years ago

@chrissimpkins With very few exceptions, most CJK fonts support a single language, both in terms of character coverage and the glyphs. This would suggest that instead of using a single text example that covers CJK, you should use per-language examples, and use the appropriate one based on the primary language of the font. In the case of Source Han Code JP, only the three ja-JP lines are appropriate.

chrissimpkins commented 8 years ago

Thank you very much Ken. The font fallback option is breaking the syntax highlighter so it may work better to do this work before the automation.

Would it be possible for others who suggested typefaces to weigh in on their thoughts about this? Should we consider separate C, J, and K specimen files? If you agree, will you please let me know what support is available in the typefaces that you recommended for the repository?

On Jan 2, 2016, 12:33 PM -0500, Dr. Ken Lunde notifications@github.com, wrote:

@chrissimpkins (https://github.com/chrissimpkins) With very few exceptions, most CJK fonts support a single language, both in terms of character coverage and the glyphs. This would suggest that instead of using a single text example that covers CJK, you should use per-language examples, and use the appropriate one based on the primary language of the font. In the case of Source Han Code JP, only the three ja-JP lines are appropriate.


kenlunde commented 8 years ago

@chrissimpkins: It will be somewhat obvious when a font supports Simplified Chinese, Traditional Chinese, or Korean. Such fonts will include glyphs only for characters that are appropriate for these languages. When I read through #55, there was a note that indicated that some Chinese and Korean fonts support Japanese kana (hiragana and katakana). This is because the character set standards on which they are based happen to include Japanese kana, but there are three issues with the kana support in such non-Japanese fonts: 1) the glyphs for kana are typically inferior or stolen from a Japanese font; 2) some characters that are necessary for kana support lack glyphs, because the character set standards neglected to include them; and 3) Japanese text that is kana-only is extremely rare, and you need a significant number of kanji for minimum support.

If someone can provide a convenient list of identified CJK fonts, I could take a stab at assigning a language to each of them.

chrissimpkins commented 8 years ago

@kenlunde Thanks Ken.

Here is the list that I have so far based upon recommendations in issue reports:

Links available:

Nanum Gothic Coding
D2 Coding
Iosevka Inziu
M+ 1MN / 1M / 2M (already in Codeface repository main gallery)
Source Code Han JP (Adobe, yours)
Ricty

Links currently unavailable (submitted by @mkasu) - believe that these all include Japanese glyph support:

MigMix possible link?
Migu possible link?

@mkasu would it be possible to confirm that the above links for MigMix and Migu are correct for the font files?

chrissimpkins commented 8 years ago

We will need to review the licenses for these fonts in order to redistribute the font releases here. I will begin this process but may need help with the interpretation of some licenses that are written in the CJK languages. If you recommended a typeface and are able to interpret any corresponding license that is not in English (and provide a link to the license), it would be extremely helpful. Thanks much!

kenlunde commented 8 years ago

Nanum Gothic Coding = Korean (SIL Open Font License, Version 1.1) [no ideographs]
D2 Coding = Korean (SIL Open Font License, Version 1.1)
Iosevka Inziu = CJK via separate fonts: J = Japanese, SC = Simplified Chinese, TC = Traditional Chinese [not sure for what language the "CL" fonts are intended; no Korean support; no license information except for a "Copyright (c) 2014-2015 M+ Fonts Project, (c) 2014-2015 Belleve Invis" copyright string]
M+ 1MN / 1M / 2M = Japanese (M+ font license)
Source Code Han JP = Japanese (SIL Open Font License, Version 1.1)
Ricty = Japanese (SIL Open Font License, Version 1.1 & IPA Font License Agreement v1.0)
MigMix = Japanese (IPA Font License Agreement v1.0)
Migu = Japanese (IPA Font License Agreement v1.0)

You may also consider adding the "HW" (half-width) Source Han Sans fonts, whose glyphs for ASCII are half-width, and could serve as coding fonts, though the monospaced 667-unit glyphs of Source Han Code JP may be more pleasant to some. There are Japanese, Korean, Simplified Chinese, and Traditional Chinese versions of these fonts. Only the Regular and Bold weights have HW fonts, and if one uses the TTCs (Font Collections), they include both HW and non-HW fonts in the case of the Regular and Bold weights.

chrissimpkins commented 8 years ago

@kenlunde Fantastic! Thank you very much Ken. I am assuming from your comments above that Source Han Sans is proportional width, not monospaced? I know that there are people out there using proportional fonts for source code. Are there any special considerations for these with CJK type in source? I am more than happy to include Source Han Sans if you feel that it is appropriate for the display of source.

chrissimpkins commented 8 years ago

All:

I will begin to create gallery images and pull the new fonts into the repository. Will post here when they are available for review.

kenlunde commented 8 years ago

@chrissimpkins Yes, the non-HW fonts in the Source Han Sans project are proportional by default, in terms of the glyphs for Latin. As long as the user expectation is for the glyphs for ASCII (U+0020 through U+007E) to be monospaced, the HW fonts in the Source Han Sans project can be used for this purpose.
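The monospacing expectation described above can be expressed as a simple check. This is a minimal sketch, not part of the project's tooling: `advances` is a hypothetical mapping of code point to advance width, and the metrics in the example are made up for illustration.

```python
def ascii_is_monospaced(advances):
    """Return True if every printable ASCII code point has the same advance.

    `advances` is a hypothetical mapping of code point -> advance width in
    font units; extracting it from a font file is outside this sketch.
    """
    widths = set()
    for cp in range(0x20, 0x7F):  # U+0020 through U+007E inclusive
        if cp not in advances:
            return False  # a missing glyph cannot be verified
        widths.add(advances[cp])
    return len(widths) == 1


# Made-up metrics for illustration only
half_width = {cp: 500 for cp in range(0x20, 0x7F)}
proportional = dict(half_width)
proportional[ord("i")] = 250
proportional[ord("W")] = 900

print(ascii_is_monospaced(half_width))    # True
print(ascii_is_monospaced(proportional))  # False
```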

chrissimpkins commented 8 years ago

@kenlunde Thanks much for the additional information

chrissimpkins commented 8 years ago

Updated contributors list to acknowledge all of your contributions to this (still in progress) effort. Thanks again to all. Look forward to this new addition to the project.

kenlunde commented 8 years ago

@chrissimpkins Although not necessarily related to this issue, you might find this particular CJK Type Blog article interesting. It was written before I added the HW fonts to the Source Han Sans project, and before @hatchzo developed the Source Han Code JP fonts. The article details how we adapted Source Sans Pro and Source Code Pro for CJK use.

chrissimpkins commented 8 years ago

@kenlunde I added it to Pocket and will have a read. I just picked up a copy of your CJKV Info Processing book as well. Time to educate myself a bit :)

Thanks again Ken.

be5invis commented 8 years ago

@chrismwendt The address for Inziu Iosevka has changed to http://be5invis.github.io/Iosevka/inziu.html . The fosshub page is no longer being updated.

chrissimpkins commented 8 years ago

@be5invis Thanks Belleve. To clarify, do these variants fall under the SIL OFL 1.1 (as per https://github.com/be5invis/Iosevka/blob/master/LICENSE)?

be5invis commented 8 years ago

@chrissimpkins Inziu Iosevka is a composite of Iosevka, M+ and SHS so yes it is.

@kenlunde CL stands for "Classical"; it follows the character forms in the Kang-Xi Dictionary. The copyright metadata inside it has not been updated for a long time. I'll update it in the next release.

chrissimpkins commented 8 years ago

@be5invis great thanks much

be5invis commented 8 years ago

@chrissimpkins I do not know whether your script supports TTC collections. Inziu provides five variants within one TTC, and the last one is not monospaced (the LGC part is Roboto). Maybe you can split it using a tool like otc2otf.

chrissimpkins commented 8 years ago

@be5invis Thank you very much. I will check.

be5invis commented 8 years ago

@chrissimpkins I've tested that otc2otf can successfully split my ttcs. You can simply drop the subfonts (ttfs) not named with "Iosevka".
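The filtering step after splitting could look like the sketch below. The file names are hypothetical placeholders, since the actual names that otc2otf produces for these collections are not stated in this thread.

```python
def keep_iosevka_subfonts(filenames):
    """Keep only the split subfonts whose name contains "Iosevka"."""
    return [name for name in filenames if "Iosevka" in name]


# Hypothetical output names from splitting one TTC (illustration only)
split_output = [
    "InziuIosevkaJ-Regular.ttf",
    "InziuIosevkaSC-Regular.ttf",
    "InziuRobotoJ-Regular.ttf",
]

print(keep_iosevka_subfonts(split_output))
```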

chrissimpkins commented 8 years ago

@be5invis perfect thanks.

For me: https://github.com/adobe-type-tools/afdko/blob/master/FDK/Tools/osx/otc2otf

be5invis commented 8 years ago

And @chrissimpkins I have to clarify the differences between the three lines:

(This image uses the Kang-Xi Dictionary's shapes as a reference to show their differences.)

Red: Subtle shape difference. "Standard" shape of Han characters in Mainland China, Taiwan and Japan are different. Note that some differences are extremely subtle.
Green: Characters simplified in Mainland China.
Blue: Characters simplified in Japanese Shinjitai.
Purple: Difference in Chinese-Japanese four-character idioms. (結 and 合 are different characters, not two variants of one character)

chrissimpkins commented 8 years ago

@be5invis Yes, now I see. Some of the differences are incredibly subtle (e.g. the red glyphs at positions -3 in lines 1 & 2 that differ by a small side stroke position on the right side of the shape). Do the glyphs in each column have the same meaning across languages? By simplification do you mean that there has been a change in the shape of a glyph that retains the same meaning as a different version of the glyph?

be5invis commented 8 years ago

@chrissimpkins There are three levels:

1. Character difference, which only occurs in "起承転結" vs "起承轉合". This is a difference between the Chinese and Japanese languages, and the meanings of "結" and "合" are not the same.
2. Simplification, marked in blue (Japanese Shinjitai) and green (Simplified Chinese). Simplification does not change the meaning of a character, but the simplified form does not share the same Unicode code point as the original one.
3. Shape difference. This is pretty subtle, and all the variants share the same Unicode codepoint (unless you use IVS). Therefore, to keep the text correct, you should use the font designed for the language of the text being typeset.

Unicode can tell the differences in (1) and (2), but not (3). You can see the image below.
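To see levels (1) and (2) at the code point level, a quick check with Python's built-in `ord` is enough. This is only an illustrative sketch using characters from the sample strings above; the dictionary labels are informal, not official language tags.

```python
# Level 1: the idioms use different characters in the last position
last_ja, last_zh = "結", "合"
print(ord(last_ja) != ord(last_zh))  # True: two distinct characters

# Level 2: each simplification of the third character has its own code point
third_char = {
    "Japanese Shinjitai": "転",
    "Traditional Chinese": "轉",
    "Simplified Chinese": "转",
}
for label, ch in third_char.items():
    print(f"{label}: {ch} U+{ord(ch):04X}")

distinct = len({ord(ch) for ch in third_char.values()})
print(distinct)  # 3: Unicode separates all three forms

# Level 3: a regional shape difference is invisible here, because the
# text holds the same code point whichever font draws it.
```

Level (3) cannot be demonstrated this way: the string data is identical, and only the font (or an IVS sequence) decides which shape appears.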

be5invis commented 8 years ago

@chrissimpkins An image to explain them :)

chrissimpkins commented 8 years ago

@be5invis Thanks Belleve. That is very helpful.

This is pretty subtle and all the variants share the same Unicode codepoint (unless you use IVS). You must use the font designed for your language to typeset them correctly

Another reason to display only lines for which each typeface provides support?

be5invis commented 8 years ago

@chrissimpkins Well, excluding lines is not a good idea, since programming is now international and users may read code written in other languages. And, considering that Han characters are ideographs, displaying them in the user's local shape is acceptable.

Therefore my opinion is to keep all three sample lines and show users what the font's "designed language" is, so they can judge whether it is suitable to use.

chrissimpkins commented 8 years ago

@be5invis so the ability to see this visual distinction in the local shape would play a role in your decision about the use of a typeface? That is interesting and seems to be an important part of the visual comparison if I understand this correctly. Do we have enough breadth across Unicode codepoint identical variants and Unicode codepoint different simplified glyphs to draw that visual distinction here?

chrissimpkins commented 8 years ago

@be5invis Also, I am assuming that a native writer would be able to draw these distinctions intuitively based upon this discussion. Is there a need for icons next to the typeface name to define the language support in the typeface?

be5invis commented 8 years ago

@chrissimpkins Variants in Han characters are extremely common, so Unicode unifies these variants into one codepoint unless they are encoded separately in some existing encoding. However all simplifications produced by China and Japan can be distinguished by Unicode. Adding an icon can be helpful.

kenlunde commented 8 years ago

My suggestion to tailor the examples on a per-language basis is based on my experience in dealing with typical users, most of whom may not be sensitive to such subtle glyph differences. If you do decide to use all lines that make up the original CJK example, you need to be absolutely sure that font fallback is not kicking in, and you also need to provide some form of statement that makes it clear that some characters may not display according to language- or region-specific conventions.

Very few fonts are genuine Pan-CJK, and the number that are monospaced is even smaller. Even for Pan-CJK fonts, there will always be a default language (driven by the fact that there is a single 'cmap' table). This suggests that each font should be assigned a language, and the example should be tailored accordingly.

In the end, how you handle this is up to you. This is merely a suggestion from someone who lives and breathes this stuff on a daily basis.

be5invis commented 8 years ago

@kenlunde In my original design, the example is used to show the "reality" of the font being tested, so it should be able to indicate how many characters the font covers and what the font's target language is. Therefore I chose to keep all three lines while turning off font fallback to expose it. However, since some users may not be able to see the subtle differences, I think @chrissimpkins's "add icon" idea is a good one.

be5invis commented 8 years ago

For testing:

Green: Tests whether the font covers Simplified Chinese-specific characters.
Blue: Tests whether the font covers Japanese-specific characters.
Red: Indicates the font's target language and variant choice.
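The coverage tests can also be done without relying on the rendered image. The sketch below assumes you already have the set of code points that the font maps to glyphs (for example, the keys of its 'cmap' table; fontTools' `TTFont.getBestCmap()` is one way to obtain them, though that step is not shown here). The coverage set in the example is hypothetical.

```python
def missing_characters(sample, supported):
    """Return the characters of `sample` that have no glyph mapping.

    `supported` is a set of code points that the font maps to glyphs.
    Whitespace is ignored.
    """
    return [ch for ch in sample
            if not ch.isspace() and ord(ch) not in supported]


# Hypothetical coverage of a Japanese-only font (illustration only)
japanese_only = {ord(ch) for ch in "起承転結轉合"}

print(missing_characters("起承転結", japanese_only))  # []
print(missing_characters("起承转合", japanese_only))  # ['转']
```

Because this works on the character map alone, font fallback cannot hide a gap the way it can in a rendered specimen.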

So specific to @chrissimpkins 's sample image of SHC:

There are three missing characters, and all of them are Simplified Chinese-specific. This means that the font does not support Simplified Chinese.
All characters are in the Japanese variant, which indicates that this font is designed for Japanese.
There is no blank in the zh-hant line, which means that it can be used to typeset Traditional Chinese, but the shapes may be incorrect.
The Kana lines and Hangul lines are complete, which means that this font supports Kana and Hangul.
The first line (12345678-...-12345678) is used to indicate the width of Latin characters. I see that it is "2/3-width", because the right side of the 4th Han character (月) is aligned with the "6" in the first line.
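The "2/3-width" reading follows from simple arithmetic: if 4 full-width Han glyphs span the same distance as 6 Latin glyphs, each Latin glyph is 4/6 of a Han glyph. A small sketch, assuming each Han glyph is exactly 1 em wide and the font uses 1000 units per em (both are assumptions for illustration):

```python
from fractions import Fraction


def latin_width_in_em(han_count, latin_count):
    """Width of one Latin glyph in em, given that `han_count` full-width
    Han glyphs span the same distance as `latin_count` Latin glyphs."""
    return Fraction(han_count, latin_count)


ratio = latin_width_in_em(4, 6)
print(ratio)                # 2/3
print(round(ratio * 1000))  # 667 units, assuming 1000 units per em
```

The result matches the 667-unit ASCII glyphs mentioned earlier in the thread for Source Han Code JP.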

chrissimpkins commented 8 years ago

@kenlunde @be5invis

Maybe we should move away from an automated approach in the CJK gallery and create the images manually given the complexities in the information that we are trying to convey here. We could consider a compromise in which we use:

black on light + colored glyph scheme (as @be5invis used above) for the lines where support is included based upon @kenlunde's review of the fonts
light grey glyphs across the entire line where the face lacks complete language support

The colors would emphasize the supported sets and de-emphasize sets with a lack of support, yet allow users to peruse the glyphs from the sets without complete support. We can either add icons next to the font names or create header categories to facilitate simple categorization on initial view of the gallery. This will let them hone in on the sets of interest. Unless you are aware of other typefaces out there, we are only dealing with a handful of fonts here so this should not be a problem.

To do this, we would need to identify an editor that supports font fallback controls and manual glyph coloring. Thoughts about the approach and an application that would handle this?

We will still need to address this point by @kenlunde :

provide some form of statement that makes it clear that some characters may not display according to language- or region-specific conventions

be5invis commented 8 years ago

@chrissimpkins I think this can be automated if you add "language" metadata when producing the sample image. However, for Traditional Chinese, some fonts follow "Orthodox" Kang-Xi shapes instead of the Taiwan Standard Form of National Characters (國字標准字體), and that would be a problem. So the languages are:

Simplified Chinese : standardized by Table of General Standardized Chinese Characters (通用规范汉字表)
Japanese : standardized by Jōyō kanji (改定常用漢字表)
Traditional Chinese - Taiwan : standardized by Standard Form of National Characters (國字標准字體)
Traditional Chinese - Hong Kong : standardized by List of Graphemes of Commonly-used Chinese Characters (常用字字形表), similar to Taiwan standard, rare.
Traditional Chinese - Orthodox : The common form of Chinese characters before WWII, including Japanese Kyuujitai, Kang-Xi forms, etc.
Korean : similar to Orthodox, standardized by Basic hanja for educational use (한문교육용기초한자, 漢文敎育用基礎漢字)

When a language is chosen you can highlight the "improper" characters using this image:

So if the target language of the font being tested is provided, you can simply highlight these Han characters using the image above (for example, SHC should use group 4 to highlight characters in zh-hant and zh-hans lines). If the language is not provided, users can still infer the font's target language from the rendered image.

kenlunde commented 8 years ago

@be5invis Highlighting "improper" ideograph forms is problematic, and is likely to confuse more than it will help. It is far easier to label each font by its language, then supply a sample string that is specifically tailored for that language. Even for Pan-CJK fonts, there is always a singular primary language for each font, and the only way to access glyphs for other languages or regions is via the 'locl' GSUB feature, which requires that the text be properly language-tagged and that the application supports that feature.

I stated that such highlighting is problematic, because sometimes the difference is due to the typeface design or typeface style (serif versus sans serif versus script). In the case of 李 and 合, it is arguably a typeface design difference in the sample strings. Source Han Sans exhibits a difference for these characters, but other typeface designs may not.

be5invis commented 8 years ago

@kenlunde Hmmm, so is this solution acceptable?

Highlight the line of the language the font is designed for;
And keep other lines to expose the font's coverage and variant selection?

For SHC, the ja-JP line is highlighted, while the hans and hant lines are faded.

kenlunde commented 8 years ago

@be5invis As long as the entire supported line is highlighted, and entire unsupported lines are faded, that is a better solution. Actually, if the unsupported lines are faded, the supported line can simply be rendered in black.

be5invis commented 8 years ago

@kenlunde Yeah, the "target" lines are black while unsupported lines are faded. @chrissimpkins I think this solution is acceptable.
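The agreed scheme reduces to a per-line color choice driven by the font's language metadata. A minimal sketch, assuming each specimen line carries a language tag; the color values and tag names here are illustrative, not the ones used by the gallery's rendering script:

```python
SUPPORTED_COLOR = "#000000"  # black for lines in the font's target language
FADED_COLOR = "#BBBBBB"      # hypothetical light grey for unsupported lines


def line_colors(line_languages, font_languages):
    """Pick a color for each specimen line based on the font's languages."""
    return [SUPPORTED_COLOR if lang in font_languages else FADED_COLOR
            for lang in line_languages]


# Language tag of each specimen line, top to bottom (illustrative)
specimen_lines = ["ja-JP", "zh-hans", "zh-hant"]

# A Japanese font such as SHC: ja-JP in black, the hans and hant lines faded
print(line_colors(specimen_lines, {"ja-JP"}))
```

Whole lines are colored, never individual glyphs, which keeps the images easy to regenerate when a font is added.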

chrissimpkins commented 8 years ago

@be5invis @kenlunde

That sounds good. This will be much easier to maintain than extensively highlighted options.

Have we removed the second of the Japanese Kana lines in https://github.com/chrissimpkins/codeface/issues/114#issuecomment-168395997 ? If not, can you let me know if both lines should be considered supported by all of the fonts labeled as having Japanese language support in the first post of this thread? This information is from @kenlunde's review of all fonts that have been recommended to date.

Thanks much.

chrissimpkins commented 8 years ago

Can anyone suggest an application that supports settings to disable font fallbacks?

be5invis commented 8 years ago

@chrissimpkins Font-fallback behavior is strongly platform-specific. I think improving your existing rendering script to support "fading" is better than taking a snapshot from some editor. As for the Kana lines, both of them should be preserved (and not faded), because both Hiragana and Katakana are used in Japanese, and are included in the national standard encodings in Japan, Mainland China, Taiwan, Hong Kong and Korea.

chrissimpkins commented 8 years ago

@be5invis Thanks. Will look around to see what I can find. The documentation of the Python Pango/Cairo libraries is pretty poor and my manipulation of the attributes to remove font fallbacks broke the syntax highlighter (Pygments). It is not a situation where it is saving me time if there is a simple application where I can do this and take screenshots.

@kenlunde Ken, does InDesign on OS X support font fallback settings (or ideally Photoshop, which is not good for layouts but which I already have)?

kenlunde commented 8 years ago

@be5invis Although Chinese and Korean fonts almost always include glyphs for kana, because the national standards on which they're based include them, I would strongly advise against using Chinese or Korean fonts for displaying kana. I explained my reasons in a different post to this issue. I suggest removing the "Kana" label, and simply tagging those lines as ja-JP. In other words, the Japanese sample should consist of three lines.

@chrissimpkins Because InDesign is a high-end authoring app, which is intended to give maximum control to the user, if the selected font does not have a glyph for a character, a pink-colored .notdef glyph is displayed.

chrissimpkins commented 8 years ago

removing the "Kana" label, and simply tag those lines as ja-JP

got it.

a pink-colored .notdef glyph is displayed

excellent, thank you very much. Will explore
