chrissimpkins / codeface

Typefaces for source code beautification

Add CJK developer font gallery #114

Closed: chrissimpkins closed this issue 8 years ago

chrissimpkins commented 8 years ago

Goal

Create a separate gallery of free fonts that support CJK character sets and are licensed for redistribution in this repository.

Need

Feedback from developers who use CJK fonts on

Yes, definitely

Please Contribute

Any and all feedback is warmly welcomed and highly encouraged. Please use this thread to discuss further.

This idea developed out of discussions in #55. This will be used as the working thread for this topic.

Current Development Gallery Page

This gallery has been released and is now available at https://github.com/chrissimpkins/codeface/blob/master/CJK.md

The development gallery page is located in the cjk branch here https://github.com/chrissimpkins/codeface/blob/cjk/CJK.md

Current Image Suggestions from Thread Discussion

The following is a current list of suggested image types for each font based upon the discussion to date in the thread:

be5invis commented 8 years ago

Some sample text may be useful.

花鳥風月 春夏秋冬 生老病死 喜怒哀樂 櫻梅桃李 起承轉合  (zh-hant)
花鸟风月 春夏秋冬 生老病死 喜怒哀乐 樱梅桃李 起承转合  (zh-hans)
花鳥風月 春夏秋冬 生老病死 喜怒哀楽 桜梅桃李 起承転結  (ja-JP)
いろはにほへど ちりぬるを わがよたれぞ つねならむ   (ja-JP, Kana)
ウヰノオクヤマ ケフコエテ アサキユメミジ ヱヒモセズ (ja-JP, Kana)
챠트 피면 술컵도 유효작                                (ko-KR)

A sample render may look like this: [image: Inziu Iosevka SC, 24 ppem, hinted] [image: Inziu Iosevka J, 24 ppem, hinted]

chrissimpkins commented 8 years ago

Thoughts about how best to display the CJK text? Simple black on white for those strings that you provided along with the syntax highlighted code blocks from the main gallery? Automated rendering approach?

mkasu commented 8 years ago

I'd like an automated rendering approach.

I looked at some other projects regarding this topic.

I really like this picture from Adobe, as it shows a good distribution of Latin characters, special characters, and Japanese comments: https://github.com/adobe-fonts/source-han-code-jp/blob/master/resources/img-View.png

Another interesting approach is this one. The idea is to visualise the width ratio between Japanese characters and monospaced Latin characters. Even in Japanese "programming fonts", this ratio can make it difficult to align things like ASCII tables or other layouts properly (if you use these fonts in terminals, layout-heavy applications can get misaligned pretty quickly). https://github.com/adobe-fonts/source-han-code-jp/blob/master/resources/img-AA.png

I couldn't find anything about licensing of these example texts though.

chrissimpkins commented 8 years ago

@mkasu:

I'd like an automated rendering approach.

Any suggestions?

as it shows a good distribution of latin characters, special characters as well as Japanese comments

Looks good.

visualise the ratio between Japanese characters and monospaced latin characters

As does this.

I couldn't find anything about licensing of these example texts though.

I asked in a new issue on their repository (https://github.com/adobe-fonts/source-han-code-jp/issues/9). Let's wait for their response.

chrissimpkins commented 8 years ago

@be5invis Are the strings that you suggested above simple sentences / word combinations, or do they provide some additional information for individuals who are comparing the typefaces? I am interested in the visual information that CJK font users might need in these text specimens in order to compare typefaces in the gallery. Do you have any suggestions beyond those noted by @mkasu above? Any thoughts about common glyph patterns that might bring out issues, such as vertical and horizontal metrics, that could be important when using CJK fonts for source code?

chrissimpkins commented 8 years ago

A Pango/Cairo approach to automated text specimen creation was submitted in this (now closed) PR https://github.com/chrissimpkins/codeface/pull/49

It didn't render clear images on OS X, but perhaps we could give it another shot on Linux to see if the FreeType renderer leads to better images.

be5invis commented 8 years ago

@chrissimpkins The three Han lines are four-character idioms used across the East Asian cultural sphere, and they test both the font's character coverage and its character shapes. Since mainland China, Taiwan, and Japan use different characters and character shapes, this test indicates whether the font is suitable for users in each of these regions. The Kana and Hangul lines are just pangrams.

chrissimpkins commented 8 years ago

@be5invis This is very helpful and it sounds like this is necessary information that will need to be conveyed in at least one of the images.

chrissimpkins commented 8 years ago

As of now, we have the following image suggestions:

be5invis commented 8 years ago

And @chrissimpkins, maybe you can add an alignment test, since many fonts are half-width (like Iosevka) or 2/3-width (like Source Han Code JP). The sample could be this:

12345678-12345678-12345678-12345678-12345678-12345678
花鳥風月 春夏秋冬 生老病死 喜怒哀樂 櫻梅桃李 起承轉合  (zh-hant)
花鸟风月 春夏秋冬 生老病死 喜怒哀乐 樱梅桃李 起承转合  (zh-hans)
花鳥風月 春夏秋冬 生老病死 喜怒哀楽 桜梅桃李 起承転結  (ja-JP)
いろはにほへど ちりぬるを わがよたれぞ つねならむ   (ja-JP, Kana)
ウヰノオクヤマ ケフコエテ アサキユメミジ ヱヒモセズ (ja-JP, Kana)
챠트 피면 술컵도 유효작                                (ko-KR)

A half-width font may look like this: [image]

chrissimpkins commented 8 years ago

@be5invis Can you provide more information about this metric? CJK fonts are new to me and I have not come across this terminology.

be5invis commented 8 years ago

Half-width means that every Latin character is exactly 1/2 em wide. Source Han Code JP is an unusual one: its Latin characters are 2/3 em wide.
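To make the half-width vs. 2/3-width distinction concrete, here is a minimal Python sketch (not part of the Codeface tooling) that measures a specimen line in Latin character cells. It assumes East Asian Wide/Fullwidth characters are exactly 1 em and every other character uses the font's Latin advance; ambiguous-width characters such as box drawing are counted as narrow unless `ambiguous_wide=True`. The helper names `line_width_em` and `cells` are hypothetical.

```python
import unicodedata
from fractions import Fraction


def line_width_em(text, latin_advance, ambiguous_wide=False):
    """Estimate the advance width of `text` in em.

    Assumption: East Asian Wide/Fullwidth characters (W, F) are 1 em;
    everything else uses the font's Latin advance. Ambiguous-width
    characters (A, e.g. box drawing) count as wide only if requested.
    """
    total = Fraction(0)
    for ch in text:
        eaw = unicodedata.east_asian_width(ch)
        if eaw in ("W", "F") or (eaw == "A" and ambiguous_wide):
            total += 1
        else:
            total += latin_advance
    return total


def cells(text, latin_advance, ambiguous_wide=False):
    """Width of `text` measured in Latin character cells."""
    return line_width_em(text, latin_advance, ambiguous_wide) / latin_advance


HALF = Fraction(1, 2)        # e.g. Iosevka-based fonts
TWO_THIRDS = Fraction(2, 3)  # e.g. Source Han Code JP

han = "花鳥風月 春夏秋冬"
print("half-width:", cells(han, HALF), "cells; one CJK char =", cells("花", HALF))
print("2/3-width:", cells(han, TWO_THIRDS), "cells; one CJK char =", cells("花", TWO_THIRDS))
# half-width: 17 cells; one CJK char = 2
# 2/3-width: 13 cells; one CJK char = 3/2
```

In a half-width font each CJK character spans exactly two ruler cells, so the `12345678-` ruler stays aligned; at 2/3 width each spans 1.5 cells, so a run of an odd number of CJK characters ends mid-cell.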

chrissimpkins commented 8 years ago

Half-width means that every Latin character is exactly 1/2 em wide. Source Han Code JP is an unusual one: its Latin characters are 2/3 em wide.

I see. Thank you very much.

chrissimpkins commented 8 years ago

So we are currently at the following list of suggested images for these fonts:

mkasu commented 8 years ago

@chrissimpkins The 2nd and 4th points are more or less the same test, or can at least be merged, since alignment is directly related to vertical/horizontal metrics.

Edit: It really comes down to whether the font is half-width, 2/3-width, or something even less common. The vertical/horizontal metrics follow directly from that, because special characters like |, <, and > usually have the same width ratio to CJK characters as Latin characters do. Therefore, any difference in behavior in the 2nd test is really a result of the width, just as in the 4th test.

chrissimpkins commented 8 years ago

It sounds as though we are reaching consensus about this. I will update the proposed image types over the weekend based upon this conversation (and any other comments that are posted between now and then) and then we can dig in on the actual text specimens that we want to use. The group over at Adobe added text samples to their repository for the images that we discussed above and applied the MIT license to them (discussion in https://github.com/adobe-fonts/source-han-code-jp/issues/9). Big thanks to @kenlunde and @hatchzo for all of their help.

Let’s begin to discuss how best to render and present these images. Does anyone have experience with Pango and Cairo? As noted above, I tried this approach for the main gallery images and it didn’t work out on OS X: the images were not clear enough to display with the approach that I attempted. Perhaps we should transition to Linux and try to take advantage of the FreeType renderer? We will not achieve the exact appearance that every user sees on their screen; the goal here is to provide an overview for further exploration on one’s own system.

kenlunde commented 8 years ago

Our pleasure.

chrissimpkins commented 8 years ago

Still investigating automated approaches to image generation. Let me know if you have any thoughts / experience with this. Will update here when I have more information. Still intend to move forward with this even if it requires manual screenshots as with the main gallery.

chrissimpkins commented 8 years ago

Also, I intend to include all of the fonts recommended in the #55 thread, and I will create a new issue report for each of them. Are there others that you recommend? Please add new issue reports for any others that you feel are appropriate for the gallery page.

chrissimpkins commented 8 years ago

Using a Pango/Cairo/Pygments approach, here is where we are with the dark on light text specimen images:

[images: cjk-specimen-light, cjk-aa]

The Pygments syntax highlighter is not liking the text that we are throwing at it here. It is rendering the code blocks (and the ASCII text above them) entirely in italics for some reason.

[image: test]

And here is where we would be with a manual screenshot approach that uses the same syntax highlighter that I am currently using on the main gallery page:

[image: manual-syntax]

Thoughts?

chrissimpkins commented 8 years ago

@Snack-X you commented in the original thread where we discussed this. Have any feedback about the proposed images above?

hatchzo commented 8 years ago

Hi,

I use these lines, but they might be unnecessary for your purpose.

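// ┌────────────────────────┐
// │ https://github.com/adobe-fonts │
// │ アドビのオープンソースフォントサイト │
// └────────────────────────┘

/*** 既存のファイルを開くためのスクリプト

作成者:折原 大
作成日:2014/05/17

***/

[Translation: アドビのオープンソースフォントサイト = "Adobe's open source font site"; 既存のファイルを開くためのスクリプト = "Script for opening an existing file"; 作成者 = "Author"; 作成日 = "Date created".]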

Also, these lines are not aligned correctly in the first image:

// // a-----:---------d : a-b-c-d viewBounds
// // | e-:-----h | : e-f-g-h homeBounds
// // | | : | | : i-j-k-l pageBounds
// // | | : 文字 |  |  :
// // | | : 画像 |  |  :
// // | | : | | :
// // | f-:-----g | :
// // b-----:---------c :

[Translation: 文字 = "text"; 画像 = "image".]

Please check that.

Best Regards,

— Masataka Hattori

On 2015/12/24 15:05, Chris Simpkins <notifications@github.com> wrote:

@Snack-X you commented in the original thread where we discussed this. Have any feedback about the proposed images above?

be5invis commented 8 years ago

@chrissimpkins I found that in your sample image, characters not included in the font (like 乐 and 转) fall back to a system font. However, to show coverage clearly, these missing characters should be shown as blanks or "口"s. [image]

kenlunde commented 8 years ago

@be5invis: Source Han Code JP is a Japanese-only typeface family that is a derivative of the Source Han Sans typeface family, which is Pan-CJK. It would be inappropriate to use Source Han Code JP to render characters in a non-Japanese context, such as the Simplified Chinese examples that are resorting to font fallback.

be5invis commented 8 years ago

@kenlunde Yeah, so my advice is to leave all unsupported characters (like 乐 here) blank, to show each font's character coverage clearly, since many Japanese fonts do not cover Simplified and Traditional Chinese characters.
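A minimal Python sketch of this suggestion as a preprocessing step, assuming the set of code points the font maps is already known (for example, from its 'cmap' table): unmapped wide characters become a placeholder of the same width and narrow ones become a space, so the line keeps its alignment. `mask_unsupported` is a hypothetical helper, the coverage set below only mimics the 乐/转 gaps noted above, and the placeholder itself must be present in the font or it will fall back too.

```python
import unicodedata


def mask_unsupported(text, supported, wide_placeholder="口", narrow_placeholder=" "):
    """Replace characters whose code points are not in `supported`.

    Wide/Fullwidth characters get `wide_placeholder` so the line keeps
    its width; other characters get `narrow_placeholder`. Whitespace is
    passed through unchanged.
    """
    out = []
    for ch in text:
        if ch.isspace() or ord(ch) in supported:
            out.append(ch)
        elif unicodedata.east_asian_width(ch) in ("W", "F"):
            out.append(wide_placeholder)
        else:
            out.append(narrow_placeholder)
    return "".join(out)


# Hypothetical coverage: a Japanese-only font that lacks 乐 and 转.
line = "喜怒哀乐 起承转合"
supported = {ord(c) for c in line} - {ord("乐"), ord("转")}
print(mask_unsupported(line, supported))
# 喜怒哀口 起承口合
```

Passing `wide_placeholder="\u3000"` (IDEOGRAPHIC SPACE) gives the "blank" variant instead of "口".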

kenlunde commented 8 years ago

That would suggest nuking the zh-hant, zh-hans, and ko-KR lines from the sample text, because they should not be rendered with a Japanese font. All other lines with CJK text are Japanese, and Source Han Code JP is an appropriate font.

The reasoning behind the suggestion is that there are two things going on with Chinese. First, Source Han Code JP lacks glyphs for some characters, because they are specific to Simplified Chinese. You noticed those because font fallback makes them obvious. Second, and more subtle, some characters are not rendered correctly for Chinese in terms of one or more of their components. They are displayed using the Japanese forms, and would look odd to Chinese readers.

be5invis commented 8 years ago

@kenlunde Maybe add comments after the language tag indicating that the font does not support this language? An example for Source Han Code JP:

12345678-12345678-12345678-12345678-12345678-12345678
花鳥風月 春夏秋冬 生老病死 喜怒哀樂 櫻梅桃李 起承轉合  (zh-hant)     // Incomplete support
花鸟风月 春夏秋冬 生老病死 喜怒哀乐 樱梅桃李 起承转合  (zh-hans)     // Incomplete support
花鳥風月 春夏秋冬 生老病死 喜怒哀楽 桜梅桃李 起承転結  (ja-JP)
いろはにほへど ちりぬるを わがよたれぞ つねならむ   (ja-JP, Kana)
ウヰノオクヤマ ケフコエテ アサキユメミジ ヱヒモセズ (ja-JP, Kana)
챠트 피면 술컵도 유효작                                (ko-KR)       // Unsupported

And an example for Inziu Iosevka SC:

12345678-12345678-12345678-12345678-12345678-12345678
花鳥風月 春夏秋冬 生老病死 喜怒哀樂 櫻梅桃李 起承轉合  (zh-hant)
花鸟风月 春夏秋冬 生老病死 喜怒哀乐 樱梅桃李 起承转合  (zh-hans)
花鳥風月 春夏秋冬 生老病死 喜怒哀楽 桜梅桃李 起承転結  (ja-JP)
いろはにほへど ちりぬるを わがよたれぞ つねならむ   (ja-JP, Kana)
ウヰノオクヤマ ケフコエテ アサキユメミジ ヱヒモセズ (ja-JP, Kana)
챠트 피면 술컵도 유효작                                (ko-KR)       // Unsupported

kenlunde commented 8 years ago

That could work, but it's really a two-pronged issue, as my edited comment above points out. For ko-KR, it is a matter of incompleteness; for zh-hant, it is a matter of appropriateness (some glyphs are present in the font, but are not appropriate for Traditional Chinese); and for zh-hans, it is both (some characters have no glyph, so font fallback kicks in, and the glyphs for some other characters are not appropriate for Simplified Chinese).
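Only the incompleteness prong can be checked mechanically from a font's mapped code points; the appropriateness prong cannot, because a glyph that is present but drawn in the Japanese form still counts as covered. With that limitation, a minimal Python sketch that produces comments like the ones proposed above might look like this (`coverage_label` and the coverage set are hypothetical):

```python
def coverage_label(text, supported):
    """Label a specimen line by cmap coverage only.

    Returns "" when every non-space character is mapped, "// Unsupported"
    when none are, and "// Incomplete support" otherwise.
    Glyph-shape appropriateness is NOT checked.
    """
    chars = [ch for ch in text if not ch.isspace()]
    missing = [ch for ch in chars if ord(ch) not in supported]
    if not missing:
        return ""
    if len(missing) == len(chars):
        return "// Unsupported"
    return "// Incomplete support"


# Hypothetical coverage for a Japanese-only font.
supported = set(map(ord, "喜怒哀楽起承転結合"))

specimen = [
    ("喜怒哀乐 起承转合", "zh-hans"),
    ("喜怒哀楽 起承転結", "ja-JP"),
    ("챠트 피면 술컵도 유효작", "ko-KR"),
]
for text, tag in specimen:
    # Only the specimen text is checked, not the ASCII language tag.
    print(f"{text}  ({tag})  {coverage_label(text, supported)}".rstrip())
```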

be5invis commented 8 years ago

@kenlunde My original idea for these samples was to show how characters in various languages and scripts are displayed, and to let users determine whether the font is appropriate to use. So:

kenlunde commented 8 years ago

That can work, but keep in mind that for some users, "shape appropriateness" can be somewhat subjective. In other words, some users may not be able to recognize such distinctions. But, for now, your plan should work.

be5invis commented 8 years ago

@kenlunde Thanks. Maybe @chrissimpkins can increase the text size in this sample to help users distinguish the shapes. BTW, the latest Inziu Iosevka is listed here; it is a composite of Iosevka, M+, and Source Han Sans.

hatchzo commented 8 years ago

Source Han Sans supports these language-specific variants of the "花" (flower) glyph. However, Source Han Code JP only has Japanese character mappings.

[image: 2015-12-25 12 08 56]

chrissimpkins commented 8 years ago

And, these lines is not aligned correctly on the first image.

@hatchzo Thank you!

chrissimpkins commented 8 years ago

However in order to show the coverage clearly, these absent characters should be shown in blank or "口"s.

@be5invis It seems that Pango/Cairo falls back on a system default font, then? This was not apparent to me. Thanks for pointing it out. Perhaps we could discuss the coverage of each of the fonts recommended here and create appropriate specimen sheets for each?

chrissimpkins commented 8 years ago

@be5invis @kenlunde All of these should include the .notdef glyph; however, many renderers (including those in most text editors and, apparently, the Pango/Cairo approach used here) override it with fallback fonts so that an appropriate glyph is shown. We may need to manually investigate the code point coverage of each font and create appropriate specimen sheets that demonstrate the included glyphs, with text that indicates when support is not available.

Thoughts? If you know of an automated renderer that displays the .notdef glyph for undefined characters instead of falling back, I am open to suggestions.

chrissimpkins commented 8 years ago

increase the text size in this sample to help users to indicate the shapes

@be5invis what size do you suggest?

chrissimpkins commented 8 years ago

Would it be helpful to create a separate CJK text specimen repository that we could all collaboratively work on to create some sort of standard for the display of fonts aimed at developers? I think that the discussion above suggests that this is warranted and I suspect that there may be a great deal more feedback out there about how to approach this issue.

chrissimpkins commented 8 years ago

I started a new repository in the Source Foundry organization account where I will host the ASCII-flavored source code text specimens that we use on the main Codeface gallery page. There have been suggestions (including some by @be5invis) for modifications to these images, and we will work on them there in an attempt to reach some consensus about what is helpful to view when choosing an appropriate developer typeface.

https://github.com/source-foundry/text-specimens

Shall we create a CJK subdirectory and work there on the specimens discussed in this thread? We can take advantage of the GitHub workflow and use pull requests for the suggestions above. If you are in favor of this, I will move the current files to that repository and we can begin to work on changes there.

kenlunde commented 8 years ago

I have been stating for years that font fallback is a double-edged sword. One edge, the one that is useful for cutting, is that characters that lack a glyph in the selected font have a chance of being displayed with a meaningful glyph. The other edge, the one that can come back to bite you, is not knowing which font the glyphs are being displayed from. Sometimes it is obvious when font fallback is being used, due to clear differences in typeface style or weight, but sometimes the effect can be very subtle.

As long as each sample is associated with a particular language, if a font that supports that language is used, the probability of font fallback kicking in is significantly minimized.

chrissimpkins commented 8 years ago

@kenlunde is there any tool to investigate CJK character code point coverage outside of a font editor with manual review?

We could indicate coverage with icons or text. Icons adjacent to the font name in the gallery might address this in a simple, clear, language-independent fashion. Then we will need to decide what to do with the images that span the spectrum of languages that were proposed above.

be5invis commented 8 years ago

There's an attribute in Pango can disable fallback: https://developer.gnome.org/pango/stable/pango-Text-Attributes.html#pango-attr-fallback-new This may be helpful.
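As an alternative to building an attribute list in code, Pango markup also accepts a `fallback` attribute on `<span>`. Below is a minimal Python sketch that builds such markup; the helper `pango_markup` is hypothetical, and escaping uses the standard library's `html.escape`. The resulting string would be handed to something like `layout.set_markup(markup, -1)` on a PangoCairo layout. This is untested with the render script discussed in this thread.

```python
from html import escape


def pango_markup(text, family, lang=None, fallback=False):
    """Wrap `text` in a Pango markup <span> that names the font family
    and sets the span's `fallback` attribute (off by default), so that
    unmapped characters are not drawn from another font."""
    fallback_value = "true" if fallback else "false"
    attrs = [f'font_family="{escape(family)}"', f'fallback="{fallback_value}"']
    if lang:
        # Tag the run's language; Pango can take this into account
        # when choosing fonts and shaping.
        attrs.append(f'lang="{escape(lang)}"')
    return f"<span {' '.join(attrs)}>{escape(text, quote=False)}</span>"


markup = pango_markup("喜怒哀乐 起承转合", "Source Han Code JP", lang="ja")
print(markup)
# <span font_family="Source Han Code JP" fallback="false" lang="ja">喜怒哀乐 起承转合</span>
```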

chrissimpkins commented 8 years ago

@be5invis Thank you very much! I have been digging into the Pango/Cairo docs and the Python binding docs (not well documented unfortunately) and came up with this so far:

https://github.com/chrissimpkins/codeface/blob/cjk/scripts/render.py#L128-L132

It executes the script without issues but still renders the missing glyphs with a fallback font. Will do some more investigation this week to see if I can figure out why this is the case.

kenlunde commented 8 years ago

@chrissimpkins: I use the AFDKO spot tool to dump the Format 12 (UTF-32) 'cmap' subtable. If that subtable does not exist, which is when all mappings are in the BMP, then I dump the Format 4 (BMP-only UTF-16) subtable. Checking these 'cmap' subtables represents the most reliable way to check Unicode coverage. The flags in the 'OS/2' table are extremely unreliable, and some tools set them via heuristics.
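For readers without AFDKO at hand, the Format 12 subtable Ken mentions is simple to read directly: a 16-byte header followed by `numGroups` records of (startCharCode, endCharCode, startGlyphID), all big-endian. Below is a minimal Python sketch, assuming the subtable bytes have already been located in the font; walking the table directory and the 'cmap' encoding records is omitted, and Format 4 is not handled. In practice, fontTools' `TTFont(path).getBestCmap()` returns the same kind of code-point-to-glyph mapping. Code points missing from the result would render as glyph 0, .notdef. The helper `build_format12` exists only to produce test data.

```python
import struct


def parse_cmap_format12(data):
    """Return {code_point: glyph_id} from raw Format 12 'cmap' subtable bytes."""
    fmt, _reserved, length, _language, num_groups = struct.unpack_from(">HHIII", data, 0)
    if fmt != 12:
        raise ValueError(f"expected cmap format 12, got {fmt}")
    if length > len(data) or 16 + 12 * num_groups > length:
        raise ValueError("truncated format 12 subtable")
    mapping = {}
    for i in range(num_groups):
        start, end, start_gid = struct.unpack_from(">III", data, 16 + 12 * i)
        for cp in range(start, end + 1):
            mapping[cp] = start_gid + (cp - start)
    return mapping


def build_format12(groups):
    """Build Format 12 subtable bytes from (start, end, start_gid) tuples (test data only)."""
    body = b"".join(struct.pack(">III", *g) for g in groups)
    return struct.pack(">HHIII", 12, 0, 16 + len(body), 0, len(groups)) + body


# Hypothetical font: printable ASCII plus 楽, but not 乐.
table = build_format12([(0x20, 0x7E, 1), (ord("楽"), ord("楽"), 96)])
cmap = parse_cmap_format12(table)
print([ch for ch in "楽 乐" if ord(ch) not in cmap])
# ['乐']
```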

chrissimpkins commented 8 years ago

@kenlunde That is extremely helpful, Ken. Thank you very much!

chrissimpkins commented 8 years ago

@hatchzo @kenlunde @be5invis Does this appear to be the correct rendering of the available glyph sets in Source Han Code JP using the recommended specimen? This is what I am getting with the Pango/Cairo render when I switch the fallback font setting to False:

[image: cjktest]

Four glyphs show as missing, one of which is in the Japanese set...

be5invis commented 8 years ago

@chrissimpkins So where's the source of the sample text? The [x] looks like a "null" character or a zero-width space. Maybe you can replace the tabs with four spaces to avoid this problem.
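A small pre-render check along these lines, sketched in Python (the helper `clean_specimen` is hypothetical, not part of render.py): expand tabs to four spaces, then report any control characters left over, since with fallback disabled these have no glyph and may show up as boxes. Note that `str.expandtabs` counts code points rather than display width, so tab stops that follow CJK text may not land where a terminal would put them.

```python
import unicodedata


def clean_specimen(text, tabsize=4):
    """Expand tabs, then report leftover control characters (other than
    the newlines used to split lines) as (line, column, "U+XXXX")."""
    # expandtabs() counts code points, not display width.
    expanded = text.expandtabs(tabsize)
    problems = []
    for line_no, line in enumerate(expanded.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cc":
                problems.append((line_no, col, f"U+{ord(ch):04X}"))
    return expanded, problems


sample = "def f():\n\treturn 1  # 喜怒哀楽\x00\n"
text, problems = clean_specimen(sample)
print(problems)
```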

chrissimpkins commented 8 years ago

https://github.com/chrissimpkins/codeface/blob/cjk/samplecode/cjk-specimen.txt

be5invis commented 8 years ago

@chrissimpkins Have you tried replacing the tabs with spaces?

chrissimpkins commented 8 years ago

@be5invis Yes, you appear to be correct. Here is where we are with all tabs in the specimen converted to spaces:

[image: cjktest]

Is this what should be expected?

be5invis commented 8 years ago

@chrissimpkins Exactly what it should be.