Closed chrissimpkins closed 8 years ago
@be5invis :+1: Will dig into the other specimen files this week. Not sure why they are rendering in all italics. Must be an issue with Pygments
And will remove the Adobe specific text in the vertical/horizontal metrics file as @hatchzo recommended in https://github.com/chrissimpkins/codeface/issues/114#issuecomment-167179147
As an aside, if Source Han Code JP were to have full CJK support, the image below (made using Source Han Sans) shows language-specific differences for zh-hant and zh-hans when the characters share the same code point (unified):
The last character in the zh-hant and zh-hans lines, 合, is highlighted because it is frequently used in Japanese, and its form in Source Han Sans is different in a subtle way.
(Never mind. I figured out the reason, which is due to the Japanese region-specific subset fonts including glyphs for these characters. I will need to explore as to why.)
@kenlunde This is a level of complexity across these languages that I do not understand. In your opinion, is this an appropriate way to represent the glyph coverage for these typefaces?
@chrissimpkins With very few exceptions, most CJK fonts support a single language, both in terms of character coverage and the glyphs. This would suggest that instead of using a single text example that covers CJK, you should use per-language examples, and use the appropriate one based on the primary language of the font. In the case of Source Han Code JP, only the three ja-JP lines are appropriate.
Thank you very much Ken. The font fallback option is breaking the syntax highlighter so it may work better to do this work before the automation.
Would it be possible for others who suggested typefaces to weigh in on their thoughts about this? Should we consider separate C, J, and K specimen files? If you agree, will you please let me know what support is available in the typefaces that you recommended for the repository?
@chrissimpkins: It will be somewhat obvious when a font supports Simplified Chinese, Traditional Chinese, or Korean. Such fonts will include glyphs only for characters that are appropriate for these languages. When I read through #55, there was a note that indicated that some Chinese and Korean fonts support Japanese kana (hiragana and katakana). The reason why this is the case is because the character set standards on which they are based happen to include Japanese kana, but there are three issues with the kana support in such non-Japanese fonts: 1) the glyphs for kana are typically inferior or stolen from a Japanese font; 2) some characters that are necessary for kana support lack glyphs, because the character set standards neglected to include them; and 3) Japanese text that is kana-only is extremely rare, and you need a significant number of kanji for minimum support.
If someone can provide a convenient list of identified CJK fonts, I could take a stab at assigning a language to each of them.
@kenlunde Thanks Ken.
Here is the list that I have so far based upon recommendations in issue reports:
Links available:
Links currently unavailable (submitted by @mkasu) - believe that these all include Japanese glyph support:
@mkasu would it be possible to confirm that the above links for MigMix and Migu are correct for the font files?
We will need to review the licenses for these fonts in order to redistribute the font releases here. I will begin this process but may need help with the interpretation of some licenses that are written in the CJK languages. If you recommended a typeface and are able to interpret any corresponding license that is not in English (and provide a link to the license), it would be extremely helpful. Thanks much!
Nanum Gothic Coding = Korean (SIL Open Font License, Version 1.1) [no ideographs]
D2 Coding = Korean (SIL Open Font License, Version 1.1)
Iosevka Inziu = CJK via separate fonts: J = Japanese, SC = Simplified Chinese, TC = Traditional Chinese [not sure for what language the "CL" fonts are intended; no Korean support; no license information except for a "Copyright (c) 2014-2015 M+ Fonts Project, (c) 2014-2015 Belleve Invis" copyright string]
M+ 1MN / 1M / 2M = Japanese (M+ font license)
Source Han Code JP = Japanese (SIL Open Font License, Version 1.1)
Ricty = Japanese (SIL Open Font License, Version 1.1 & IPA Font License Agreement v1.0)
MigMix = Japanese (IPA Font License Agreement v1.0)
Migu = Japanese (IPA Font License Agreement v1.0)
You may also consider adding the "HW" (half-width) Source Han Sans fonts, whose glyphs for ASCII are half-width, and could serve as coding fonts, though the monospaced 667-unit glyphs of Source Han Code JP may be more pleasant to some. There are Japanese, Korean, Simplified Chinese, and Traditional Chinese versions of these fonts. Only the Regular and Bold weights have HW fonts, and if one uses the TTCs (Font Collections), they include both HW and non-HW fonts in the case of the Regular and Bold weights.
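As a rough illustration of what a 667-unit Latin advance means next to full-width ideographs, here is a small arithmetic sketch. It assumes a 1000-unit em and ideographs that fill exactly one em; neither value is stated in this thread, so treat both as assumptions.

```python
# Assumptions (not stated in the thread): 1000 units per em, and
# full-width ideographs that advance by exactly one em.
UNITS_PER_EM = 1000
HAN_ADVANCE = UNITS_PER_EM
LATIN_ADVANCE = 667  # the monospaced Latin advance mentioned above


def latin_columns_for_han(n_han: int) -> float:
    """Return how many Latin glyph widths n_han full-width glyphs span."""
    return n_han * HAN_ADVANCE / LATIN_ADVANCE


def latin_width_ratio() -> float:
    """Return the Latin advance as a fraction of the em."""
    return LATIN_ADVANCE / UNITS_PER_EM
```

Under these assumptions, `latin_columns_for_han(2)` comes out just under 3, so three Latin glyphs line up with two ideographs to within a few font units.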
@kenlunde Fantastic! Thank you very much Ken. I am assuming from your comments above that Source Han Sans is proportional width, not monospaced? I know that there are people out there using proportional fonts for source code. Are there any special considerations for these with CJK type in source? I am more than happy to include Source Han Sans if you feel that it is appropriate for the display of source.
All:
I will begin to create gallery images and pull the new fonts into the repository. Will post here when they are available for review.
@chrissimpkins Yes, the non-HW fonts in the Source Han Sans project are proportional by default, in terms of the glyphs for Latin. As long as the user expectation is for the glyphs for ASCII (U+0020 through U+007E) to be monospaced, the HW fonts in the Source Han Sans project can be used for this purpose.
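The monospaced-ASCII expectation described above could be checked mechanically. The sketch below is one possible way to do it, not something the project uses: `is_ascii_monospaced` works on a plain mapping of code point to advance width, and `load_advances` is a hypothetical helper that assumes the third-party fontTools package is installed.

```python
def ascii_advance_widths(advances):
    """Collect distinct advance widths for U+0020 through U+007E.

    `advances` maps a code point (int) to an advance width in font units.
    Returns (set_of_widths, list_of_missing_code_points).
    """
    widths = set()
    missing = []
    for cp in range(0x20, 0x7F):
        if cp in advances:
            widths.add(advances[cp])
        else:
            missing.append(cp)
    return widths, missing


def is_ascii_monospaced(advances):
    """True when every printable ASCII character is present with one shared width."""
    widths, missing = ascii_advance_widths(advances)
    return not missing and len(widths) == 1


def load_advances(font_path):
    """Hypothetical helper: read advances with fontTools (assumed installed).

    For a TTC file, TTFont also needs a fontNumber argument.
    """
    from fontTools.ttLib import TTFont

    font = TTFont(font_path)
    cmap = font.getBestCmap()  # code point -> glyph name
    hmtx = font["hmtx"]        # glyph name -> (advance width, left side bearing)
    return {cp: hmtx[name][0] for cp, name in cmap.items()}
```

A call such as `is_ascii_monospaced(load_advances("SomeFont.otf"))` (the file name is a placeholder) would then say whether a font meets the expectation.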
@kenlunde Thanks much for the additional information
Updated contributors list to acknowledge all of your contributions to this (still in progress) effort. Thanks again to all. Look forward to this new addition to the project.
@chrissimpkins Although not necessarily related to this issue, you might find this particular CJK Type Blog article interesting. It was written before I added the HW fonts to the Source Han Sans project, and before @hatchzo developed the Source Han Code JP fonts. The article details how we adapted Source Sans Pro and Source Code Pro for CJK use.
@kenlunde I added it to Pocket and will have a read. I just picked up a copy of your CJKV Info Processing book as well. Time to educate myself a bit :)
Thanks again Ken.
@chrismwendt The address for Inziu Iosevka has changed to http://be5invis.github.io/Iosevka/inziu.html . The fosshub page is no longer being updated.
@be5invis Thanks Belleve. To clarify, do these variants fall under the SIL OFL 1.1 (as per https://github.com/be5invis/Iosevka/blob/master/LICENSE)?
@chrissimpkins Inziu Iosevka is a composite of Iosevka, M+ and SHS so yes it is.
@kenlunde CL stands for "Classical" and follows the character forms in the Kang-Xi Dictionary. The copyright metadata inside it has not been updated for a long time; I'll update it in the next release.
@be5invis great thanks much
@chrissimpkins I do not know whether your script supports TTC collections. Inziu provides five variants within one TTC, and the last one is not monospaced (the LGC part is Roboto). Maybe you can split it using a tool like otc2otf.
@be5invis Thank you very much. I will check.
@chrissimpkins I've tested that otc2otf can successfully split my TTCs. You can simply drop the subfonts (TTFs) not named with "Iosevka".
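For anyone who prefers to do the split from Python, here is a rough sketch that uses the third-party fontTools package in place of otc2otf. It is a swapped-in alternative, not the method tested above; the output file names and the sample names in the usage note are hypothetical.

```python
import os


def keep_named(names, keyword="Iosevka"):
    """Return the indices of subfonts whose name contains `keyword`."""
    return [i for i, name in enumerate(names) if keyword in name]


def split_ttc(ttc_path, out_dir, keyword="Iosevka"):
    """Write the matching subfonts of a TTC as separate files.

    Assumes fontTools is installed. Output file names are made up here.
    """
    from fontTools.ttLib import TTCollection

    collection = TTCollection(ttc_path)
    # Name ID 4 is the full font name; fall back to "" when it is absent.
    names = [font["name"].getDebugName(4) or "" for font in collection.fonts]
    written = []
    for i in keep_named(names, keyword):
        out_path = os.path.join(out_dir, f"subfont-{i}.ttf")
        collection.fonts[i].save(out_path)
        written.append(out_path)
    return written
```

With hypothetical names such as `["Inziu Iosevka J", "Inziu Roboto J"]`, `keep_named` selects only the first entry, which matches the advice to drop subfonts not named with "Iosevka".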
@be5invis perfect thanks.
For me: https://github.com/adobe-type-tools/afdko/blob/master/FDK/Tools/osx/otc2otf
And @chrissimpkins I have to clarify the differences between the three lines:
(This image uses the Kang-Xi Dictionary's shapes as a reference to show their differences.)
@be5invis Yes, now I see. Some of the differences are incredibly subtle (e.g. the red glyphs at positions -3 in lines 1 & 2 that differ by a small side stroke position on the right side of the shape). Do the glyphs in each column have the same meaning across languages? By simplification do you mean that there has been a change in the shape of a glyph that retains the same meaning as a different version of the glyph?
@chrissimpkins There are three levels
Unicode can tell the differences in (1) and (2), but not (3). You can see the image below.
@chrissimpkins An image to explain them :)
@be5invis Thanks Belleve. That is very helpful.
This is pretty subtle and all the variants share the same Unicode codepoint (unless you use IVS). You must use the font designed for your language to typeset them correctly
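To make the code point behavior concrete, here is a small Python sketch. It shows that a unified character such as 合 is a single code point whatever regional shape a font draws, that a separately encoded simplification (國 versus 国) uses different code points, and how an IVS is formed by appending a selector from the Variation Selectors Supplement (U+E0100 through U+E01EF). The helper names are hypothetical, and the sketch does not check whether a given sequence is actually registered for a character.

```python
# Unified character: one code point regardless of regional glyph shape.
UNIFIED = "合"  # U+5408

# Separately encoded simplification: two different code points.
TRADITIONAL, SIMPLIFIED = "國", "国"  # U+570B and U+56FD

# Variation Selectors Supplement, used by ideographic variation sequences.
IVS_FIRST, IVS_LAST = 0xE0100, 0xE01EF


def with_ivs(base: str, selector_index: int) -> str:
    """Append the selector at IVS_FIRST + selector_index to one base character."""
    cp = IVS_FIRST + selector_index
    if len(base) != 1 or not IVS_FIRST <= cp <= IVS_LAST:
        raise ValueError("expected one base character and an index in 0..239")
    return base + chr(cp)


def strip_ivs(text: str) -> str:
    """Remove variation selectors, leaving only the base characters."""
    return "".join(ch for ch in text if not IVS_FIRST <= ord(ch) <= IVS_LAST)
```

Plain text without selectors carries no shape information, so `strip_ivs(with_ivs(UNIFIED, 0))` is just `UNIFIED` again and the font alone decides which regional form appears.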
Another reason to display only lines for which each typeface provides support?
@chrissimpkins Well, excluding lines is not a good idea, since programming is now international and users may read code written in other languages. And, considering that Han characters are ideographs, displaying them in the user's local shape is acceptable.
Therefore my opinion is to keep all three sample lines and show users what the font's "designed language" is, so they can judge whether it is suitable to use.
@be5invis so the ability to see this visual distinction in the local shape would play a role in your decision about the use of a typeface? That is interesting and seems to be an important part of the visual comparison if I understand this correctly. Do we have enough breadth across Unicode codepoint identical variants and Unicode codepoint different simplified glyphs to draw that visual distinction here?
@be5invis Also, I am assuming that a native writer would be able to draw these distinctions intuitively based upon this discussion. Is there a need for icons next to the typeface name to define the language support in the typeface?
@chrissimpkins Variants in Han characters are extremely common, so Unicode unifies these variants into one codepoint unless they are encoded separately in some existing encoding. However all simplifications produced by China and Japan can be distinguished by Unicode. Adding an icon can be helpful.
My suggestion to tailor the examples on a per-language basis is based on my experience in dealing with typical users, most of which may not be sensitive to such subtle glyph differences. If you do decide to use all lines that make up the original CJK example, you need to be absolutely sure that font fallback is not kicking in, and you also need to provide some form of statement that makes it clear that some characters may not display according to language- or region-specific conventions.
Very few fonts are genuinely Pan-CJK, and even fewer of those are monospaced. Even for Pan-CJK fonts, there will always be a default language (driven by the fact that there is a single 'cmap' table). This suggests that each font should be assigned a language, and the example should be tailored accordingly.
In the end, how you handle this is up to you. This is merely a suggestion from someone who lives and breathes this stuff on a daily basis.
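One way to guard against silent font fallback is to compare each sample line with the set of code points the font actually maps. The sketch below assumes that set is already available (for example from a font's 'cmap' table); the function names and the line labels in the usage note are hypothetical.

```python
def uncovered(text, covered_codepoints):
    """Return the characters of `text` that the font has no mapping for.

    Whitespace is skipped; each missing character is listed once, in order.
    """
    missing = []
    for ch in text:
        if ch.isspace():
            continue
        if ord(ch) not in covered_codepoints and ch not in missing:
            missing.append(ch)
    return missing


def coverage_report(sample_lines, covered_codepoints):
    """Map each line label to (fully_covered, missing_characters)."""
    report = {}
    for label, text in sample_lines.items():
        missing = uncovered(text, covered_codepoints)
        report[label] = (not missing, missing)
    return report
```

Any line whose report is not fully covered would be drawn at least partly from a fallback font if fallback is enabled. Note that full coverage says nothing about whether the glyph shapes follow the conventions of that line's language.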
@kenlunde In my original design, the examples are used to show the "reality" of the font being tested, so they should indicate how many characters the font covers and what the font's target language is. Therefore I chose to keep all three lines while turning off font fallback to expose it. However, since some users may not be able to see the subtle differences, I think @chrissimpkins's "add icon" idea is a good one.
For testing:
So, specific to @chrissimpkins's sample image of SHC:
The zh-hant line means that it can be used to typeset Traditional Chinese, but the shape may be incorrect.
The first line (12345678-...-12345678) is used to indicate the width of Latin characters. I see it is "2/3-width", because the right side of the 4th Han character (月) is aligned to "6" in the first line.
@kenlunde @be5invis
Maybe we should move away from an automated approach in the CJK gallery and create the images manually given the complexities in the information that we are trying to convey here. We could consider a compromise in which we use:
The colors would emphasize the supported sets and de-emphasize sets with a lack of support, yet allow users to peruse the glyphs from the sets without complete support. We can either add icons next to the font names or create header categories to facilitate simple categorization on initial view of the gallery. This will let them hone in on the sets of interest. Unless you are aware of other typefaces out there, we are only dealing with a handful of fonts here so this should not be a problem.
To do this, we would need to identify an editor that supports font fallback controls and manual glyph coloring. Thoughts about the approach and an application that would handle this?
We will still need to address this point by @kenlunde :
provide some form of statement that makes it clear that some characters may not display according to language- or region-specific conventions
@chrissimpkins I think this can be automated if you add "language" metadata when producing the sample image. However, for Traditional Chinese, some fonts follow "Orthodox" Kang-Xi shapes instead of the Taiwan Standard Form of National Characters (國字標准字體), which would be a problem. So the languages are:
When a language is chosen you can highlight the "improper" characters using this image:
So if the target language of the font being tested is provided, you can simply highlight these Han characters using the image above (for example, SHC should use group 4 to highlight characters in the zh-hant and zh-hans lines). If the language is not provided, users can still infer the font's target language from the rendered image.
@be5invis Highlighting "improper" ideograph forms is problematic, and is likely to confuse more than it will help. It is far easier to label each font by its language, then supply a sample string that is specifically tailored for that language. Even for Pan-CJK fonts, there is always a singular primary language for each font, and the only way to access glyphs for other languages or regions is via the 'locl' GSUB feature, which requires that the text be properly language-tagged and that the application supports that feature.
I stated that such highlighting is problematic, because sometimes the difference is due to the typeface design or typeface style (serif versus sans serif versus script). In the case of 李 and 合, it is arguably a typeface design difference in the sample strings. Source Han Sans exhibits a difference for these characters, but other typeface designs may not.
@kenlunde Hmmm, so is this solution acceptable?
For SHC, the ja-JP line is highlighted, while the hans and hant lines are faded.
@be5invis As long as the entire supported line is highlighted, and entire unsupported lines are faded, that is a better solution. Actually, if the unsupported lines are faded, the supported line can simply be rendered in black.
@kenlunde Yeah, the "target" lines are black while unsupported lines are faded. @chrissimpkins I think this solution is acceptable.
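The black-versus-faded rule agreed on here could be expressed as a small helper once each font carries a language label. This is only a sketch: the gray value, the function name, and the placeholder sample texts are assumptions, since the thread does not fix any of them.

```python
BLACK = "#000000"
FADED = "#b0b0b0"  # hypothetical gray; no specific color is given in the thread


def style_lines(font_language, lines):
    """Attach a color to each sample line.

    `lines` is a list of (language_label, text) pairs. Lines whose label
    equals the font's assigned language are black; all others are faded.
    """
    return [
        (label, text, BLACK if label == font_language else FADED)
        for label, text in lines
    ]
```

For a font labeled ja-JP, every ja-JP line (including the kana lines) stays black while the zh-hans and zh-hant lines are faded as whole lines, so no per-character highlighting is needed.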
@be5invis @kenlunde
That sounds good. This will be much easier to maintain than extensively highlighted options.
Have we removed the second of the Japanese Kana lines in https://github.com/chrissimpkins/codeface/issues/114#issuecomment-168395997 ? If not, can you let me know if both lines should be considered supported by all of the fonts labeled as having Japanese language support in the first post of this thread? This information is from @kenlunde's review of all fonts that have been recommended to date.
Thanks much.
Can anyone suggest an application that supports settings to disable font fallbacks?
@chrissimpkins The font-fallback behavior is strongly platform specific. I think improving your existing rendering script to support "fading" is better than taking a snapshot from some editor. As for the Kana lines, both of them should be preserved (and not faded), because both Hiragana and Katakana are used in Japanese, and are included in the national standard encodings in Japan, Mainland China, Taiwan, Hong Kong and Korea.
@be5invis Thanks. Will look around to see what I can find. The documentation of the Python Pango/Cairo libraries is pretty poor and my manipulation of the attributes to remove font fallbacks broke the syntax highlighter (Pygments). It is not a situation where it is saving me time if there is a simple application where I can do this and take screenshots.
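For reference, Pango's text markup has span attributes for the font family, the language tag, the foreground color, and fallback. The sketch below only builds such a markup string with the standard library; the function name is hypothetical, and whether this approach would avoid the Pygments breakage described above has not been tested here.

```python
from xml.sax.saxutils import escape, quoteattr


def pango_span(text, family, lang, color="#000000", fallback=False):
    """Build a Pango markup <span> for one sample line.

    Uses the span attributes font_family, lang, foreground and fallback.
    The text is escaped so that source code characters such as < and &
    do not break the markup.
    """
    attrs = {
        "font_family": family,
        "lang": lang,
        "foreground": color,
        "fallback": "true" if fallback else "false",
    }
    attr_text = " ".join(f"{key}={quoteattr(value)}" for key, value in attrs.items())
    return f"<span {attr_text}>{escape(text)}</span>"
```

The resulting string would be handed to whatever Pango layout call the rendering script already uses. Using `fallback="false"` asks Pango not to substitute glyphs from other fonts, so missing characters stay visibly missing.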
@kenlunde Ken, does InDesign on OS X support font fallback settings (or ideally Photoshop - not good for layouts but which I already have)?
@be5invis Although Chinese and Korean fonts almost always include glyphs for kana, because the national standards on which they're based include them, I would strongly advise against using Chinese or Korean fonts for displaying kana. I explained my reasons in a different post to this issue. I suggest removing the "Kana" label and simply tagging those lines as ja-JP. In other words, the Japanese sample should consist of three lines.
@chrissimpkins Because InDesign is a high-end authoring app, which is intended to give maximum control to the user, if the selected font does not have a glyph for a character, a pink-colored .notdef glyph is displayed.
removing the "Kana" label, and simply tag those lines as ja-JP
got it.
a pink-colored .notdef glyph is displayed
excellent, thank you very much. Will explore
Goal
Create a separate gallery of free fonts that support CJK character sets and are licensed for redistribution in this repository
Need
Feedback from developers who use CJK fonts on
Status
Accept Pull Requests
Yes, definitely
Please Contribute
Any and all feedback is warmly welcomed and highly encouraged. Please use this thread to discuss further.
This idea developed out of discussions in #55. This will be used as the working thread for this topic.
Current Development Gallery Page
This gallery has been released and is now available at https://github.com/chrissimpkins/codeface/blob/master/CJK.md
The development gallery page is located in the cjk branch here https://github.com/chrissimpkins/codeface/blob/cjk/CJK.md
Current Image Suggestions from Thread Discussion
The following is a current list of suggested image types for each font based upon the discussion to date in the thread:
Current Typeface Suggestions from Thread Discussion
Ricty = Japanese (SIL Open Font License, Version 1.1 & IPA Font License Agreement v1.0) [not licensed for redistribution per developer]