fonttools / fontbakery

🧁 A font quality assurance tool for everyone
https://fontbakery.readthedocs.io
Apache License 2.0
550 stars 102 forks source link

fontbakery.profiles.shared_conditions: Improve is_cjk_font #2796

Open m4rc1e opened 4 years ago

m4rc1e commented 4 years ago

I don't think we should be doing bit manipulation on the OS/2 ulCodePageRanges attributes.

If I run the function on our whole collection it picks up a few fonts which do not contain CJK and it also crashes on a font

/Users/marcfoley/Type/fonts/ofl/seoulhangang
/Users/marcfoley/Type/fonts/ofl/mplus1p
/Users/marcfoley/Type/fonts/ofl/frijole <-- not cjk
/Users/marcfoley/Type/fonts/ofl/jejuhallasan
/Users/marcfoley/Type/fonts/ofl/seoulnamsanvertical
/Users/marcfoley/Type/fonts/ofl/gochihand
/Users/marcfoley/Type/fonts/ofl/eastseadokdo
/Users/marcfoley/Type/fonts/ofl/geostar
/Users/marcfoley/Type/fonts/ofl/stylish
/Users/marcfoley/Type/fonts/ofl/himelody
/Users/marcfoley/Type/fonts/ofl/allerta <-- not cjk
/Users/marcfoley/Type/fonts/ofl/codystar
/Users/marcfoley/Type/fonts/ofl/jejumyeongjo
/Users/marcfoley/Type/fonts/ofl/roundedmplus1c
/Users/marcfoley/Type/fonts/ofl/nikukyu
/Users/marcfoley/Type/fonts/ofl/blackhansans
/Users/marcfoley/Type/fonts/ofl/nanummyeongjo
/Users/marcfoley/Type/fonts/ofl/creepster <-- not cjk
/Users/marcfoley/Type/fonts/ofl/seoulhangangcondensed
/Users/marcfoley/Type/fonts/ofl/mashanzheng
/Users/marcfoley/Type/fonts/ofl/antic <-- not cjk
/Users/marcfoley/Type/fonts/ofl/amstelvaralpha <-- not cjk
/Users/marcfoley/Type/fonts/ofl/geostarfill <-- not cjk
/Users/marcfoley/Type/fonts/ofl/longcang
Traceback (most recent call last):
  File "is_cjk.py", line 14, in <module>
    if is_cjk_font(font):
  File "/Users/marcfoley/Type/fontbakery/Lib/fontbakery/callable.py", line 99, in __call__
    return self.__wrapped__(*args, **kwds)
  File "/Users/marcfoley/Type/fontbakery/Lib/fontbakery/profiles/shared_conditions.py", line 298, in is_cjk_font
    if os2.ulCodePageRange1 & (1 << bit):
AttributeError: 'table_O_S_2f_2' object has no attribute 'ulCodePageRange1'

A better approach is to get the best cmap then count all the glyphs which have an east asian width. If it's over a certain threshold, return True else False. I've sketched the following quick function:

cmap = ttFont.getBestCmap()
width_counter = Counter([unicodedata.east_asian_width(chr(i)) for i in cmap.keys()])
cjk_glyph_ratio = width_counter["W"] / len(cmap)
return True if cjk_glyph_ratio >= 0.25 else False

It's probably missing many edgecases, but the results are much cleaner. It returns every CJK family in our collection and doesn't pick up any non-CJK families.

/Users/marcfoley/Type/fonts/ofl/seoulhangang
/Users/marcfoley/Type/fonts/ofl/mplus1p
/Users/marcfoley/Type/fonts/ofl/jejuhallasan
/Users/marcfoley/Type/fonts/ofl/seoulnamsanvertical
/Users/marcfoley/Type/fonts/ofl/eastseadokdo
/Users/marcfoley/Type/fonts/ofl/stylish
/Users/marcfoley/Type/fonts/ofl/himelody
/Users/marcfoley/Type/fonts/ofl/jejumyeongjo
/Users/marcfoley/Type/fonts/ofl/roundedmplus1c
/Users/marcfoley/Type/fonts/ofl/nikukyu
/Users/marcfoley/Type/fonts/ofl/blackhansans
/Users/marcfoley/Type/fonts/ofl/nanummyeongjo
/Users/marcfoley/Type/fonts/ofl/seoulhangangcondensed
/Users/marcfoley/Type/fonts/ofl/mashanzheng
/Users/marcfoley/Type/fonts/ofl/longcang
/Users/marcfoley/Type/fonts/ofl/jejugothic
/Users/marcfoley/Type/fonts/ofl/nanumgothiccoding
/Users/marcfoley/Type/fonts/ofl/gugi
/Users/marcfoley/Type/fonts/ofl/dohyeon
/Users/marcfoley/Type/fonts/ofl/cutefont
/Users/marcfoley/Type/fonts/ofl/blackandwhitepicture
/Users/marcfoley/Type/fonts/ofl/zcoolkuaile
/Users/marcfoley/Type/fonts/ofl/zhimangxing
/Users/marcfoley/Type/fonts/ofl/zcoolqingkehuangyou
/Users/marcfoley/Type/fonts/ofl/cherrybomb
/Users/marcfoley/Type/fonts/ofl/gaegu
/Users/marcfoley/Type/fonts/ofl/nicomoji
/Users/marcfoley/Type/fonts/ofl/liujianmaocao
/Users/marcfoley/Type/fonts/ofl/nanumpenscript
/Users/marcfoley/Type/fonts/ofl/hannari
/Users/marcfoley/Type/fonts/ofl/sawarabimincho
/Users/marcfoley/Type/fonts/ofl/zcoolxiaowei
/Users/marcfoley/Type/fonts/ofl/gamjaflower
/Users/marcfoley/Type/fonts/ofl/hanna
/Users/marcfoley/Type/fonts/ofl/sawarabigothic
/Users/marcfoley/Type/fonts/ofl/yeonsung
/Users/marcfoley/Type/fonts/ofl/gothica1
/Users/marcfoley/Type/fonts/ofl/seoulnamsancondensed
/Users/marcfoley/Type/fonts/ofl/dokdo
/Users/marcfoley/Type/fonts/ofl/kopubbatang
/Users/marcfoley/Type/fonts/ofl/songmyung
/Users/marcfoley/Type/fonts/ofl/nanumbrushscript
/Users/marcfoley/Type/fonts/ofl/poorstory
/Users/marcfoley/Type/fonts/ofl/kiranghaerang
/Users/marcfoley/Type/fonts/ofl/nanumgothic
/Users/marcfoley/Type/fonts/ofl/singleday
/Users/marcfoley/Type/fonts/ofl/kokoro
/Users/marcfoley/Type/fonts/ofl/jua
/Users/marcfoley/Type/fonts/ofl/seoulnamsan
/Users/marcfoley/Type/fonts/ofl/sunflower
felipesanches commented 4 years ago

sounds great!

m4rc1e commented 4 years ago

Here's the full list for the current func after throwing into a try/except block:

/Users/marcfoley/Type/fonts/ofl/postnobillsjaffna
/Users/marcfoley/Type/fonts/ofl/jejugothic
/Users/marcfoley/Type/fonts/ofl/nanumgothiccoding
/Users/marcfoley/Type/fonts/ofl/gugi
/Users/marcfoley/Type/fonts/ofl/dohyeon
/Users/marcfoley/Type/fonts/ofl/anonymouspro
/Users/marcfoley/Type/fonts/ofl/chelseamarket
/Users/marcfoley/Type/fonts/ofl/tangerine
/Users/marcfoley/Type/fonts/ofl/tradewinds
/Users/marcfoley/Type/fonts/ofl/arbutus
/Users/marcfoley/Type/fonts/ofl/unifrakturcook
/Users/marcfoley/Type/fonts/ofl/adobeblank
/Users/marcfoley/Type/fonts/ofl/cutefont
/Users/marcfoley/Type/fonts/ofl/unifrakturmaguntia
/Users/marcfoley/Type/fonts/ofl/blackandwhitepicture
/Users/marcfoley/Type/fonts/ofl/pecita
/Users/marcfoley/Type/fonts/ofl/zcoolkuaile
/Users/marcfoley/Type/fonts/ofl/karla/static
/Users/marcfoley/Type/fonts/ofl/wallpoet
/Users/marcfoley/Type/fonts/ofl/zhimangxing
/Users/marcfoley/Type/fonts/ofl/zcoolqingkehuangyou
/Users/marcfoley/Type/fonts/ofl/michroma
/Users/marcfoley/Type/fonts/ofl/cherrybomb
/Users/marcfoley/Type/fonts/ofl/dorsa
/Users/marcfoley/Type/fonts/ofl/gaegu
/Users/marcfoley/Type/fonts/ofl/nicomoji
/Users/marcfoley/Type/fonts/ofl/caesardressing
/Users/marcfoley/Type/fonts/ofl/griffy
/Users/marcfoley/Type/fonts/ofl/jomolhari
/Users/marcfoley/Type/fonts/ofl/liujianmaocao
/Users/marcfoley/Type/fonts/ofl/nanumpenscript
/Users/marcfoley/Type/fonts/ofl/hannari
/Users/marcfoley/Type/fonts/ofl/sawarabimincho
/Users/marcfoley/Type/fonts/ofl/allertastencil
/Users/marcfoley/Type/fonts/ofl/zcoolxiaowei
/Users/marcfoley/Type/fonts/ofl/gamjaflower
/Users/marcfoley/Type/fonts/ofl/carterone
/Users/marcfoley/Type/fonts/ofl/emilyscandy
/Users/marcfoley/Type/fonts/ofl/skranji
/Users/marcfoley/Type/fonts/ofl/marvel
/Users/marcfoley/Type/fonts/ofl/josefinslab
/Users/marcfoley/Type/fonts/ofl/mysteryquest
/Users/marcfoley/Type/fonts/ofl/karlatamilinclined
/Users/marcfoley/Type/fonts/ofl/hanna
/Users/marcfoley/Type/fonts/ofl/karlatamilupright
/Users/marcfoley/Type/fonts/ofl/sawarabigothic
/Users/marcfoley/Type/fonts/ofl/uchen
/Users/marcfoley/Type/fonts/ofl/oldenburg
/Users/marcfoley/Type/fonts/ofl/yeonsung
/Users/marcfoley/Type/fonts/ofl/gothica1
/Users/marcfoley/Type/fonts/ofl/seoulnamsancondensed
/Users/marcfoley/Type/fonts/ofl/dokdo
/Users/marcfoley/Type/fonts/ofl/frederickathegreat
/Users/marcfoley/Type/fonts/ofl/kopubbatang
/Users/marcfoley/Type/fonts/ofl/songmyung
/Users/marcfoley/Type/fonts/ofl/nanumbrushscript
/Users/marcfoley/Type/fonts/ofl/poorstory
/Users/marcfoley/Type/fonts/ofl/butterflykids
/Users/marcfoley/Type/fonts/ofl/kiranghaerang
/Users/marcfoley/Type/fonts/ofl/jollylodger
/Users/marcfoley/Type/fonts/ofl/leaguescript
/Users/marcfoley/Type/fonts/ofl/nanumgothic
/Users/marcfoley/Type/fonts/ofl/singleday
/Users/marcfoley/Type/fonts/ofl/seaweedscript
/Users/marcfoley/Type/fonts/ofl/kokoro
/Users/marcfoley/Type/fonts/ofl/jua
/Users/marcfoley/Type/fonts/ofl/miniver
/Users/marcfoley/Type/fonts/ofl/seoulnamsan
/Users/marcfoley/Type/fonts/ofl/princesssofia
/Users/marcfoley/Type/fonts/ofl/sunflower
/Users/marcfoley/Type/fonts/ofl/postnobillscolombo
/Users/marcfoley/Type/fonts/apache/calligraffitti
/Users/marcfoley/Type/fonts/apache/chewy
/Users/marcfoley/Type/fonts/apache/homemadeapple
/Users/marcfoley/Type/fonts/apache/unkempt
/Users/marcfoley/Type/fonts/apache/rancho
/Users/marcfoley/Type/fonts/apache/sunshiney
/Users/marcfoley/Type/fonts/apache/walterturncoat
/Users/marcfoley/Type/fonts/apache/permanentmarker
/Users/marcfoley/Type/fonts/apache/creepstercaps
/Users/marcfoley/Type/fonts/apache/cherrycreamsoda
/Users/marcfoley/Type/fonts/apache/satisfy
/Users/marcfoley/Type/fonts/apache/irishgrover
/Users/marcfoley/Type/fonts/apache/comingsoon
/Users/marcfoley/Type/fonts/apache/craftygirls
/Users/marcfoley/Type/fonts/apache/schoolbell
/Users/marcfoley/Type/fonts/apache/slackey
/Users/marcfoley/Type/fonts/apache/kranky
/Users/marcfoley/Type/fonts/apache/fontdinerswanky

There's too many non-CJK families in here for my liking. More than happy to update this function.

chrissimpkins commented 4 years ago

Reposting this from a closed thread.

I suggest that we add a test to confirm that those bit flags are properly set when fonts are identified as "CJK". MS products do not play well with CJK fonts without these set correctly. I think that we began with the logic: assume that the build is properly defined, then do the CJK tests and notify the user that this does not look like a valid CJK font. IMO you do want to run the check on fonts to detect when the bit flags are not set correctly and the font is meant to be a "CJK font". It is, perhaps, less relevant that they are set in fonts that are not intended as CJK fonts and are not valid CJK fonts?

What a "CJK" font is for the purposes of the bit flag settings, and automated test gates in general, is an open question. Some teams (incl Adobe as I understand from @cjchapman) use a very low glyph set inclusion bar to meet the requirement.

chrissimpkins commented 4 years ago

And if we can't automate the definition of what is "CJK", then I think this argues for a dedicated CJK profile so that the user can define it themselves. Perhaps this doesn't need to be automated if compilers have different definitions, teams have different definitions, and the definition itself is very muddy.

m4rc1e commented 4 years ago

I agree with your points. It seems this issue as you're suggesting is much larger than a simple function to determine whether a font is "CJK" or not.

I'm happy for us to use a custom profile if we're not able to automate this. However, we should we really try. It seems possible.

A check to ensure that unicode ranges are set correctly also sounds like a good idea as well.

Thanks Chris!

m4rc1e commented 4 years ago

With my current algo, 1 would fail if a font contained only 181 Kana characters but also had 800+ Latin glyphs.