behdad closed this issue 10 years ago
There is no bloat in the actual binary data, because what appear to be separate instances of the same feature point to the same binary data. For the particular font you referenced, the 12 apparent duplicate instances correspond to the 12 script+language declarations. Properly declaring scripts and languages is a necessary part of building fonts, especially Pan-CJK ones.
Thanks Ken. I understand that these collapse in the binary. And I understand that separate language systems are useful. But there's no reason I'm aware of for not sharing features amongst multiple language systems.
As I said, it just makes reading the font tables harder. The font has over a hundred features when in reality a dozen will do.
No action needed. Just wanted to bring it to your attention.
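To make the sharing idea concrete, here is a minimal sketch in plain Python. It is not taken from AFDKO, VOLT, or any font tool: the function name `share_feature_records`, the data layout, and the sample lookup indices are all hypothetical. It collapses feature records that have the same tag and the same lookup list, then remaps each language system's feature indices to the shared records.

```python
def share_feature_records(feature_records, langsys_features):
    """Collapse identical feature records and remap language-system indices.

    feature_records:  list of (feature_tag, tuple_of_lookup_indices)
    langsys_features: dict mapping (script, language) -> list of feature indices
    Both layouts are a simplified, hypothetical model of a GSUB table.
    """
    unique = {}   # (tag, lookups) -> new feature index
    remap = []    # old feature index -> new feature index
    for tag, lookups in feature_records:
        key = (tag, tuple(lookups))
        if key not in unique:
            unique[key] = len(unique)
        remap.append(unique[key])

    # dicts preserve insertion order, so a tag-sorted input stays tag-sorted
    shared_records = list(unique)
    shared_langsys = {
        langsys: [remap[i] for i in indices]
        for langsys, indices in langsys_features.items()
    }
    return shared_records, shared_langsys


# Hypothetical data: 'vert' is repeated once per language system,
# while 'locl' genuinely differs between the two languages.
records = [
    ("locl", (0,)), ("locl", (1,)),
    ("vert", (2,)), ("vert", (2,)), ("vert", (2,)),
]
langsys = {
    ("hani", "dflt"): [2],
    ("hani", "JAN "): [0, 3],
    ("hani", "KOR "): [1, 4],
}

shared_records, shared_langsys = share_feature_records(records, langsys)
print(len(records), "->", len(shared_records))
```

In this toy case the three identical 'vert' records become one, and every language system still reaches the same lookups through the remapped indices.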
@behdad This is common to many fonts and is pretty much standard practice. From what I understand, both AFDKO and VOLT do this. But I agree, it would make more sense if feature records weren't repeated for no reason.
Some background is that different clients expect different degrees of script+language declaration. For single-language fonts, such as all of the OpenType CJK fonts we have developed to date, we declare the appropriate scripts, which is about a half-dozen (DFLT, hani, kana, hang, latn, grek, cyrl), but only the 'dflt' language. Pan-CJK fonts such as Source Han Sans and Noto Sans CJK require that non-default languages also be declared for the appropriate scripts. This is especially important for the 'locl' GSUB feature, but non-default languages are also declared elsewhere, such as in the 'vert' GSUB feature to handle language-specific vertical forms.
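As a rough illustration of how the number of feature records grows with the declarations described above, the following Python sketch emits one record per script, language, and feature. The script tags come from the list above; the non-default language tags attached to 'hani' and the assumption that every feature is registered under every language system are simplifications made only for this example, not a description of the actual fonts.

```python
def expand_feature_records(declarations, features):
    """Emit one (feature, script, language) record per language system.

    declarations: dict mapping script tag -> list of language tags
    features:     feature tags assumed to be registered everywhere
                  (a simplification for illustration)
    """
    records = [
        (feature, script, language)
        for script, languages in declarations.items()
        for language in languages
        for feature in features
    ]
    return sorted(records)  # sorted by feature tag first


# Single-language font: each script declares only the 'dflt' language.
single_language = {
    script: ["dflt"]
    for script in ["DFLT", "hani", "kana", "hang", "latn", "grek", "cyrl"]
}

# Pan-CJK style font: non-default languages added under 'hani'
# (this particular choice of tags is hypothetical).
pan_cjk = dict(single_language, hani=["dflt", "JAN ", "KOR ", "ZHS ", "ZHT "])

features = ["locl", "vert"]
print(len(expand_feature_records(single_language, features)))
print(len(expand_feature_records(pan_cjk, features)))
```

The record count is simply the number of language systems times the number of features, which is why a font with many declarations can show far more feature records than distinct features.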
I see multiple copies of the same features, e.g.:
This makes reading the GSUB tables unnecessarily hard. The snippet above is from NotoSansCJK-Regular, but I suppose it's the same with Source Han Sans.