Potential FontBakery check: overall visual sizing relative to common/fallback fonts

thundernixon commented 6 years ago

TL;DR: We should come up with a font sizing standard and suggest that new fonts follow it.

The problem

Fonts have very little standardization for their overall sizing. They generally fit within their UPM grid, but different fonts treat this differently. Even if two fonts are both 1000 UPM (as checked by Google Fonts check 116), they may be scaled within those 1000 units in completely different ways. This results in three primary problems:

When using fonts to design and/or code a web layout, a designer wants to be able to try different options quickly and easily. However, the relative sizes of fonts can be very different, even for fonts that are stylistically similar. This means that trying a different font also requires adjusting the CSS font size, which makes it difficult to evaluate the choice of font separately from the choice of size.
If a font has different sizing than its fallback fonts (e.g. font-family: Noto Serif, Times, serif) there will be a jump in size and possible reflow when the site loads for new visitors.
The WCAG 2.0 accessibility guidelines specify minimum color contrast for "text" and "large text." However, they define large text only by point size (at least 18 point, or 14 point bold), without accounting for how big a particular font actually looks at that size. Because apparent type sizes vary between fonts, it is harder for designers and developers to know whether they are meeting the correct accessibility criteria when they use different fonts.

Expected behavior

Ideally, fonts would have only as much size difference as is needed to express their design ideas (e.g. unique relative vertical metrics, differences in width, contrast, shapes, etc.) but would fall within a standard that makes fonts more useful on the web by solving the problems described above.

Proposal: a QA check for the relative sizing of new fonts, flagging outliers

A FontBakery QA check could alert designers if the relative size of a given font is an outlier compared with common fonts. Designers making creative display fonts might choose to ignore it, but it could be a useful tool for those designing fonts intended for text and UI.

Diagrams of the issue

When different fonts are used for the same content, it is easy to see the size differences between them. For example, compare Noto Serif with Tinos below:

You can also see a sort of "bar chart" of comparative line lengths by setting type at a small size. This isn't necessarily a huge problem, because letter width is a principal creative decision in font design, so I wouldn't want to evaluate font sizes based on line length. Still, you might expect "normal width" text fonts to be a bit more similar than they are:

A common solution for mixing fonts is to match the x-height. Material Design provides a "Theme Editor" Sketch Plugin which, among other things, helps designers to try different fonts in place of the standard Roboto. When a new font is selected, it is swapped in place of Roboto, with its font size rescaled to match x-heights.

Using this tool, I've compared several popular fonts to show the effect of matching fonts by x-height. The font sizes generated are one way to see each font's sizing relative to a "normal" font like Roboto. This works fairly well in most cases, but sometimes fonts still end up clearly visually bigger or smaller. In particular, a font with a relatively low x-height (such as Adobe Caslon Pro or EB Garamond) will look massive next to common fallbacks like Times or Tinos.
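The x-height matching described above comes down to a ratio of x-heights, each normalized by its font's UPM so that, say, 1000-UPM and 2048-UPM fonts compare fairly. Below is a minimal sketch, assuming the x-heights are given in font units (for example from the OS/2 `sxHeight` field); the function name is hypothetical, and this is not the Theme Editor plugin's actual code.

```python
def xheight_matched_size(ref_size: float, ref_xheight: float, ref_upm: float,
                         font_xheight: float, font_upm: float) -> float:
    """Hypothetical helper: the font size at which a candidate font renders
    with the same x-height as a reference font set at `ref_size`.

    Heights are in font units and are divided by each font's UPM, so fonts
    with different UPMs can be compared directly.
    """
    ref_ratio = ref_xheight / ref_upm     # reference x-height as a fraction of the em
    font_ratio = font_xheight / font_upm  # candidate x-height as a fraction of the em
    return ref_size * ref_ratio / font_ratio


# Made-up metrics, not taken from any real font: a reference with a 500/1000
# x-height set at 16px, and a low-x-height candidate (400/1000).
print(round(xheight_matched_size(16, 500, 1000, 400, 1000), 2))  # → 20.0
```

The low-x-height candidate has to be set 25% larger to match, which is exactly the kind of enlargement that can make it look too big overall.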

A potential standard

As the x-height comparisons show, matching x-height is not a perfect solution, because other dimensions such as cap height, ascender height, and descender height also vary between fonts and affect perceived size.

Potentially, something a little more flexible would be the average of x-height, cap height, and ascender height. This value could be considered a font's "visual size" (I'm making this term up right now). I'm not 100% sure this is the ideal metric, because x-height letters make up the bulk of the letters in most text. Still, a quick test shows (in my opinion) that it brings the apparent sizing of two fonts closer together than simply matching the x-height.
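A minimal sketch of the proposed "visual size" metric, assuming all three heights are given in font units. In a real check, x-height and cap height could plausibly come from the OS/2 table's `sxHeight` and `sCapHeight` fields (exposed by fontTools as `font["OS/2"].sxHeight` and `font["OS/2"].sCapHeight`), while the ascender height would probably have to be measured from a glyph such as "h" or "d", since `hhea.ascent` may include extra room for accents and line spacing. The function name is hypothetical.

```python
def visual_size(x_height: float, cap_height: float, ascender: float, upm: float) -> float:
    """Hypothetical 'visual size': the plain average of x-height, cap height,
    and lowercase ascender height, expressed as a fraction of the em (UPM)
    so that fonts with different UPMs can be compared directly."""
    return (x_height + cap_height + ascender) / 3 / upm


# Made-up metrics in font units, not taken from any real font:
print(round(visual_size(500, 700, 750, 1000), 3))  # → 0.65
```

Dividing by UPM is what makes the value comparable across fonts; the plain (unweighted) average is the simplest version of the idea and could later be replaced by a weighted one.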

In a font with a relatively low x-height, matching its x-height to a more "normal" font makes it look too big, while matching averages does better:

In a font with a relatively tall x-height, matching its x-height to a more "normal" font makes it look too small, while matching averages does better:
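The two matching strategies compared above can be sketched side by side. This assumes each font is described by a dict with hypothetical keys `x_height`, `cap_height`, `ascender`, and `upm` (font units); the metrics in the example are made up for illustration, not measured from real fonts.

```python
def matched_size(ref_size: float, ref: dict, font: dict, by: str = "visual") -> float:
    """Font size at which `font` matches `ref` set at `ref_size`.

    by="x_height" matches x-heights only; by="visual" matches the average of
    x-height, cap height, and ascender height. All heights are normalized by
    each font's UPM before comparing.
    """
    def key(m: dict) -> float:
        if by == "x_height":
            return m["x_height"] / m["upm"]
        if by == "visual":
            return (m["x_height"] + m["cap_height"] + m["ascender"]) / 3 / m["upm"]
        raise ValueError(f"unknown matching mode: {by!r}")

    return ref_size * key(ref) / key(font)


# Made-up metrics for illustration only.
reference = {"x_height": 520, "cap_height": 700, "ascender": 750, "upm": 1000}
low_x = {"x_height": 400, "cap_height": 650, "ascender": 720, "upm": 1000}

print(round(matched_size(16, reference, low_x, by="x_height"), 2))  # → 20.8
print(round(matched_size(16, reference, low_x, by="visual"), 2))    # → 17.81
```

With these numbers, x-height matching enlarges the low-x-height font by 30%, while average matching enlarges it by only about 11%, because its cap and ascender heights are already close to the reference's.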

If we really wanted to be clever about visual sizing, we could make a study of relative usage of letters in Latin script languages on popular websites, then base the visual size calculation off of that. Here's some HTML + JS I used to make the above image, in case anyone wants to mess with the fonts or the calculations:

https://codesandbox.io/s/0qv6pl7qkw

You can check different Google Fonts by editing the list in

const gFontFamilies = ["Noto Serif", "EB Garamond"];

...or edit the variables just below that to pull in system / local fonts.
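The idea of basing the visual size on letter usage could look something like the sketch below, which weights each height by how often letters reaching it occur in a sample text. The letter classification is a deliberately crude assumption (it ignores i/j dots, the shorter ascender of "t", accents, and non-Latin text), and all names are hypothetical; a real study would derive the weights from a large corpus of web text rather than a single sample string.

```python
from collections import Counter

# Assumed, simplified classification of where Latin letters top out:
# uppercase letters reach cap height, these lowercase letters reach ascender
# height, and every other lowercase letter is treated as reaching x-height.
ASCENDER_LETTERS = set("bdfhkl")


def height_weights(sample: str) -> dict:
    """Fraction of ASCII letters in `sample` whose tops sit at x-height,
    cap height, or ascender height (per the simplified classification)."""
    counts = Counter()
    for ch in sample:
        if not (ch.isascii() and ch.isalpha()):
            continue  # skip spaces, punctuation, digits, non-Latin letters
        if ch.isupper():
            counts["cap_height"] += 1
        elif ch in ASCENDER_LETTERS:
            counts["ascender"] += 1
        else:
            counts["x_height"] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no Latin letters")
    return {k: counts[k] / total for k in ("x_height", "cap_height", "ascender")}


def weighted_visual_size(metrics: dict, weights: dict) -> float:
    """Frequency-weighted average of the three heights, as a fraction of the em."""
    return sum(weights[k] * metrics[k] for k in weights) / metrics["upm"]


weights = height_weights("Hello")
print(weights)  # → {'x_height': 0.4, 'cap_height': 0.2, 'ascender': 0.4}
metrics = {"x_height": 500, "cap_height": 700, "ascender": 750, "upm": 1000}  # made up
print(round(weighted_visual_size(metrics, weights), 3))  # → 0.64
```

Because real text is dominated by x-height letters, corpus-derived weights would likely pull the result toward the x-height, somewhere between pure x-height matching and the plain average.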

Suggested QA message for Font Bakery

If a font falls outside the bell curve of visual sizes for common fonts (say, OS fallback fonts or the most popular fonts on Google Fonts), it could be flagged by FontBakery. For example, the warning might read something like:

⚠️ WARN: Font's visual size is >5% larger than common system fonts.

This font's visual size (the average of its x-height, cap height, and ascender height) is 7.34% larger than that of common system fonts. This may make it harder to use in web layouts and more disruptive when the font loads on web pages. Scale it down by about 6.8% (to 1/1.0734, or roughly 93.2% of its current size), while keeping the same UPM, to better match likely fallback fonts.

The tolerance we give (e.g. 5%) could be as loose or as specific as we want, but we would probably want some flexibility to allow for different design styles.
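A sketch of how such a message could be produced, assuming the font's visual size and a reference value (for instance, the average visual size of common system fonts) have already been computed as fractions of the em. The function name and status strings are hypothetical and are not FontBakery's actual check API; the 5% default simply mirrors the tolerance suggested above.

```python
def visual_size_check(font_vs: float, reference_vs: float, tolerance: float = 0.05):
    """Compare a font's visual size with a reference visual size.

    Both values are fractions of the em (e.g. the average of x-height, cap
    height, and ascender height divided by UPM). Returns a (status, message)
    pair using hypothetical status strings.
    """
    ratio = font_vs / reference_vs
    diff = ratio - 1.0  # e.g. +0.0734 means 7.34% larger than the reference
    if abs(diff) <= tolerance:
        return "PASS", "Font's visual size is within tolerance of common system fonts."
    direction = "larger" if diff > 0 else "smaller"
    # Matching requires scaling by 1/ratio, not by (1 - diff): a font that is
    # 7.34% larger must be scaled to about 93.16%, i.e. down by about 6.84%.
    scale = 1.0 / ratio
    return "WARN", (
        f"Font's visual size is {abs(diff):.2%} {direction} than common system fonts. "
        f"Scale the glyphs to {scale:.2%} of their current size, keeping the same UPM, "
        "to better match likely fallback fonts."
    )


status, message = visual_size_check(0.6977, 0.65)  # made-up visual sizes
print(status)  # → WARN
print(message)
```

Reporting the target scale as a percentage of the current size avoids the off-by-a-bit error of subtracting the excess percentage directly.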

We could go so far as to automatically, programmatically scale sizes for fonts put into the library, but that would risk damaging outlines. Especially for fonts with fine details, this could break things. However, I believe there would be user benefit to having a relative size comparison be part of our QA process.

My thoughts around this are still forming, but hopefully this gives a starting point for a discussion, if others think this might be a useful QA check. If so, let me know, and I can try to write the FB test! But first, if you have thoughts on how we might define comparable visual sizing, please share them: I want to zero in on something flexible, accurate, fairly universal, and easy enough to explain.

m4rc1e commented 6 years ago

I'm in favour of a 'Goldilocks' zones check for vertical metrics. It would be great to have someone do the analysis of these zones. Thanks for reporting this.

We've often had to scale new families when their cap heights have been too small.

I don't think this check should outright FAIL a family. What happens if you have a family which has to work on a specific platform, where the font's bounding box cannot exceed a certain size, and you're adding a tall non-Latin script such as Arabic? One solution is to scale the Latin down.

thundernixon commented 6 years ago

Is "WARN" too severe a response to oddly-scaled fonts? I'm basing that idea off the comment in checkrunner.py

WARN = Status('WARN', 4) # A check that results in WARN may indicate a problem, but also may be OK

...though I could also see "info" being an applicable response.

Are these statuses documented anywhere besides the Python code? I'm somewhat confused about which results need fixing and which can be safely ignored.

That's a good point about a font including taller scripts having a good reason for smaller-than-usual Latin metrics. Probably, it would be good to include that (and other example cases we can think of, such as creative display fonts) as a potential good reason for ignoring the check.

laerm0 commented 6 years ago

I'll need to understand this idea better but, at first blush, different sizing feels more like a feature than a bug to me. This kind of "Helveticization" (i.e. regularization) will kind of push things towards a relatively generic design space. I'd suspect that most experienced designers demoing different fonts know that different fonts will lead to reflow/sizing/color differences.

laerm0 commented 6 years ago

@m4rc1e:

I'm in favour of a 'Goldilocks' zones check for vertical metrics. It would be great to have someone do the analysis of these zones. Thanks for reporting this.

I actually did a project like this when I was in school: I went by "eras" of type development and chose a handful of the most popular/well-known/highest quality fonts and figured out the average set of metrics for each era. This info would be useful if the designer had a goal to hit a certain style and to know how close they were to what the era's general look and feel was.

thundernixon commented 6 years ago

@laerm0 I'm not suggesting that we would tell designers to make their fonts with any specific pre-defined ratio of sizes. I definitely think that people should be able to make whatever style of font they want to, whether that means it's close to Helvetica, or the x-height is 1% of the cap height. However, I also think that graphic / web designers shouldn't have to completely readjust layouts in order to try out different fonts. Of course, good designers are used to this, but I don't think it's the ideal experience to deliver to makers or readers. Basically, I would hope to find some way to make type look to be the same overall size at the same stated point size.

Making a (non-expert) analogy to music, when I listen to Spotify or the radio, I want there to be different sounds and a dynamic range in different songs. However, I am glad that there is a relatively-standard volume which music is mastered to, so I don't have to constantly boost and lower the volume from artist to artist.

Fonthausen commented 6 years ago

Talking to Dave, I was wondering: in Noto, are most non-Latin scripts matched to the Latin lowercase or uppercase?

thundernixon commented 6 years ago

I should add that I doubt we could find a very consistent way to make display fonts like Zapfino look to be the same point size as Times New Roman. So, it's not something I would want to enforce. However, I do think that it would be helpful to give type designers some guidelines for a general practice of scaling to the Em size, so it wouldn't have to be redecided with each new design. Of course, maybe this exists already, and I'm just not aware of it (though, based on examples like Noto Serif and EB Garamond, not many designers are aware of it, either).

thundernixon commented 6 years ago

@fonthausen that's a good question. I know I've heard ideas around this in terms of CJK and Arabic scripts, but I'm not sure I remember correctly. I need to ask a few people about this.

thundernixon commented 6 years ago

Did a few quick experiments to look at Noto specifically.

The tops of Japanese characters align with the cap height of Latin:

The characters in Korean align with the cap height of Latin:

Noto Naskh Arabic doesn't seem to align its baseline with Noto Sans or Serif, which surprises me. However, it is in Early Access, so it may still be changed.

It does appear, however, that Markazi Text matches Latin and Arabic by cap height (partly because Arabic doesn't really have as consistent an "x-height" as Latin does):

Looking outside of GFonts, TPTQ-Arabic makes some of the best Arabic + Latin fonts. They don't really match any vertical metric exactly, though they come close to matching ascender height sometimes:

moyogo commented 6 years ago

However, I also think that graphic / web designers shouldn't have to completely readjust layouts in order to try out different fonts.

Fonts are designed for different optical sizes. Maybe it would be useful to have information or a recommendation on how a font should be used. For example, "Font XYZ has an x-height and caps; use it for __ sizes," etc.

davelab6 commented 6 years ago

I love this for new families!

We could go so far as to automatically, programmatically scale sizes for fonts put into the library, but that would risk damaging outlines

Yeah, generally I am shying away from any auto-fixing; it's good to offer designers scripts/tutorials on how to fix things (and to reference them in the check result text), but I think auto-fixing is unwise.

davelab6 commented 6 years ago

Fonts are designed for different optical sizes. Maybe it would be useful to have information or recommendation on how a font should be used.

Yes, I think all static fonts should, going forwards, have a STAT table with a single opsz value

thundernixon commented 6 years ago

Fonts are designed for different optical sizes.

True, and I think that including information on recommendations for use is a great idea.

Still, I don't think that changing the overall visual size of a font is a valid approach to designing for a specific optical size. That is, making a font appear to be 12px if it's set at 8px isn't really solving for an 8px optical size; it's just making a font that can't be sized consistently with others at 8px. Designing for optical sizes and different contexts should be about how the forms of letters are approached, and how the metrics within a font are treated relative to one another.

As an example, I think it's safe to say that both Source Serif and Noto Serif are designed for text sizes on screens. Judging by their overall stroke contrast, they seem to be intended for a very similar optical size – likely, body text sizes on a wide range of digital screens. As an indication that they are likely intended for roughly the same optical size (though with slightly different design strategies), if you set them to be roughly the same visual size, the strokes are nearly identical:

So, what purpose does it serve that Noto Serif has a significantly bigger visual size when set at the same point size as Source Serif?

It would be much more effective for designers if these fonts had more-similar sizing, such as is possible if you match their average heights:

Making font sizing more consistent between families does not conflict with the design decisions put into making them work well for their intended purpose. Noto Serif and Source Serif would still be different fonts, with different strategies for maximizing readability of text on the web. Noto would still have a bigger relative x-height. Source would still use a transitional model with a crisp modern approach. But, for graphic designers, the choice would be about the strengths and intent of each, rather than this being obscured by different visual sizing.

I'm not suggesting that either of these specific families be changed now, of course. I'm just suggesting that there could be a standard or recommendation set for new fonts, to separate visual sizing from the "actual" design of type. I am also suggesting that typical fallback fonts might make a good basis for that standard.

thundernixon commented 5 years ago

Since filing this issue, I've come to believe that it's probably less important and more complicated than I first assumed. Should I close it, or keep it around in case we are looking to add more checks in the future?

If we did implement the check, it might be worth making it deliberately simple and limited. For instance, if a font was marked as Latin and as a serif or sans-serif, we could check its Latin cap height against Helvetica and Times New Roman, and give an INFO result if it differs by more than ≈5%.

Trying to get too clever about average sizing or about matching visual sizing between world scripts would take more knowledge than I currently have, and would almost inevitably lead to disagreement, anyway.
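A sketch of the deliberately simple version described above: compare a Latin font's cap height, as a fraction of the em, against one or more reference fonts and return an informational result when the difference exceeds about 5%. The reference metrics in the example are placeholders, not the real values for Helvetica or Times New Roman (those would need to be read from the actual font files), and the function name and return format are hypothetical.

```python
from typing import Dict, List, Tuple


def cap_height_info(cap_height: float, upm: float,
                    references: Dict[str, Tuple[float, float]],
                    tolerance: float = 0.05) -> Tuple[str, List[str]]:
    """Compare a font's cap height with reference fonts.

    `references` maps a reference name to its (cap_height, upm) in font units.
    Returns ("INFO", notes) if any reference differs by more than `tolerance`,
    otherwise ("PASS", []).
    """
    font_ratio = cap_height / upm  # cap height as a fraction of the em
    notes = []
    for name, (ref_cap, ref_upm) in references.items():
        diff = font_ratio / (ref_cap / ref_upm) - 1.0
        if abs(diff) > tolerance:
            notes.append(f"Cap height differs from {name} by {diff:+.1%}.")
    return ("INFO", notes) if notes else ("PASS", [])


# Placeholder reference metrics, NOT the real values of any shipping font.
SANS_REFERENCES = {"Reference Sans": (700, 1000)}

print(cap_height_info(760, 1000, SANS_REFERENCES))
# → ('INFO', ['Cap height differs from Reference Sans by +8.6%.'])
```

Checking only the cap height keeps the result easy to explain, at the cost of ignoring x-height and ascender differences.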

fonttools / fontbakery