Rewrote small_caps check (and a minor case_mapping improvement)

yanone commented 1 month ago

Description

(Issue #4713)

Rewrote the entirely useless small_caps check (0 FAILs on GF library) from scratch. It now yields for GF:

    ERROR: 0
    FATAL: 0
    FAIL: 464
    WARN: 0
    INFO: 0
    SKIP: 2853
    PASS: 12

Indeed, missing smcp or c2sc substitutions are very common. Only 2.5% of fonts with those features are completely defined. I manually cross-checked some of the fonts. In many cases, the glyphs are present but missing from the feature code.

Still, marked the check as experimental for good measure (especially since we may need a few more hard-coded exceptions here and there).

Also, moved the check from GF to universal profile.

Since I'm now excluding incomplete legacy Greek (mu, Ohm, etc.) dynamically, I implemented the same for the case_mapping check. Hard-coding exceptions for Greek will automatically exclude them from being checked for fonts that actually support Greek. The basic Greek alphabet is 24+24 characters, so as soon as Greek support is below 24, it's now being excluded dynamically (= whatever Greek is in the font).

Checklist

[x] update CHANGELOG.md
[x] wait for the tests to pass
[x] request a review

Closes #4713

dscorbett commented 1 month ago

Anything using characters_per_script should have the network condition, because it uses youseedee, which downloads Unicode files at runtime.

yanone commented 1 month ago

@dscorbett Excellent catch; I added that.

@simoncozens Getting extended character metadata is probably the reason you implemented youseedee in the first place, but what do you think of https://github.com/iwsfutcmd/unicodedataplus (on par with Unicode 15) or even https://github.com/fonttools/unicodedata2 (says Unicode 13 in the README) for getting a character's script?

Update: looking at unicodedata2's code it appears as not supporting 'script'.

yanone commented 1 month ago

Since unicodedataplus has no external dependencies, I've refactored characters_per_script() to use that instead so it can work offline. Let me know what you guys think...

yanone commented 1 month ago

The failed check says it can't import unicodedataplus.script, but I can't see why, especially when the majority of tests pass. Need an adult here.

simoncozens commented 1 month ago

it uses youseedee, which downloads Unicode files at runtime.

Once, and then stores them.

I'm not convinced of the need to add another Unicode library to avoid a single network access.

yanone commented 1 month ago

I'm not convinced of the need to add another

FWIW, youseedee was previously not part of the FB package. I had to add that explicitly.

Unicode library to avoid a single network access.

Yeah, that's the question. Aren't there maybe scenarios in wasm environments or so where network access is generally impossible? I don't have the overview. I would generally try to avoid network access, but if it would work in all major environments, I don't see a usability issue of downloading it once. It's more a technical question. For instance, it would be sad if such a check would not be available in fontbakery.com.

simoncozens commented 1 month ago

FWIW, youseedee was previously not part of the FB package.

OK, yeah, not explicitly but it was a transitive dependency of stuff on the shaping profile.

Aren't there maybe scenarios in wasm environments or so where network access is generally impossible?

This is true, and more to the point the filesystem isn't available either. So maybe it's OK.

dscorbett commented 1 month ago

pip install unicodedataplus failed for me on macOS 14.4.1 with “error: incompatible pointer to integer conversion returning 'void *' from a function with result type 'int' [-Wint-conversion]” while the installation script was running clang. I had to run CPPFLAGS=-Wno-int-conversion pip install unicodedataplus instead.

yanone commented 1 month ago

I had to run CPPFLAGS=-Wno-int-conversion pip install unicodedataplus instead.

Is that a deal-breaker? I'm not a Python pro

yanone commented 1 month ago

What do you guys think about bundling the Unicode file with Fontbakery for youseedee?

My guess is that Simon didn't want to have to make new packages for every Unicode update. Or are there licensing reasons?

felipesanches commented 1 month ago

note: I have posted the comment below by mistake as an edit of the original post by @simoncozens. I apologize for that. I have already edited back the post to its original contents and here's my reply:

FWIW, youseedee was previously not part of the FB package.

OK, yeah, not explicitly but it was a transitive dependency of stuff on the shaping profile.

That's an interesting aspect of dependency management. I would say that if our code uses the dep's API, then we should explicitly list the dependency, even if it would already be installed as a "dep of a dep".

My reasoning is that one does not necessarily know what are the deps of the deps, as this can be considered an implementation detail of the dependency, and it would be more robust to treat the deps as black-boxes, from the point of view of dependency management.

felipesanches commented 1 month ago

What do you guys think about bundling the Unicode file with Fontbakery for youseedee?

My guess is that Simon didn't want to have to make new packages for every Unicode update. Or are there licensing reasons?

I'm also interested in hearing what @simoncozens thinks about this.

simoncozens commented 1 month ago

Yes, my idea was to have a Python package which would always install the latest Unicode version. I genuinely didn't have WASM on my radar when I wrote it. :-) Now I see that for that environment, something which bundles the data might be better.

dscorbett commented 1 month ago

Some lowercase letters don’t have uppercase counterparts or vice versa, in which case 'smcp' and 'c2sc' can’t be assumed to be meaningful. Instead of checking a character’s general category, you should check that lowercasing or uppercasing it produces a different string.

I only brought up the CPPFLAGS issue because it is a surprising requirement that should be documented. The workaround is easy, so I wouldn’t consider it a deal-breaker if you choose to use that package.

fonttools / fontbakery

Rewrote small_caps check (and a minor case_mapping improvement) #4721

Description

Checklist