Open yanone opened 1 month ago
Anything using characters_per_script
should have the network
condition, because it uses youseedee, which downloads Unicode files at runtime.
@dscorbett Excellent catch; I added that.
@simoncozens Getting extended character metadata is probably the reason you implemented youseedee
in the first place, but what do you think of https://github.com/iwsfutcmd/unicodedataplus (on par with Unicode 15) or even https://github.com/fonttools/unicodedata2 (says Unicode 13 in the README) for getting a character's script?
Update: looking at unicodedata2
's code it appears as not supporting 'script'.
Since unicodedataplus has no external dependencies, I've refactored characters_per_script()
to use that instead so it can work offline. Let me know what you guys think...
The failed check says it can't import unicodedataplus.script
, but I can't see why, especially when the majority of tests pass. Need an adult here.
it uses youseedee, which downloads Unicode files at runtime.
Once, and then stores them.
I'm not convinced of the need to add another Unicode library to avoid a single network access.
I'm not convinced of the need to add another
FWIW, youseedee
was previously not part of the FB package. I had to add that explicitly.
Unicode library to avoid a single network access.
Yeah, that's the question. Aren't there maybe scenarios in wasm
environments or so where network access is generally impossible? I don't have the overview. I would generally try to avoid network access, but if it would work in all major environments, I don't see a usability issue of downloading it once. It's more a technical question. For instance, it would be sad if such a check would not be available in fontbakery.com.
FWIW, youseedee was previously not part of the FB package.
OK, yeah, not explicitly but it was a transitive dependency of stuff on the shaping profile.
Aren't there maybe scenarios in wasm environments or so where network access is generally impossible?
This is true, and more to the point the filesystem isn't available either. So maybe it's OK.
pip install unicodedataplus
failed for me on macOS 14.4.1 with “error: incompatible pointer to integer conversion returning 'void *' from a function with result type 'int' [-Wint-conversion]” while the installation script was running clang. I had to run CPPFLAGS=-Wno-int-conversion pip install unicodedataplus
instead.
I had to run
CPPFLAGS=-Wno-int-conversion pip install unicodedataplus
instead.
Is that a deal-breaker? I'm not a Python pro
What do you guys think about bundling the Unicode file with Fontbakery for youseedee?
My guess is that Simon didn't want to have to make new packages for every Unicode update. Or are there licensing reasons?
note: I have posted the comment below by mistake as an edit of the original post by @simoncozens. I apologize for that. I have already edited back the post to its original contents and here's my reply:
FWIW, youseedee was previously not part of the FB package.
OK, yeah, not explicitly but it was a transitive dependency of stuff on the shaping profile.
That's an interesting aspect of dependency management. I would say that if our code uses the dep's API, then we should explicitly list the dependency, even if it would already be installed as a "dep of a dep".
My reasoning is that one does not necessarily know what are the deps of the deps, as this can be considered an implementation detail of the dependency, and it would be more robust to treat the deps as black-boxes, from the point of view of dependency management.
What do you guys think about bundling the Unicode file with Fontbakery for youseedee?
My guess is that Simon didn't want to have to make new packages for every Unicode update. Or are there licensing reasons?
I'm also interested in hearing what @simoncozens thinks about this.
Yes, my idea was to have a Python package which would always install the latest Unicode version. I genuinely didn't have WASM on my radar when I wrote it. :-) Now I see that for that environment, something which bundles the data might be better.
Some lowercase letters don’t have uppercase counterparts or vice versa, in which case 'smcp' and 'c2sc' can’t be assumed to be meaningful. Instead of checking a character’s general category, you should check that lowercasing or uppercasing it produces a different string.
I only brought up the CPPFLAGS
issue because it is a surprising requirement that should be documented. The workaround is easy, so I wouldn’t consider it a deal-breaker if you choose to use that package.
Description
(Issue #4713)
Rewrote the entirely useless
small_caps
check (0 FAILs on GF library) from scratch. It now yields for GF:Indeed, missing
smcp
orc2sc
substitutions are very common. Only 2.5% of fonts with those features are completely defined. I manually cross-checked some of the fonts. In many cases, the glyphs are present but missing from the feature code.Still, marked the check as experimental for good measure (especially since we may need a few more hard-coded exceptions here and there).
Also, moved the check from GF to universal profile.
Since I'm now excluding incomplete legacy Greek (
mu
,Ohm
, etc.) dynamically, I implemented the same for thecase_mapping
check. Hard-coding exceptions for Greek will automatically exclude them from being checked for fonts that actually support Greek. The basic Greek alphabet is 24+24 characters, so as soon as Greek support is below 24, it's now being excluded dynamically (= whatever Greek is in the font).Checklist
CHANGELOG.md
Closes #4713