Wakamai-Fondue / wakamai-fondue-site

Wakamai Fondue website
Apache License 2.0

[ENGINE] Language / script support #29

Open RoelN opened 4 years ago

RoelN commented 4 years ago

We now kinda-sorta base language support on Unicode Blocks, as inspired by pyfontaine.

This is weird, as it says a font can only support Dutch when the DOUBLE DAGGER char is present!? So I tweaked those lists for languages that I knew, which is obviously incomplete and rather willy-nilly.

Maybe we should just report on how much it matches external tests? E.g. "Unicode Block coverage"?

Or just report the scripts in the font?

Use Omniglot? Check out ISO standards?

That way the user can see how well the font scores against each "authority" on script/language support, and decide for themselves which one matters most.

RoelN commented 4 years ago

Three tables and their associated records apply to scripts and languages: the Script List table (ScriptList) and its script record (ScriptRecord), the Script table and its language system record (LangSysRecord), and the Language System table (LangSys).

ScriptList in GSUB and GPOS

Then what languages/scripts does DFLT represent?
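
For reference, a minimal sketch of how those tags could be read out in Python. It assumes a font object shaped like the one fontTools' `TTFont` returns (`font["GSUB"].table.ScriptList.ScriptRecord` and so on); the function name `script_language_tags` and the file name in the docstring are made up for illustration.

```python
def script_language_tags(font, tables=("GSUB", "GPOS")):
    """Collect {script tag: set of language system tags} from the layout tables.

    `font` is assumed to behave like a fontTools TTFont, e.g. the result of
    TTFont("MyFont.ttf"). The marker "dflt" is added when a script has a
    DefaultLangSys; every LangSysRecord contributes its LangSysTag.
    """
    found = {}
    for name in tables:
        if name not in font:
            continue  # this font has no such layout table
        script_list = getattr(font[name].table, "ScriptList", None)
        if script_list is None:
            continue  # table present but without a ScriptList
        for record in script_list.ScriptRecord:
            langs = found.setdefault(record.ScriptTag, set())
            if record.Script.DefaultLangSys is not None:
                langs.add("dflt")
            for lang_record in record.Script.LangSysRecord:
                langs.add(lang_record.LangSysTag)
    return found
```

A font whose only script record is `DFLT` would come back as `{"DFLT": {"dflt"}}`, which by itself says nothing about the languages the designer had in mind.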

RoelN commented 4 years ago

https://github.com/rosettatype/langs-db

davelab6 commented 4 years ago

Pyfontaine has several sets of Unicode language definitions; on a call today you confirmed that you were referring to the Extensis XML definitions authored by Thomas Phinney.

I guess double dagger is required by all the languages that the original 1984 MacRoman character set claimed to support.

Unicode Block coverage is meaningless, and AFAIK no standards body offers an authoritative way to truly know what languages a font supports.

Rosetta's set is another good set of Unicode definitions, but I don't think that foundational approach is sufficient.

For example, a font that lacks the Turkish handling described in https://glyphsapp.com/tutorials/localize-your-font-turkish cannot be detected that way.

> Then what languages/scripts does DFLT represent?

DFLT is the default, meaning whatever the font designer decided it should be.

You cannot rely on metadata to tell you language support, either :(

What I think is needed is a novel approach, one I've never seen implemented, that can tell a font that follows the Glyphs Turkish tutorial apart from one that does not. Another case would be a Devanagari font that truly supports Marathi.

bramstein commented 4 years ago

CLDR has script definitions you could use to calculate a coverage percentage. Then you could build some tools to show characters that aren't covered and users can decide whether they're important or not. Combined with a minimum percentage, you could do a fairly decent job showing a list of supported scripts and the confidence you have in the support.
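
A rough sketch of that idea in Python, to make it concrete. The character sets passed in are placeholders that the caller would have to fill from CLDR or another source; the names `coverage` and `support_report` and the 0.95 default threshold are assumptions for illustration, not anything CLDR prescribes.

```python
def coverage(required, font_codepoints):
    """Return (fraction of required characters covered, sorted missing characters)."""
    required = set(required)
    if not required:
        return 0.0, []  # no definition available, so make no claim
    missing = sorted(ch for ch in required if ord(ch) not in font_codepoints)
    covered = len(required) - len(missing)
    return covered / len(required), missing


def support_report(definitions, font_codepoints, minimum=0.95):
    """Score a font against each definition and flag those above the threshold.

    `definitions` maps a script or language name to its required characters;
    `font_codepoints` is the set of code points in the font's cmap.
    """
    report = {}
    for name, chars in definitions.items():
        fraction, missing = coverage(chars, font_codepoints)
        report[name] = {
            "coverage": fraction,
            "missing": missing,  # shown so users can judge what matters
            "supported": fraction >= minimum,
        }
    return report
```

The `missing` list is the part users would look at to decide whether an uncovered character is important to them.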

RoelN commented 3 years ago

@davelab6 To handle the Turkish Glyphs app situation, how would you check for this, other than detecting if the İ is in the font? (Which Wakamai Fondue currently already does when checking for Turkish support)
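
One possible extra signal, offered as an assumption and not as what davelab6 has in mind: besides checking the cmap for İ and ı, look at whether the font declares a Turkish language system (`TRK`) under the `latn` script, since language-specific substitutions such as `locl` rules hang off a language system. The function name and the input shapes below are hypothetical, and the presence of `TRK` would not prove the rules are correct, only that something Turkish-specific was set up.

```python
TURKISH_CODEPOINTS = {0x0130, 0x0131}  # İ (dotted capital I) and ı (dotless i)


def turkish_signals(codepoints, langsys_by_script):
    """Report two independent hints of Turkish support.

    `codepoints` is the set of code points in the font's cmap.
    `langsys_by_script` maps an OpenType script tag to its language system
    tags, e.g. {"latn": {"dflt", "TRK "}} (hypothetical input shape).
    """
    has_letters = TURKISH_CODEPOINTS <= set(codepoints)
    # OpenType tags are padded to four characters, so strip before comparing.
    latin_langs = {tag.strip() for tag in langsys_by_script.get("latn", ())}
    return {
        "has_dotted_and_dotless_i": has_letters,
        "has_trk_langsys": "TRK" in latin_langs,
    }
```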

ohbendy commented 3 years ago

I just dropped a variable Lao font, and it reports '0 Languages', which gives rather a bad impression of the font, considering how much time we spent making sure it supports Lao, Jru, Khmu, Pali, Sanskrit, Bru, Phunoi and Tai Dam languages. When trying with a Thai font, it also says '0 Languages', though the font supports 21 languages.

I believe it should be possible to determine programmatically which languages a Thai or Lao font can write.

RoelN commented 3 years ago

With https://github.com/alif-type/reem-kufi no language using the Arabic script is listed as a supported language. Also, would it be a good idea to list supported scripts in addition to languages?

Source: https://twitter.com/KhaledGhetas/status/1356285507689443330