Wakamai-Fondue / wakamai-fondue-site

Wakamai Fondue website
Apache License 2.0

Language detection is very strict in beta #143

Open driehuis opened 3 years ago

driehuis commented 3 years ago

When looking at Dinish-Regular v2.009 in both the production and beta versions of Wakamai Fondue, I noticed that in the beta a bunch of languages disappeared from the list:

Release: 27 languages: Afrikaans, Albanian, Basque, Catalan, Danish, Dutch, English, Estonian, Faroese, Filipino, Finnish, French, Galician, German, Icelandic, Indonesian, Irish, Italian, Malay, Norwegian Bokmål, Polish, Portuguese, Spanish, Swahili, Swedish, Turkish, Zulu

Beta: Estonian, Irish, Italian, Norwegian Bokmål, Swahili and Zulu

The release in question can be downloaded from https://github.com/playbeing/dinish/releases/download/v2.009/dinish-otf.zip to reproduce the issue.

I have since checked DINish against the list from https://r12a.github.io/app-charuse/ and discovered that the missing glyphs were hyphentwo (U+2010), minute (U+2032) and second (U+2033). I don't believe these are sorely missed when rendering Dutch. After I added these glyphs (and a ton of others), Dutch was shown as supported in both production and beta Wakamai Fondue.

It doesn't feel right to be that strict. Even English had dropped off the list (the mother tongue of the people that created the US-ASCII character set). I'm not debating the correctness of the result, and I realize just what kind of rabbit hole you go down if you want to qualify the results ("Dutch would be fully supported if you add hyphentwo, minute and second glyphs"), but it may reduce the Beta's appeal for casual font users.

It would be cool if Wakamai Fondue could show which characters are unsupported for any given language, but that sounds like a lot of work.
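A minimal sketch of what such a report could look like, in Python rather than in Wakamai Fondue's own code. Everything here is an assumption for illustration: the `DUTCH_REQUIRED` set is a hypothetical, abbreviated requirement list, and `font_codepoints` stands in for the code points read from a font's character map.

```python
import unicodedata

# Hypothetical, abbreviated character requirements for Dutch. A real tool
# would load these from a language database rather than hard-code them.
DUTCH_REQUIRED = set("abcdefghijklmnopqrstuvwxyzéëïóöü") | {"\u2010", "\u2032", "\u2033"}


def missing_characters(font_codepoints, required_chars):
    """Return the required characters the font lacks, sorted by code point."""
    return sorted(ch for ch in required_chars if ord(ch) not in font_codepoints)


def describe(ch):
    """Format a character as 'U+XXXX NAME' for display."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


# Stand-in for the code points in a font's character map (cmap).
font_codepoints = {ord(ch) for ch in "abcdefghijklmnopqrstuvwxyzéëïóöü"}

for ch in missing_characters(font_codepoints, DUTCH_REQUIRED):
    print(describe(ch))
```

With this stand-in font, the report lists only the three punctuation characters mentioned above, which is the kind of hint that would have explained why Dutch dropped off the list.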

Maybe just a blurb about how the detection works? "Language coverage is determined by checking against [definition X]. Sometimes, minor issues such as missing rarely used punctuation characters can cause a language to not show up as supported".

RoelN commented 2 years ago

I tried to make the language detection better for the beta, but I might have made it worse >_<

I've come to realise you can never reliably say "this font supports that language". What about loanwords, foreign names, historical characters, etc.

So instead of trying to determine for you which language support a font has, I want to give an indication about the support for a script. Then you can determine for yourself if that's adequate for your purposes.

I'd like to use something like https://github.com/bramstein/detect-writing-script for this.
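To illustrate the idea of a script-level indication (this is not how detect-writing-script works, whose API is not shown here), a rough Python sketch could report the fraction of a script's code points that a font covers. The name-based definition of "Latin" and the range limit are assumptions: Python's standard library does not expose the Unicode Script property, so character names are used as an approximation.

```python
import unicodedata


def latin_characters(first=0x0000, last=0x024F):
    """Approximate the Latin script by Unicode character name.

    Rough stand-in for the Script property, limited by default to
    Basic Latin through Latin Extended-B.
    """
    return {
        cp for cp in range(first, last + 1)
        if unicodedata.name(chr(cp), "").startswith("LATIN ")
    }


def script_coverage(font_codepoints, script_codepoints):
    """Return the fraction of the script's code points present in the font."""
    if not script_codepoints:
        return 0.0
    return len(font_codepoints & script_codepoints) / len(script_codepoints)


latin = latin_characters()
# Stand-in for a font's character map: ASCII letters only.
ascii_letters = {cp for cp in range(0x41, 0x7B) if chr(cp).isalpha()}
print(f"Latin coverage: {script_coverage(ascii_letters, latin):.0%}")
```

A percentage like this leaves the judgement to the user, at the cost of saying nothing about which specific languages are usable.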

driehuis commented 2 years ago

I think language support should remain separate from script support. There is value in knowing if the Turkish s-cedilla or the Romanian comma accents are supported. And of course, one can't properly render French without guillemets, so punctuation can't be left out of scope completely.
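One way to keep punctuation in scope without letting it hide a language entirely is to grade support in tiers. The Python sketch below is only an illustration of that idea: the function name, the tier labels and the abbreviated French character sets are all hypothetical.

```python
def classify_support(font_codepoints, letters, punctuation):
    """Grade support in tiers instead of a single yes/no answer.

    Returns a (status, missing_characters) pair.
    """
    missing_letters = {ch for ch in letters if ord(ch) not in font_codepoints}
    missing_punct = {ch for ch in punctuation if ord(ch) not in font_codepoints}
    if missing_letters:
        return "unsupported", missing_letters
    if missing_punct:
        return "letters only", missing_punct
    return "full", set()


# Hypothetical, abbreviated French requirements for illustration only.
FRENCH_LETTERS = set("abcdefghijklmnopqrstuvwxyzàâæçéèêëîïôœùûüÿ")
FRENCH_PUNCTUATION = set(".,;:!?«»")

# Stand-in font: all the letters, but no guillemets.
font = {ord(ch) for ch in FRENCH_LETTERS | set(".,;:!?")}
status, missing = classify_support(font, FRENCH_LETTERS, FRENCH_PUNCTUATION)
print(status, sorted(missing))
```

Here the font without guillemets would still be listed for French, but flagged as "letters only" with the two missing characters attached.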

You're right about loan words etc. I'd advocate excluding them from consideration. If you want a Vietnamese name to be rendered correctly in a Dutch text, just pick a font that supports both languages :-)

There's no easy solution I'm afraid.

RoelN commented 2 years ago

And there's even more complexity in this related issue! https://github.com/Wakamai-Fondue/wakamai-fondue-site/issues/29