Closed whereswaldon closed 7 months ago
Interesting approach! Let me start with the easy answer:
I'm happy to work on this feature, or split the work between us if you have a concrete idea of how to implement parts of this.
I do have a precise idea for the "footprint part" (inspired by fontconfig):
* collect, for every common language, a representative sample (as a string)
* for performance reasons, choose a mapping from these languages to a byte (there are fewer than 256 languages)
* then, for each font, collect the runeSet, and filter the languages, keeping only the ones whose sample is included in the runeSet.
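The filtering step in the list above could look roughly like the following minimal Go sketch. The names runeSet, langSamples, and supportedLangs are hypothetical, the samples are tiny placeholders rather than real representative data, and a plain map stands in for whatever compact rune set the index actually stores.

```go
package main

import (
	"fmt"
	"sort"
)

// runeSet is a hypothetical stand-in for a font's rune coverage.
type runeSet map[rune]bool

// covers reports whether every rune of sample is present in the set.
func (rs runeSet) covers(sample string) bool {
	for _, r := range sample {
		if !rs[r] {
			return false
		}
	}
	return true
}

// langSamples maps a language tag to a placeholder sample string.
// Real samples would come from a curated source; these are assumptions.
var langSamples = map[string]string{
	"en": "abcxyz",
	"fr": "abcéèç",
	"el": "αβγδ",
}

// supportedLangs keeps only the languages whose sample is fully covered
// by the font's rune set, sorted for a stable result.
func supportedLangs(rs runeSet) []string {
	var out []string
	for lang, sample := range langSamples {
		if rs.covers(sample) {
			out = append(out, lang)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	font := runeSet{}
	for _, r := range "abcxyzéèç" {
		font[r] = true
	}
	fmt.Println(supportedLangs(font)) // prints [en fr]
}
```

A font missing even one rune of a sample drops that language, so the quality of the result depends entirely on how representative the samples are.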
A langSet would be represented by a bit set:
type langSet [8]uint32
This would require mapping language.Language to the internal byte code, but it would save quite some space in the index.
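To make the bit set concrete, here is a minimal sketch of how membership could work on that type. The langID type and the add and contains methods are hypothetical names chosen for illustration; only the langSet declaration comes from the comment above.

```go
package main

import "fmt"

// langSet holds 8 x 32 = 256 bits, one per internal language code.
type langSet [8]uint32

// langID is a hypothetical internal byte code for a language.
type langID uint8

// add sets the bit for id: the word is id/32, the bit within it is id%32.
func (ls *langSet) add(id langID) {
	ls[id/32] |= uint32(1) << (id % 32)
}

// contains reports whether the bit for id is set.
func (ls langSet) contains(id langID) bool {
	return ls[id/32]&(uint32(1)<<(id%32)) != 0
}

func main() {
	var ls langSet
	ls.add(3)
	ls.add(200)
	fmt.Println(ls.contains(3), ls.contains(200), ls.contains(4)) // prints true true false
}
```

Each footprint then costs a fixed 32 bytes for language support, regardless of how many languages the font covers.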
Sounds good to me! Just curious, is this language metadata unreliable?
Having said that, I'm not sure I understand exactly what the ResolveForLang algorithm would be. Could you elaborate on that part?
My goal would be to identify the font matching the current query that would be used to display a given language. A simple implementation would perform the same steps as ResolveFace, except that it would stop at the first face supporting the target language instead of testing for support of a particular rune.
Perhaps this is a bad idea, but I can't think of another way to identify the font that the user will expect to be primary within the text. If you know of other approaches, please share them. :D
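As a rough sketch of that idea (not the actual fontscan code), the lookup could walk the candidates in the same priority order and return the first one that supports the language. The face type, its fields, and resolveForLang are hypothetical names, and a map of language tags stands in for the footprint's language set.

```go
package main

import "fmt"

// face is a hypothetical candidate font with the languages it covers.
type face struct {
	name  string
	langs map[string]bool
}

// resolveForLang walks candidates in priority order and returns the
// first face that supports lang, or false if none does.
func resolveForLang(candidates []face, lang string) (face, bool) {
	for _, f := range candidates {
		if f.langs[lang] {
			return f, true
		}
	}
	return face{}, false
}

func main() {
	candidates := []face{
		{name: "Latin Sans", langs: map[string]bool{"en": true, "fr": true}},
		{name: "Greek Serif", langs: map[string]bool{"el": true, "en": true}},
	}
	if f, ok := resolveForLang(candidates, "el"); ok {
		fmt.Println(f.name) // prints Greek Serif
	}
}
```

Because the order of candidates is fixed by the query, the result is stable for a given language no matter what text is later shaped.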
Thanks for the details, I get it now. We basically want to compute the intersection, over the runes used in a given language, of the fonts supporting these runes. That makes sense!
Just curious, is this language metadata unreliable?
This list would give you the languages, but we would also need a sample for each one.
Besides, this table is used internally by harfbuzz to convert a regular language.Language to the internal opentype tag.
I'm happy to work on this feature, or split the work between us if you have a concrete idea of how to implement parts of this.
I'm away from a computer for a week, so I would be happy to let you work on it.
Some databases of language text samples:
https://kermitproject.org/utf8.html#glass
https://gitlab.freedesktop.org/fontconfig/fontconfig/-/tree/main/fc-lang
I'm not sure what the licenses are, though.
It appears that the fontconfig .orth files are available under this license:
# Copyright © 2002 Keith Packard
#
# Permission to use, copy, modify, distribute, and sell this software and its
# documentation for any purpose is hereby granted without fee, provided that
# the above copyright notice appear in all copies and that both that
# copyright notice and this permission notice appear in supporting
# documentation, and that the name of the author(s) not be used in
# advertising or publicity pertaining to distribution of the software without
# specific, written prior permission. The authors make no
# representations about the suitability of this software for any purpose. It
# is provided "as is" without express or implied warranty.
#
# THE AUTHOR(S) DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
# INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO
# EVENT SHALL THE AUTHOR(S) BE LIABLE FOR ANY SPECIAL, INDIRECT OR
# CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE,
# DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER
# TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
# PERFORMANCE OF THIS SOFTWARE.
#
I'm honestly unsure whether parsing their .orth library and building our own code out of it requires us to carry this license. The .orth files are purely data (they contain ranges of codepoints used by each language). It's hard for me to reason about how normal software licensing applies in this case, since we're not modifying "the software." To be defensive, we could carry this license on the single source code file we generate from the .orth library. It's not very restrictive.
@whereswaldon I'll take a stab at generating the language samples from the fontconfig .orth files, if that's OK with you.
Absolutely! I don't think I could tackle it myself for quite a while.
This has been discussed previously (see below the hr)
I've discovered what I think is a compelling use case for including language info in footprints: determining the "primary" font of a piece of text. I'd like to set the default line height for a paragraph to the builtin line height of the primary font in the text. However, it's difficult to define "primary" font because you can't know a priori what font will be used. It's chosen based on aspect and the codepoints in the text.
You could consider using heuristics like the most-frequently-occurring font within the text, but you can also trivially create pathological cases that defeat such logic.
The best option I've been able to devise is this:
This should result in a stable choice of primary font that will work well with the rest of an application's UI.
Implementing this is tricky as there isn't a good mapping between system language.Language and language.Script (and as far as I understand, there cannot be). For that reason, I think the only way to query "which font will be used for this query when displaying the system language" is to expose a FontMap.ResolveFace-like operation that acts on languages instead of runes. This, in turn, requires our footprints to carry supported languages so that they can be efficiently queried.
@benoitkugler Does this make sense and seem like a good approach? I'm happy to work on this feature, or split the work between us if you have a concrete idea of how to implement parts of this.
Originally posted by @whereswaldon in https://github.com/go-text/typesetting/issues/87#issuecomment-1629478341