go-text / typesetting

High quality text shaping in pure Go.

Font scan #63

Closed benoitkugler closed 1 year ago

benoitkugler commented 1 year ago

As discussed in #17, I've rebased this branch to include the latest changes.

The PR is long : you may want to start with the README before diving into the implementation.

As a summary, this package provides two functionalities :

I'm open to all your suggestions !

whereswaldon commented 1 year ago

I haven't yet been able to dig into the implementation, but I've read the README and it sparked some thoughts:

I'll wade into the implementation some more soon.

andydotxyz commented 1 year ago

What a phenomenal effort here. I think as well as testing from an API perspective we need to get it compiled and running on iOS/Android devices to check we're not stepping out of sandboxes etc - will likely cause panics or ineffective caches if we do.

whereswaldon commented 1 year ago

Okay, yesterday I did some discovery work on what it would take to use this from within Gio.

[image: Gio rendering text with an automatically selected system font]

This is Gio using an automatically-selected system font. I was able to hack together a working prototype in an hour or two. However, there are a number of challenges I encountered that need to be solved before the approach is really viable:

Of the problems, the fontconfig seems like the biggest. Good Linux support is near and dear to my heart, and I want this to work well. At a high level, I think we have two options to make this work:

  1. Emulate parts of fontconfig. This could include:
    • parsing fontconfig's configuration files to extract system font directories
    • parsing fontconfig's configuration files to extract font cache locations and checking if those caches have changed
    • parsing fontconfig's configuration files to extract any user-specific font rules and emulating them
    • parsing the actual contents of fontconfig caches and building our cache type out of them
    • implementing basically all of fontconfig in go such that we don't need to use our custom index type on systems with fontconfig; we instead just reuse the system's config and caches
    • transpiling fontconfig to Go via ccgo or similar and just using the actual upstream fontconfig code. Looks like there is already a transpiled library available.
  2. Shell out to fontconfig if we detect it installed, falling back to our own indexing when it isn't available.

I've eliminated linking to the C fontconfig libraries as a candidate. I think none of us want to take on another C dependency, especially since we positioned typesetting as a pure go project.

benoitkugler commented 1 year ago

Of the problems, the fontconfig seems like the biggest. Good Linux support is near and dear to my heart, and I want this to work well. At a high level, I think we have two options to make this work:

I think I prefer the first option, because, in my use case, I would rather avoid using fontconfig, so that the font selection logic is uniform across systems.

Ignoring user specified settings for now, it seems that it should be enough to implement the first item :

parsing fontconfig's configuration files to extract system font directories

since we can then scan these directories and update our index if necessary. Am I correct ? If so, I think it is relatively easy : in my port, I've written a fontconfig XML format parser, and I could simplify it to just fetch the font directories. This preliminary step could be added in DefaultFontDirectories (which is called at each app startup).
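As a rough illustration of the "just fetch the font directories" idea, here is a minimal sketch using only encoding/xml. The names dirEntry, fontsConf and fontDirs are hypothetical and are not taken from the PR. The sketch only reads top-level <dir> elements and does not follow <include>, so it is far from a complete fontconfig parser.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// dirEntry is one <dir> element (hypothetical type for this sketch).
// Prefix holds the optional prefix attribute, Path the element text.
type dirEntry struct {
	Prefix string `xml:"prefix,attr"`
	Path   string `xml:",chardata"`
}

// fontsConf mirrors only the part of a fontconfig file this sketch reads.
type fontsConf struct {
	Dirs []dirEntry `xml:"dir"`
}

// fontDirs returns the <dir> entries found directly under the root element.
// It does not follow <include> elements or expand "~" and prefixes.
func fontDirs(conf []byte) ([]dirEntry, error) {
	var c fontsConf
	if err := xml.Unmarshal(conf, &c); err != nil {
		return nil, err
	}
	for i := range c.Dirs {
		c.Dirs[i].Path = strings.TrimSpace(c.Dirs[i].Path)
	}
	return c.Dirs, nil
}

func main() {
	conf := []byte(`<?xml version="1.0"?>
<fontconfig>
  <dir>/usr/share/fonts</dir>
  <dir prefix="xdg">fonts</dir>
  <include ignore_missing="yes">conf.d</include>
</fontconfig>`)
	dirs, err := fontDirs(conf)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, d := range dirs {
		fmt.Printf("prefix=%q path=%q\n", d.Prefix, d.Path)
	}
}
```

The returned paths are raw: resolving "~", the prefix attribute and any sysroot would be a separate step.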

By the way,

parsing the actual contents of fontconfig caches and building our cache type out of them

would be quite hard because it would require understanding the in-memory layout used by fontconfig types, which is doable but a bit painful and error-prone. So I would rather keep this as a last resort.

benoitkugler commented 1 year ago
  • The font querying needs to support all of the metadata fields. Right now it doesn't support querying by IsMonospace, which is important for Gio's use cases. I'm relatively confident that this is easy to fix.

The logic for now is that monospace fonts are specified via families : the "monospace" special family name is replaced by suitable families. However, we may indeed include an IsMonospace flag in the footprint. (Besides, it could probably be more reliable since it would be computed from the actual glyph widths.)
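To make the "computed from the actual glyph widths" idea concrete, here is a minimal sketch. It assumes the glyph advances have already been read from the font into a slice. The function name isMonospace and the choice to skip zero advances are assumptions for illustration, not the PR's implementation. Real fonts may need a tolerance or special cases (for example double-width glyphs).

```go
package main

import "fmt"

// isMonospace reports whether all non-zero advances are identical.
// Zero advances are skipped, on the assumption that they belong to
// glyphs such as combining marks that do not move the pen.
func isMonospace(advances []float32) bool {
	var ref float32
	for _, adv := range advances {
		if adv == 0 {
			continue
		}
		if ref == 0 {
			ref = adv // first non-zero advance becomes the reference
			continue
		}
		if adv != ref {
			return false
		}
	}
	// A font with no non-zero advance at all is not considered monospace.
	return ref != 0
}

func main() {
	fmt.Println(isMonospace([]float32{600, 600, 0, 600}))
	fmt.Println(isMonospace([]float32{500, 600, 278}))
}
```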

benoitkugler commented 1 year ago
  • I had to hack the fontmap to return both the font and its metadata every time I resolved a face. Gio always needs the metadata for a font when working with it. We track fonts by their metadata internally, and without it we have no key for the font value. To make the FontMap work seamlessly with Gio, I'd either need the API to return the metadata like my hacked version, or for fonts to generally always carry their metadata (I've had to hack around the lack of it previously). I'm not sure what the tradeoffs are there, beyond a slight increase in the resource requirements to load a font. @benoitkugler Your insight here would be appreciated.

Hum... I thought the (absolute) file name of a font was enough as a key (well the Location type) : do you also need the metadata after having resolved the face, or do you only use it as a key ?

But we could indeed return the metadata: after all the Footprint is a super set of the Metadata struct

whereswaldon commented 1 year ago

Of the problems, the fontconfig seems like the biggest. Good Linux support is near and dear to my heart, and I want this to work well. At a high level, I think we have two options to make this work:

I think I prefer the first option, because, in my use case, I would rather avoid using fontconfig, so that the font selection logic is uniform across systems.

Ignoring user specified settings for now, it seems that it should be enough to implement the first item :

parsing fontconfig's configuration files to extract system font directories

since we can then scan these directories and update our index if necessary. Am I correct ? If so, I think it is relatively easy : in my port, I've written a fontconfig XML format parser, and I could simplify it to just fetch the font directories. This preliminary step could be added in DefaultFontDirectories (which is called at each app startup).

Yes, I think this would be sufficient. I had forgotten that we re-scan the timestamps of the fonts on startup. Hopefully we can do this whole process pretty quickly.

By the way,

parsing the actual contents of fontconfig caches and building our cache type out of them

would be quite hard because it would require understanding the in-memory layout used by fontconfig types, which is doable but a bit painful and error-prone. So I would rather keep this as a last resort.

Ah, so it's unpacking that directly into C structures? In that case, I totally agree. That's a nightmare.

whereswaldon commented 1 year ago
  • I had to hack the fontmap to return both the font and its metadata every time I resolved a face. Gio always needs the metadata for a font when working with it. We track fonts by their metadata internally, and without it we have no key for the font value. To make the FontMap work seamlessly with Gio, I'd either need the API to return the metadata like my hacked version, or for fonts to generally always carry their metadata (I've had to hack around the lack of it previously). I'm not sure what the tradeoffs are there, beyond a slight increase in the resource requirements to load a font. @benoitkugler Your insight here would be appreciated.

Hum... I thought the (absolute) file name of a font was enough as a key (well the Location type) : do you also need the metadata after having resolved the face, or do you only use it as a key ?

But we could indeed return the metadata: after all the Footprint is a super set of the Metadata struct

It's true that the file name of a font can be used as a key, but I wanted to basically incorporate the results into Gio's existing font matching algorithm. In this way, Gio would first use the current algorithm to select from among the fonts already loaded, and would then fall back to the FontMap if needed. The result from the FontMap would then be registered with the Gio font matcher so that we can find it with the normal algorithm.
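The flow described here (try the already-loaded fonts first, fall back to the system lookup, then register the result under its metadata) can be sketched as follows. Every type and name below is a hypothetical stand-in: Metadata, Face, systemLookup and matcher are not Gio's or go-text's real types, and the exact-match map lookup is a simplification of a real font matching algorithm.

```go
package main

import "fmt"

// Metadata is a hypothetical stand-in for the description used as a key.
type Metadata struct {
	Family string
	Bold   bool
}

// Face is a hypothetical stand-in for a loaded font face.
type Face struct{ Path string }

// systemLookup stands in for the FontMap: it returns a face together with
// the metadata describing what was actually found.
type systemLookup func(want Metadata) (Face, Metadata, bool)

// matcher holds the fonts already registered, keyed by metadata.
type matcher struct {
	loaded map[Metadata]Face
	system systemLookup
}

// resolve first consults the registered fonts, then falls back to the
// system lookup and registers whatever that lookup returns.
func (m *matcher) resolve(want Metadata) (Face, bool) {
	if f, ok := m.loaded[want]; ok {
		return f, true
	}
	if m.system == nil {
		return Face{}, false
	}
	f, md, ok := m.system(want)
	if !ok {
		return Face{}, false
	}
	m.loaded[md] = f // keyed by the metadata of the face that was found
	return f, true
}

func main() {
	m := &matcher{
		loaded: map[Metadata]Face{},
		system: func(want Metadata) (Face, Metadata, bool) {
			return Face{Path: "/usr/share/fonts/example.ttf"}, want, true
		},
	}
	f, ok := m.resolve(Metadata{Family: "liberationsans"})
	fmt.Println(f.Path, ok)
}
```

The sketch shows why the lookup needs to hand back metadata: without it there is nothing to register the new face under.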

benoitkugler commented 1 year ago

So, to sum up and plan our next steps, we should at least tackle the following points :

I'll start with point 3 later.

whereswaldon commented 1 year ago

My next steps are figuring out the failing tests on my system. Then I can pick up the next item from @benoitkugler's checklist.

whereswaldon commented 1 year ago

I've figured out one of my test failures. TestScanIncrementalUpdate was failing because my ext4 filesystem may not be able to provide nanosecond resolution modification times depending upon my kernel config. Inserting a synthetic sleep between the various modifications makes the test pass reliably, but raises a question: does this sort of imprecision in the filesystem metadata have implications for the modification time tracking as a whole?

There's also some question about how this intersects with OS packaging. Some package managers install their content files with the modification timestamps from the contents within the package. We don't actually have a guarantee that a file will have a different timestamp from one package version to the next.

On Linux systems, we could cheat by looking at the modification times of fontconfig caches, I think. If fontconfig regenerated those, the fonts definitely changed. I guess that doesn't tell us which ones though.

My other local test failure is:

--- FAIL: TestResolveFont (0.16s)
    fontmap_test.go:43: unexpected logs 2023/06/14 13:27:14 No font matched for [helvetica] -> returning arbitrary face
        2023/06/14 13:27:14 No font matched for [helvetica] -> returning arbitrary face
        2023/06/14 13:27:14 No font matched for [helvetica] -> returning arbitrary face
        2023/06/14 13:27:14 No font matched for [helvetica] -> returning arbitrary face
        2023/06/14 13:27:14 No font matched for [helvetica] -> returning arbitrary face
        2023/06/14 13:27:14 No font matched for [helvetica] -> returning arbitrary face
        2023/06/14 13:27:14 No font matched for [helvetica] -> returning arbitrary face
        2023/06/14 13:27:14 No font matched for [helvetica] -> returning arbitrary face
        2023/06/14 13:27:14 No font matched for [helvetica] -> returning arbitrary face
        2023/06/14 13:27:14 No font matched for [helvetica] -> returning arbitrary face
        2023/06/14 13:27:14 No font matched for [helvetica] -> returning arbitrary face
        2023/06/14 13:27:14 No font matched for [helvetica] -> returning arbitrary face
        2023/06/14 13:27:14 No font matched for [helvetica] -> returning arbitrary face
        2023/06/14 13:27:14 No font matched for [helvetica] -> returning arbitrary face
        2023/06/14 13:27:14 No font matched for [helvetica] -> returning arbitrary face

This test case uses UseSystemFonts() and assumes that something will always match helvetica, but apparently my system doesn't have any candidates.

benoitkugler commented 1 year ago

I've figured out one of my test failures. TestScanIncrementalUpdate was failing because my ext4 filesystem may not be able to provide nanosecond resolution modification times depending upon my kernel config. Inserting a synthetic sleep between the various modifications makes the test pass reliably, but raises a question: does this sort of imprecision in the filesystem metadata have implications for the modification time tracking as a whole?

Interesting ! I think in practice the imprecision is not a big deal, since two scans would typically run with a large time delta between them (for now, two application startups)

There's also some question about how this intersects with OS packaging. Some package managers install their content files with the modification timestamps from the contents within the package. We don't actually have a guarantee that a file will have a different timestamp from one package version to the next.

Would it be possible that two files with the same path have different content but the same timestamp ?

benoitkugler commented 1 year ago

My other local test failure is:

--- FAIL: TestResolveFont (0.16s)
    fontmap_test.go:43: unexpected logs 2023/06/14 13:27:14 No font matched for [helvetica] -> returning arbitrary face
        2023/06/14 13:27:14 No font matched for [helvetica] -> returning arbitrary face
        2023/06/14 13:27:14 No font matched for [helvetica] -> returning arbitrary face
        2023/06/14 13:27:14 No font matched for [helvetica] -> returning arbitrary face
        2023/06/14 13:27:14 No font matched for [helvetica] -> returning arbitrary face
        2023/06/14 13:27:14 No font matched for [helvetica] -> returning arbitrary face
        2023/06/14 13:27:14 No font matched for [helvetica] -> returning arbitrary face
        2023/06/14 13:27:14 No font matched for [helvetica] -> returning arbitrary face
        2023/06/14 13:27:14 No font matched for [helvetica] -> returning arbitrary face
        2023/06/14 13:27:14 No font matched for [helvetica] -> returning arbitrary face
        2023/06/14 13:27:14 No font matched for [helvetica] -> returning arbitrary face
        2023/06/14 13:27:14 No font matched for [helvetica] -> returning arbitrary face
        2023/06/14 13:27:14 No font matched for [helvetica] -> returning arbitrary face
        2023/06/14 13:27:14 No font matched for [helvetica] -> returning arbitrary face
        2023/06/14 13:27:14 No font matched for [helvetica] -> returning arbitrary face

This test case uses UseSystemFonts() and assumes that something will always match helvetica, but apparently my system doesn't have any candidates.

Very interesting failure : the Helvetica family is known by the library, so family substitutions should trigger and be enough. More precisely, for Helvetica, the following families are selected : [helvetica nimbussans nimbussansl texgyreheros arial arimo liberationsans albany albanyamt dejavulgcsans dejavusans bitstreamverasans verdana luxisans lucidasansunicode bpgglahointernational tahoma urwgothic nimbussansnarrow loma waree garuda umpush laksaman notosanscjkjp notosanscjkkr notosanscjksc notosanscjktc notosanscjkhk lohitdevanagari droidsansfallback khmeros nachlieli yuditunicode kerkis armnethelvetica artsounk bpgutf8m saysetthaunicode jglaooldarial gfzemenunicode pigiarniq bdavat bcompset kacst-qr urdunastaliqunicode raghindi muktinarrow padmaa hapaxberbère msgothic umepluspgothic microsoftyahei microsoftjhenghei wenquanyizenhei wenquanyibitmapsong arplshanheisununi arplnewsung mgopenmoderna mgopenmodata mgopencosmetica vlgothic ipamonagothic ipagothic sazanamigothic kochigothic arplkaitimgb arplkaitimbig5 arplsungtilgb arplmingti2lbig5 msゴシック zysong18030 nanumgothic undotum baekmukdotum baekmukgulim kacstqura lohitbengali lohitgujarati lohithindi lohitmarathi lohitmaithili lohitkash miri lohitkonkani lohitnepali lohitsindhi lohitpunjabi lohittamil meera lohitmalayalam lohitkannada lohittelugu lohitoriya lklug freesans arialunicodems arialunicode code2000 code2001 sans-serif roya koodak terafik itcavantgardegothic helveticanarrow] Could you manually check that none of these fonts are present on your system ? That seems strange to me... maybe ArchLinux does not come with a lot of fonts installed ?

whereswaldon commented 1 year ago

This test case uses UseSystemFonts() and assumes that something will always match helvetica, but apparently my system doesn't have any candidates.

Very interesting failure : the Helvetica family is known by the library, so family substitutions should trigger and be enough. More precisely, for Helvetica, the following families are selected : [helvetica nimbussans nimbussansl texgyreheros arial arimo liberationsans albany albanyamt dejavulgcsans dejavusans bitstreamverasans verdana luxisans lucidasansunicode bpgglahointernational tahoma urwgothic nimbussansnarrow loma waree garuda umpush laksaman notosanscjkjp notosanscjkkr notosanscjksc notosanscjktc notosanscjkhk lohitdevanagari droidsansfallback khmeros nachlieli yuditunicode kerkis armnethelvetica artsounk bpgutf8m saysetthaunicode jglaooldarial gfzemenunicode pigiarniq bdavat bcompset kacst-qr urdunastaliqunicode raghindi muktinarrow padmaa hapaxberbère msgothic umepluspgothic microsoftyahei microsoftjhenghei wenquanyizenhei wenquanyibitmapsong arplshanheisununi arplnewsung mgopenmoderna mgopenmodata mgopencosmetica vlgothic ipamonagothic ipagothic sazanamigothic kochigothic arplkaitimgb arplkaitimbig5 arplsungtilgb arplmingti2lbig5 msゴシック zysong18030 nanumgothic undotum baekmukdotum baekmukgulim kacstqura lohitbengali lohitgujarati lohithindi lohitmarathi lohitmaithili lohitkash miri lohitkonkani lohitnepali lohitsindhi lohitpunjabi lohittamil meera lohitmalayalam lohitkannada lohittelugu lohitoriya lklug freesans arialunicodems arialunicode code2000 code2001 sans-serif roya koodak terafik itcavantgardegothic helveticanarrow] Could you manually check that none of these fonts are present on your system ? That seems strange to me... maybe ArchLinux does not come with a lot of fonts installed ?

Arch Linux comes with no fonts installed, but I have quite a few on the system now. :D I definitely have Liberation Sans.

whereswaldon commented 1 year ago

I've figured out one of my test failures. TestScanIncrementalUpdate was failing because my ext4 filesystem may not be able to provide nanosecond resolution modification times depending upon my kernel config. Inserting a synthetic sleep between the various modifications makes the test pass reliably, but raises a question: does this sort of imprecision in the filesystem metadata have implications for the modification time tracking as a whole?

Interesting ! I think in practice the imprecision is not a big deal, since two scans would typically run with a large time delta between them (for now, two application startups)

There's also some question about how this intersects with OS packaging. Some package managers install their content files with the modification timestamps from the contents within the package. We don't actually have a guarantee that a file will have a different timestamp from one package version to the next.

Would it be possible that two files with the same path have different content but the same timestamp ?

I believe this can happen. Some package build systems always set the build time to a specific moment in time in the interest of creating reproducible artifacts. So an update to a font packaged in such a way could create v1 and v2 of the font data with the same modification time, which in turn would be unpacked unchanged into the filesystem.
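One inexpensive mitigation, sketched below under assumptions of my own, is to record the file size next to the modification time and treat a difference in either as a change. The names fileStamp, stampOf and changed are hypothetical. This still misses an update that keeps both size and timestamp; only a content hash would catch that, at the cost of reading every file.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// fileStamp is the per-file information this sketch stores in the index.
type fileStamp struct {
	ModTime time.Time
	Size    int64
}

// stampOf reads the current stamp of the file at path.
func stampOf(path string) (fileStamp, error) {
	info, err := os.Stat(path)
	if err != nil {
		return fileStamp{}, err
	}
	return fileStamp{ModTime: info.ModTime(), Size: info.Size()}, nil
}

// changed reports whether now differs from the stored stamp s in either
// size or modification time.
func (s fileStamp) changed(now fileStamp) bool {
	return s.Size != now.Size || !s.ModTime.Equal(now.ModTime)
}

func main() {
	before, err := stampOf(os.Args[0])
	if err != nil {
		fmt.Println("stat failed:", err)
		return
	}
	after := before
	after.Size++ // pretend the file grew while keeping its timestamp
	fmt.Println(before.changed(before), before.changed(after))
}
```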

whereswaldon commented 1 year ago

I realized that the CI config was messed up during the rebase and fixed it. Looks like we can't use io/fs in fontscan because it wasn't introduced until something like Go 1.16

benoitkugler commented 1 year ago

Arch Linux comes with no fonts installed, but I have quite a few on the system now. :D I definitely have Liberation Sans.

I guess the next steps would be to check if UseSystemFonts has properly scanned and added LiberationSans, and then to check what the content of Fontmap.candidates is after calling SetQuery.

whereswaldon commented 1 year ago

Arch Linux comes with no fonts installed, but I have quite a few on the system now. :D I definitely have Liberation Sans.

I guess the next steps would be to check if UseSystemFonts has properly scanned and added LiberationSans, and then to check what the content of Fontmap.candidates is after calling SetQuery.

I've been crawling through the test in rr, and I think it's actually failing to resolve a face for تثذرزسشص, the Latin test text is fine. What's odd about that is that I can see this text in my browser, so I have a font somewhere that can display it. However, my terminal emulator can't display these characters... Makes me wonder if that font is in a nonstandard location or something.

benoitkugler commented 1 year ago

I've pushed 7ef5a64 with support for fontconfig font directories. Let me know what you think. I was not sure if openbsd and freebsd use fontconfig as well. If so, we should also update DefaultFontDirectories.

benoitkugler commented 1 year ago

I realized that the CI config was messed up during the rebase and fixed it. Looks like we can't use io/fs in fontscan because it wasn't introduced until something like Go 1.16

Since Fyne will target go 1.17 in its next release, we could maybe wait for it and bump the tests to 1.16 before merging font-scan.

whereswaldon commented 1 year ago

I realized that the CI config was messed up during the rebase and fixed it. Looks like we can't use io/fs in fontscan because it wasn't introduced until something like Go 1.16

Since Fyne will target go 1.17 in its next release, we could maybe wait for it and bump the tests to 1.16 before merging font-scan.

It's not a problem in the test code, but in the actual fontscan implementation. We don't run tests on 1.14, we just try to compile.

In general, I'd rather not wait for a Fyne release. I'll investigate what it would take to remove that dependency from the fontscan implementation.
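For reference, a directory walk that avoids io/fs can be written with filepath.Walk and os.FileInfo, both of which predate Go 1.16. The sketch below is not the backport that was actually written for the PR. The helper names and the extension list are assumptions for illustration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// isFontFile reports whether name has one of a few common font extensions.
// The extension list is an assumption for this sketch.
func isFontFile(name string) bool {
	switch strings.ToLower(filepath.Ext(name)) {
	case ".ttf", ".otf", ".ttc", ".otc":
		return true
	}
	return false
}

// listFontFiles walks root without touching io/fs and collects the paths
// of files that look like fonts. Unreadable entries are skipped.
func listFontFiles(root string) ([]string, error) {
	var out []string
	err := filepath.Walk(root, func(path string, info os.FileInfo, err error) error {
		if err != nil {
			return nil // skip this entry instead of aborting the whole scan
		}
		if !info.IsDir() && isFontFile(info.Name()) {
			out = append(out, path)
		}
		return nil
	})
	return out, err
}

func main() {
	files, err := listFontFiles("/usr/share/fonts")
	if err != nil {
		fmt.Println("walk failed:", err)
		return
	}
	fmt.Println(len(files), "font files found")
}
```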

benoitkugler commented 1 year ago

It's not a problem in the test code, but in the actual fontscan implementation. We don't run tests on 1.14, we just try to compile.

In general, I'd rather not wait for a Fyne release. I'll investigate what it would take to remove that dependency from the fontscan implementation.

Thank you for the tedious backport code..

whereswaldon commented 1 year ago

I've been poring over the fontconfig docs and the XDG spec to try to figure out exactly what we need to load and in what order so that we're reliably getting everything and actually loading config files in the user's home directory.

Users frequently have font config in ~/.local/share/fonts, ~/.config/fontconfig and maybe other places. I wanted to make sure we didn't miss anything.

XDG says:

A specification that refers to $XDG_DATA_DIRS or $XDG_CONFIG_DIRS should define what the behaviour must be when a file is located under multiple base directories. It could, for example, define that only the file under the most important base directory should be used or, as another example, it could define rules for merging the information from the different files.

So basically it's up to fontconfig to decide which of these config files takes precedence.

Fontconfig says:

$XDG_CONFIG_HOME/fontconfig/conf.d and ~/.fonts.conf.d is the conventional name for a per-user directory of (typically auto-generated) configuration files, although the actual location is specified in the global fonts.conf file.

$XDG_CONFIG_HOME/fontconfig/fonts.conf and ~/.fonts.conf is the conventional location for per-user font configuration, although the actual location is specified in the global fonts.conf file.

Which implies that the global config file is always responsible for loading the user-specific ones.

The only remaining question is how to correctly locate the global fontconfig file? I've tried to figure out the process below:

  1. We start by examining $FONTCONFIG_FILE. If that is set, it's the location of the root XML config file. If it isn't set, assume it's fonts.conf. Resolve that file's location (and the locations of any files referenced by it) according to the following rules (source):
    • If they start with ~, resolve them relative to $FONTCONFIG_SYSROOT/$HOME
    • If they start with /, they are an absolute path relative to $FONTCONFIG_SYSROOT (unless they are pre-prefixed with $FONTCONFIG_SYSROOT, in which case it should not be prepended again)
    • If they do not start with / or ~, they are relative to the system conf directories, which are defined by $FONTCONFIG_SYSROOT + each element of $FONTCONFIG_PATH (which is a path list). If $FONTCONFIG_PATH is undefined, you use a system-specific path that in practice seems like it's /etc/fonts

It's a mess, basically. I'll try to implement this.
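The three rules above can be sketched roughly as follows. This is not the code that was pushed to the PR: resolveConfPath is a hypothetical name, the /etc/fonts default is the guess stated above, and the "already prefixed with the sysroot" check is a plain string prefix test, which is cruder than a real implementation should be.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// resolveConfPath returns the candidate locations for a config file name,
// following the three rules listed above.
func resolveConfPath(name, sysroot, home string, confDirs []string) []string {
	switch {
	case strings.HasPrefix(name, "~"):
		// Relative to the home directory, itself under the sysroot.
		return []string{filepath.Join(sysroot, home, name[1:])}
	case strings.HasPrefix(name, "/"):
		// Absolute: prepend the sysroot unless it is already there.
		if sysroot != "" && strings.HasPrefix(name, sysroot) {
			return []string{name}
		}
		return []string{filepath.Join(sysroot, name)}
	default:
		// Relative: try each system conf directory in order.
		if len(confDirs) == 0 {
			confDirs = []string{"/etc/fonts"} // assumed default
		}
		out := make([]string, 0, len(confDirs))
		for _, dir := range confDirs {
			out = append(out, filepath.Join(sysroot, dir, name))
		}
		return out
	}
}

func main() {
	home, _ := os.UserHomeDir()
	name := os.Getenv("FONTCONFIG_FILE")
	if name == "" {
		name = "fonts.conf"
	}
	confDirs := filepath.SplitList(os.Getenv("FONTCONFIG_PATH"))
	for _, p := range resolveConfPath(name, os.Getenv("FONTCONFIG_SYSROOT"), home, confDirs) {
		fmt.Println(p)
	}
}
```

The relative case returns several candidates, so the caller would still have to pick the first one that exists.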

whereswaldon commented 1 year ago

I've pushed an update to implement the above. @benoitkugler I had to change some test data because it was including config files via a relative path like <include>other.conf</include>, but fontconfig doesn't seem to support that. <dir> can supply prefix=relative to get that behavior, but not <include>. Please sanity check me here. I couldn't find any instances of relative include paths in real fontconfig files, and I couldn't find anything in the docs to lead me to think it's allowed.

benoitkugler commented 1 year ago

I've pushed an update to implement the above. @benoitkugler I had to change some test data because it was including config files via a relative path like <include>other.conf</include>, but fontconfig doesn't seem to support that. <dir> can supply prefix=relative to get that behavior, but not <include>. Please sanity check me here. I couldn't find any instances of relative include paths in real fontconfig files, and I couldn't find anything in the docs to lead me to think it's allowed.

You're right, I implemented it the way that felt right, but I had not studied FcGetFilename closely enough. Thank you for your commit, which should be more accurate and closer to real-world configurations !

benoitkugler commented 1 year ago

@whereswaldon I'm working on IsMonospace support, but it's not 100% clear what the semantics should be. The issue is to define the interaction with the Families list. For instance, providing both IsMonospace and 'Times' as family does not make sense (since Times is not mono-spaced).

What interaction would you like to support ? Is there a case where both an arbitrary family and monospace would be queried ?

whereswaldon commented 1 year ago

@whereswaldon I'm working on IsMonospace support, but it's not 100% clear what the semantics should be. The issue is to define the interaction with the Families list. For instance, providing both IsMonospace and 'Times' as family does not make sense (since Times is not mono-spaced).

What interaction would you like to support ? Is there a case where both an arbitrary family and monospace would be queried ?

I think it's the user's fault if they request an impossible situation like a monospaced times font. Monospace seems (to me) to be a more specific request than the family. If a layout is requiring it, not respecting that seems likelier to break something than not respecting the family. My intuition is that if we can't find a family match that is monospaced, we should just return an arbitrary monospaced font.

However, maybe that's unreasonably complex. At the end of the day, the user needs to ask for sane things. I'd be fine with returning a times-like font and ignoring the monospace request in your example.
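The first intuition (prefer a family match that is monospaced, otherwise return any monospaced font) could look like the sketch below. The font type and the pick function are hypothetical and much simpler than the real Footprint and Query types. Family substitution is left out entirely.

```go
package main

import "fmt"

// font is a hypothetical, minimal description of an indexed font.
type font struct {
	Family string
	Mono   bool
}

// pick walks the requested families in priority order and returns the
// first font that matches. When wantMono is set, only monospaced fonts
// are accepted, and an arbitrary monospaced font is returned if no
// requested family provides one.
func pick(fonts []font, families []string, wantMono bool) (font, bool) {
	for _, fam := range families {
		for _, f := range fonts {
			if f.Family == fam && (!wantMono || f.Mono) {
				return f, true
			}
		}
	}
	if wantMono {
		for _, f := range fonts {
			if f.Mono {
				return f, true
			}
		}
	}
	return font{}, false
}

func main() {
	fonts := []font{
		{Family: "times", Mono: false},
		{Family: "dejavusansmono", Mono: true},
	}
	f, ok := pick(fonts, []string{"times"}, true)
	fmt.Println(f.Family, ok)
}
```

With this rule the monospace request wins over the family; the alternative mentioned above (return a Times-like font and ignore the monospace request) would simply drop the final fallback loop and the Mono filter.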

whereswaldon commented 1 year ago

@benoitkugler I'll see about making the tests pass on windows next.

As for the one helvetica test that fails on my machine, I guess maybe we shouldn't expect the host system to have an arabic font that matches that family installed?

benoitkugler commented 1 year ago

I think it's the user's fault if they request an impossible situation like a monospaced times font. Monospace seems (to me) to be a more specific request than the family. If a layout is requiring it, not respecting that seems likelier to break something than not respecting the family. My intuition is that if we can't find a family match that is monospaced, we should just return an arbitrary monospaced font.

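Alright. Could we only support 'monospace' query and not 'monospace and this family' ? If so, this could be implemented by giving the 'monospace' family string the special behavior you specified :

* font would need to pass the IsMonospace test

* if after family substitutions, no font is selected, we default to the first monospace font we have (outside the families)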

The Query struct would not change : to query a monospace font, you pass Query{Families: []string{"monospace"}}

What do you think ?

benoitkugler commented 1 year ago

As for the one helvetica test that fails on my machine, I guess maybe we shouldn't expect the host system to have an arabic font that matches that family installed?

Hum yes that seems reasonable. We could perhaps lower our expectation to 'serif' ? Or is it still too restrictive ? I'm not sure there are many systems with no Arabic font at all.

whereswaldon commented 1 year ago
  • I think it's the user's fault if they request an impossible situation like a monospaced times font. Monospace seems (to me) to be a more specific request than the family. If a layout is requiring it, not respecting that seems likelier to break something than not respecting the family. My intuition is that if we can't find a family match that is monospaced, we should just return an arbitrary monospaced font.

Alright. Could we only support 'monospace' query and not 'monospace and this family' ? If so, this could be implemented by giving the 'monospace' family string the special behavior you specified :

* font would need to pass the IsMonospace test

* if after family substitutions, no font is selected, we default to the first monospace font we have (outside the families)

The Query struct would not change : to query a monospace font, you pass Query{Families: []string{"monospace"}}

What do you think ?

I'm not sure that that's viable. Isn't it fairly normal to use a typeface like Noto and want to use the monospaced version of it when embedding preformatted text? I'd hate for our logic to choose a different monospaced font. Or would it work as expected if you supplied Query{Families:[]string{"noto", "monospace"}}? I would expect it to return either a noto or a monospaced font.

whereswaldon commented 1 year ago

As for the one helvetica test that fails on my machine, I guess maybe we shouldn't expect the host system to have an arabic font that matches that family installed?

Hum yes that seems reasonable. We could perhaps lower our expectation to 'serif' ? Or is it still too restrictive ? I'm not sure there are many systems with no Arabic font at all.

I'll own that there are relatively few desktop systems with no arabic font at all, but imagine the case of running tests for a server-side project that wants to do text shaping. The environment may be a bare-minimum container, in which case there may not be any fonts. In general, I'm not sure that it's a good idea to write tests that rely upon the system to have any particular set of fonts installed, or at least not to run those by default. I still can't figure out why this test fails for me, since I actually do have arabic fonts.

whereswaldon commented 1 year ago

Okay, all tests pass on all CI platforms. :tada:

As I mentioned in https://github.com/go-text/render/pull/8#discussion_r1232520950, I would like to alter this code to accept a user-provided logger instead of logging to the default logger unconditionally, but I figured that's a less important concern than getting the code and API correct.
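One possible shape for a user-provided logger is sketched below. The Logger interface, the loggerFunc adapter, and the fontMap type with its constructor are all hypothetical names and do not reflect the package's actual API. The interface is kept small enough that *log.Logger satisfies it.

```go
package main

import (
	"log"
	"os"
)

// Logger is the minimal interface callers would provide in this sketch.
// *log.Logger from the standard library satisfies it.
type Logger interface {
	Printf(format string, args ...interface{})
}

// loggerFunc adapts a plain function to the Logger interface.
type loggerFunc func(format string, args ...interface{})

// Printf calls the wrapped function.
func (f loggerFunc) Printf(format string, args ...interface{}) { f(format, args...) }

// fontMap is a hypothetical stand-in for the type that currently logs
// to the default logger.
type fontMap struct {
	logger Logger
}

// newFontMap stores the given logger, falling back to a stderr logger
// when nil is passed.
func newFontMap(logger Logger) *fontMap {
	if logger == nil {
		logger = log.New(os.Stderr, "fontscan: ", log.LstdFlags)
	}
	return &fontMap{logger: logger}
}

// reportNoMatch logs through the configured logger rather than the
// package-level one.
func (fm *fontMap) reportNoMatch(families []string) {
	fm.logger.Printf("No font matched for %v -> returning arbitrary face", families)
}

func main() {
	newFontMap(nil).reportNoMatch([]string{"helvetica"})
}
```

A test could then pass a recording logger and assert on the collected messages instead of capturing the global log output.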

benoitkugler commented 1 year ago

I'll own that there are relatively few desktop systems with no arabic font at all, but imagine the case of running tests for a server-side project that wants to do text shaping. The environment may be a bare-minimum container, in which case there may not be any fonts. In general, I'm not sure that it's a good idea to write tests that rely upon the system to have any particular set of fonts installed, or at least not to run those by default. I still can't figure out why this test fails for me, since I actually do have arabic fonts.

Alright, fair enough !

benoitkugler commented 1 year ago

I'm not sure that that's viable. Isn't it fairly normal to use a typeface like Noto and want to use the monospaced version of it when embedding preformatted text?

Well actually, the CSS spec does not provide a way for it. If you want Noto monospace, you have to explicitly ask for "NotoMono"

I'd hate for our logic to choose a different monospaced font. Or would it work as expected if you supplied Query{Families:[]string{"noto", "monospace"}}? I would expect it to return either a noto or a monospaced font.

You are correct about Query{Families:[]string{"noto", "monospace"}}. But, huh, it seems that mixing family and monospace will be more complicated... Also note that (as far as I know) monospace fonts usually include "mono" in their family name, so specifying "noto" and "monospace" would not match "noto mono" (at least with the current implementation).

I'll need to think more about it. I'm a bit worried that it would complicate the semantics and also go far from other libraries behavior, but well...

whereswaldon commented 1 year ago

I'm not sure that that's viable. Isn't it fairly normal to use a typeface like Noto and want to use the monospaced version of it when embedding preformatted text?

Well actually, the CSS spec does not provide a way for it. If you want Noto monospace, you have to explicitly ask for "NotoMono"

A bizarre oversight. Huh. That's a good datapoint, though it's also an opportunity for us to do better than CSS (if there is a reasonable way to do it).

I'd hate for our logic to choose a different monospaced font. Or would it work as expected if you supplied Query{Families:[]string{"noto", "monospace"}}? I would expect it to return either a noto or a monospaced font.

You are correct about Query{Families:[]string{"noto", "monospace"}}. But, huh, it seems that mixing family and monospace will be more complicated... Also note that (as far as I know) monospace fonts usually include "mono" in their family name, so specifying "noto" and "monospace" would not match "noto mono" (at least with the current implementation).

Hmm. I'd really like for users to be able to say "I want a monospace font" and always get one. I'd also really like for users to be able to say "I want a Noto font" and "I want a monospace font" and end up with Noto Mono. However, it seems like that might be unrealistic given your discussion above?

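Assuming we could add "IsMonospace" to the footprint, would it then be possible to do the following:

  • Add IsMonospace as a Query field
  • If the query has IsMonospace set to true:

    • When matching against footprints, only consider footprints with IsMonospace
    • If no footprints pass, repeat the match without enforcing IsMonospace

In this way, it is possible to get a monospace font for a specific family.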

I'll need to think more about it. I'm a bit worried that it would complicate the semantics and also go far from other libraries behavior, but well...

Could you elaborate on how other libraries differ? And do you think that the approach that they take is better, or simply more conventional?

whereswaldon commented 1 year ago

Here is my current TODO list for this PR (unordered):

Other things that need doing:

benoitkugler commented 1 year ago

Assuming we could add "IsMonospace" to the footprint, would it then be possible to do the following:

  • Add IsMonospace as a Query field
  • If the query has IsMonospace set to true:

    • When matching against footprints, only consider footprints with IsMonospace
    • If no footprints pass, repeat the match without enforcing IsMonospace

In this way, it is possible to get a monospace font for a specific family.

Yes that seems possible, indeed
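The two-pass match proposed above can be sketched as follows. This is only an illustration under stated assumptions: `Footprint` and `Query` are reduced to the two fields under discussion, and family comparison is a simple case-insensitive substring test, which is not how the real matcher compares families.

```go
package main

import (
	"fmt"
	"strings"
)

// Footprint is a simplified stand-in for the font summary discussed above.
type Footprint struct {
	Family      string
	IsMonospace bool
}

// Query mirrors the proposal: a family list plus an IsMonospace preference.
type Query struct {
	Families    []string
	IsMonospace bool
}

// match returns the footprints whose family contains one of the queried
// families (case-insensitive substring, an assumption of this sketch).
// When requireMono is set, non-monospace footprints are skipped.
func match(fonts []Footprint, q Query, requireMono bool) []Footprint {
	var out []Footprint
	for _, family := range q.Families {
		want := strings.ToLower(family)
		for _, fp := range fonts {
			if requireMono && !fp.IsMonospace {
				continue
			}
			if strings.Contains(strings.ToLower(fp.Family), want) {
				out = append(out, fp)
			}
		}
	}
	return out
}

// Select first enforces IsMonospace when the query asks for it, then
// repeats the match without that constraint if nothing passed.
func Select(fonts []Footprint, q Query) []Footprint {
	if q.IsMonospace {
		if found := match(fonts, q, true); len(found) > 0 {
			return found
		}
	}
	return match(fonts, q, false)
}

func main() {
	fonts := []Footprint{
		{Family: "Noto Sans"},
		{Family: "Noto Sans Mono", IsMonospace: true},
		{Family: "DejaVu Sans"},
	}
	// A monospace Noto exists, so only that footprint is returned.
	fmt.Println(Select(fonts, Query{Families: []string{"noto"}, IsMonospace: true}))
	// No monospace DejaVu exists here, so the second pass returns the regular one.
	fmt.Println(Select(fonts, Query{Families: []string{"dejavu"}, IsMonospace: true}))
}
```

The fallback pass is what keeps the query from returning nothing when a family has no monospace variant installed.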

Could you elaborate on how other libraries differ? And do you think that the approach that they take is better, or simply more conventional?

Well, I only had Rust's font-kit in mind, which implements the CSS rules. I was hoping go-text would implement CSS-compatible behavior, but your approach seems just fine!

whereswaldon commented 1 year ago

@benoitkugler When you have time, please look at https://github.com/go-text/typesetting/pull/63/commits/31d8574748ff28f29b134817c14106d2ecd41686 and let me know if you have a better approach in mind. I opted to create a second method so that we didn't need to change the shaping.FontMap interface. There are many other ways we could go about offering the metadata though, and I'd be happy to do something else if you have a preference.

whereswaldon commented 1 year ago

It seems we have work to do for WASM graceful fallback. Or, at least, I do:

image

whereswaldon commented 1 year ago

image

It was all on Gio's end. WASM currently doesn't provide any system fonts, but I've made Gio tolerant of having zero faces available so that we can at least present the non-text UI if the user didn't supply any faces and the system doesn't supply any.

benoitkugler commented 1 year ago

@benoitkugler When you have time, please look at 31d8574 and let me know if you have a better approach in mind. I opted to create a second method so that we didn't need to change the shaping.FontMap interface. There are many other ways we could go about offering the metadata though, and I'd be happy to do something else if you have a preference.

My preference would go towards actually updating the shaping.FontMap interface. The current version was rather a draft, and I don't think it is really used in user code. I'm not even sure we need the interface at all, do we ?

That would add a disparity with the more elementary ResolveFace but since the two functions tackle different issues, it does not bother me that much.

What are your thoughts?

whereswaldon commented 1 year ago

@benoitkugler When you have time, please look at 31d8574 and let me know if you have a better approach in mind. I opted to create a second method so that we didn't need to change the shaping.FontMap interface. There are many other ways we could go about offering the metadata though, and I'd be happy to do something else if you have a preference.

My preference would go towards actually updating the shaping.FontMap interface. The current version was rather a draft, and I don't think it is really used in user code. I'm not even sure we need the interface at all, do we ?

What are your thoughts?

My reasoning was that the actual functionality of ResolveFace when used by the shaping package doesn't really need the metadata, so it places a burden on implementers of shaping.FontMap for no reason. As for whether we need it at all... I think it's a good idea to keep it as an interface. I can imagine people wanting to use go-text, but to plug in the system font provider library for the process of choosing faces.

That would add a disparity with the more elementary ResolveFace but since the two functions tackle different issues, it does not bother me that much.

I feel bad about this knowing how hard you worked on the simpler ResolveFace, but I don't currently have a use-case for it. Gio is using the FontMap approach and just paying the application startup cost of 200ms or so. The limitations of ResolveFace mean that it can't be used for fallback, which is one of the primary Gio use-cases for system fonts. Do you have a use-case for it? If not, it's possible that we don't need it at all.

benoitkugler commented 1 year ago

I feel bad about this knowing how hard you worked on the simpler ResolveFace, but I don't currently have a use-case for it.

No worries :) Besides, I think we should back-port its logic into Fontmap.ResolveFace, since the decision to look for a new face or not is still relevant in this context.

Do you have a use-case for it? If not, it's possible that we don't need it at all.

I don't think so. I'll want to use the full-featured Fontmap. The simpler ResolveFace was more of a prototype.

I can imagine people wanting to use go-text, but to plug in the system font provider library for the process of choosing faces.

Sure thing, but for now we never use a Fontmap as input of one of our functions...

Gio is using the FontMap approach and just paying the application startup cost of 200ms or so.

Is this new? I thought Gio was using the simpler ResolveFace with a fixed slice of fonts set up by the developer.

whereswaldon commented 1 year ago

I feel bad about this knowing how hard you worked on the simpler ResolveFace, but I don't currently have a use-case for it.

No worries :) Besides, I think we should back-port its logic into Fontmap.ResolveFace, since the decision to look for a new face or not is still relevant in this context.

Do you have a use-case for it? If not, it's possible that we don't need it at all.

I don't think so. I'll want to use the full-featured Fontmap. The simpler ResolveFace was more of a prototype.

I can imagine people wanting to use go-text, but to plug in the system font provider library for the process of choosing faces.

Sure thing, but for now we never use a Fontmap as input of one of our functions...

We use one here, which is the functionality Gio is consuming to load system fonts.

Gio is using the FontMap approach and just paying the application startup cost of 200ms or so.

Is this new? I thought Gio was using the simpler ResolveFace with a fixed slice of fonts set up by the developer.

This is the approach that I'm prototyping, yeah. ResolveFace doesn't let us resolve by rune coverage, and we're constantly running into "My application keeps showing tofu when a user inputs text in language X" problems. It's clear that we need to be able to fall back to the system-installed fonts to display unexpected multilingual text, and only the fontmap gives us the ability to do that.

There's still a fixed slice of fonts pre-loaded by the developer, but now we use the FontMap to load additional system fonts as-needed.
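As a rough illustration of that flow (preloaded faces first, system fonts only when a rune is not covered), here is a toy sketch. `Face`, `Resolver`, and `SystemLookup` are hypothetical names invented for this example; real faces expose coverage through their character maps rather than a Go map.

```go
package main

import "fmt"

// Face is a toy stand-in for a loaded font face: a name plus the runes it covers.
type Face struct {
	Name  string
	Runes map[rune]bool
}

// Covers reports whether the face has a glyph for r.
func (f *Face) Covers(r rune) bool { return f.Runes[r] }

// SystemLookup is a hypothetical hook that searches installed fonts
// for a face covering r.
type SystemLookup func(r rune) (*Face, bool)

// Resolver tries the developer-supplied faces first and only asks the
// system when none of them covers the rune.
type Resolver struct {
	faces  []*Face
	system SystemLookup
}

// ResolveFace returns a face covering r, or nil if neither the loaded
// faces nor the system can provide one.
func (rs *Resolver) ResolveFace(r rune) *Face {
	for _, f := range rs.faces {
		if f.Covers(r) {
			return f
		}
	}
	if rs.system != nil {
		if f, ok := rs.system(r); ok {
			// Keep the face so later runes from the same script reuse it.
			rs.faces = append(rs.faces, f)
			return f
		}
	}
	return nil
}

func main() {
	latin := &Face{Name: "Bundled Latin", Runes: map[rune]bool{'a': true}}
	arabic := &Face{Name: "System Arabic", Runes: map[rune]bool{'ب': true}}
	rs := &Resolver{
		faces: []*Face{latin},
		system: func(r rune) (*Face, bool) {
			if arabic.Covers(r) {
				return arabic, true
			}
			return nil, false
		},
	}
	for _, r := range []rune{'a', 'ب'} {
		fmt.Printf("%q -> %s\n", r, rs.ResolveFace(r).Name)
	}
}
```

Keeping the system face in the slice after the first hit means the comparatively expensive system lookup happens once per missing script rather than once per rune.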

whereswaldon commented 1 year ago

I've got system font loading working on Android, but it required some tweaks (the commits that I've just pushed).

image

The most frustrating problem was that I couldn't find a safe directory to use to store our cache automatically, so I've officially pushed that responsibility onto the calling application. In theory the caller is a proper Android app with the ability to use JNI to query the proper cache directory for us.

whereswaldon commented 1 year ago

image

Things are looking good on macOS also, though I think we knew that already.

whereswaldon commented 1 year ago

Okay, I was able to get iOS working as well. I had to update the code to not assume the cache directory already exists, but that makes sense.
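A minimal sketch of the two cache-related changes described in the last few comments: the caller supplies the cache directory, and the code creates it if it does not exist yet. `WriteIndex` and the file name are assumptions made for this example, not the package's real functions.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// WriteIndex stores data in cacheDir/name. The caller chooses cacheDir
// (on Android or iOS, the application knows its sandboxed cache location),
// and the directory is created if it does not exist yet.
func WriteIndex(cacheDir, name string, data []byte) (string, error) {
	if cacheDir == "" {
		return "", fmt.Errorf("cache directory must be provided by the caller")
	}
	// Do not assume the directory exists: create it and any missing parents.
	if err := os.MkdirAll(cacheDir, 0o755); err != nil {
		return "", fmt.Errorf("creating cache directory: %w", err)
	}
	path := filepath.Join(cacheDir, name)
	if err := os.WriteFile(path, data, 0o644); err != nil {
		return "", fmt.Errorf("writing cache file: %w", err)
	}
	return path, nil
}

func main() {
	base, err := os.MkdirTemp("", "fontscan-example")
	if err != nil {
		fmt.Println(err)
		return
	}
	defer os.RemoveAll(base)

	// The nested directory does not exist before the call.
	path, err := WriteIndex(filepath.Join(base, "not", "yet", "created"), "index.cache", []byte("demo"))
	fmt.Println(path, err)
}
```

Returning an error for an empty directory, rather than guessing a default, matches the decision to leave the choice of location to the calling application.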

image

whereswaldon commented 1 year ago

Windows just works without a hitch:

image

benoitkugler commented 1 year ago

We use one here, which is the functionality Gio is consuming to load system fonts.

Wow, my mistake, I forgot we have also implemented the general approach.

OK, so it is probably best to keep your approach (not changing the Fontmap interface) and bear with the "duplicated" ResolveFace and ResolveFaceAndMetadata.
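For readers following along, the outcome can be sketched like this: the interface consumed by shaping stays small, and the metadata-aware method lives next to it on the concrete type. All types below are simplified placeholders assumed for illustration (the real ones live in go-text), and `MetadataResolver` is a hypothetical name for the optional capability.

```go
package main

import "fmt"

// Face and Metadata are simplified placeholders for the real types.
type Face struct{ Name string }

type Metadata struct{ Family string }

// FontMap has the shape of the interface shaping consumes in this
// discussion: only what shaping needs.
type FontMap interface {
	ResolveFace(r rune) *Face
}

// MetadataResolver is a hypothetical optional interface for callers
// that also want the metadata.
type MetadataResolver interface {
	ResolveFaceAndMetadata(r rune) (*Face, Metadata)
}

// systemFontMap implements both methods; ResolveFace simply drops the metadata.
type systemFontMap struct{}

func (systemFontMap) ResolveFaceAndMetadata(r rune) (*Face, Metadata) {
	return &Face{Name: "Example Sans"}, Metadata{Family: "example sans"}
}

func (m systemFontMap) ResolveFace(r rune) *Face {
	face, _ := m.ResolveFaceAndMetadata(r)
	return face
}

// fixedFontMap is a third-party style implementation with no metadata.
type fixedFontMap struct{ face *Face }

func (m fixedFontMap) ResolveFace(r rune) *Face { return m.face }

// Describe uses metadata when the FontMap offers it and falls back to
// plain resolution otherwise.
func Describe(fm FontMap, r rune) string {
	if mr, ok := fm.(MetadataResolver); ok {
		face, md := mr.ResolveFaceAndMetadata(r)
		return fmt.Sprintf("%s (family %s)", face.Name, md.Family)
	}
	return fm.ResolveFace(r).Name
}

func main() {
	fmt.Println(Describe(systemFontMap{}, 'a'))
	fmt.Println(Describe(fixedFontMap{face: &Face{Name: "Bundled"}}, 'a'))
}
```

The cost is the "duplicated" pair of methods; the benefit is that implementers of the smaller interface are not forced to produce metadata they do not have.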