Give the user more flexibility over font selection

LaurenzV commented 5 months ago

An attempt to tackle https://github.com/RazrFalcon/resvg/issues/710#issuecomment-2042380784. The basic idea is that we call resolve_font and find_font_for_char to tell the user which fonts we want (which the user can then dynamically load in the background) and then we request them in the end by requesting a reference to fontdb and using that.

In the beginning, I wanted to make the whole thing very generic. Meaning that I tried to write the trait in a way that completely abstracts away from fontdb and fontdb::ID, so that the user has full control over the font selection as well as storing, and it's (in principle) completely abstracted away from usvg. However, there were two big problems with this:

It turns into a generics hell really quickly, and I think that you would probably not like it.
I had really big trouble getting it to work with fontdb, based on the fact that (from what I gathered) the only way to get access to the font source is by using the with_face_data callback, which makes it really awkward to incorporate it into a trait because it doesn't seem to be possible to create trait objects if there are generics involved.

So for now, this is the best design I was able to come up with, which is far from ideal, but it should be better than nothing. We just need to document it properly.

So the remaining steps would be:

Decide whether this design is okay.
Once that is done, I will add documentation.
Test whether the API fulfills what is needed in Typst.

cc @laurmaedje

RazrFalcon commented 5 months ago

Thanks. Looks good to me. Yes, I'm not a big fan of generic code. And we would not need it anyway. There is no point in "removing" fontdb.

RazrFalcon commented 5 months ago

Have you tried implementing FontProvider similarly to usvg::ImageHrefResolver? This would reduce the API/code impact. PS: We should probably call it FontResolver as well, for consistency.

LaurenzV commented 4 months ago

Have you tried implementing FontProvider similarly to usvg::ImageHrefResolver? This would reduce the API/code impact. PS: We should probably call it FontResolver as well, for consistency.

I'm not sure if that's possible. Because the user has to store the fontdb somehow, and if we only provide a struct that contains three fields with callback functions and nothing else, the user doesn't really have a way of storing it...

RazrFalcon commented 4 months ago

But the user stores fontdb already. Nothing changes here. And the Options trait would have fontdb as an argument. Basically:

pub type FontResolverFn = Box<dyn Fn(&Font, &fontdb::Database) -> Option<ID> + Send + Sync>;

Wouldn't this work?

LaurenzV commented 4 months ago

So basically, my understanding is that there are two problems to solve, one of which is a must for Typst, the other one is more of a nice-to-have:

1) must-have: We need to be able to populate the fontdb with fonts on-demand, instead of up-front. One reason for this is that when Typst is running in the browser, it would be very impractical to load hundreds of fonts upfront. Instead, we only want to load the fonts that are actually needed. And the only way this can be done is if we can somehow hook into the font-loading process of usvg and implement our own logic.

2) nice-to-have: It would be nice if we could also implement our own custom logic for selecting a font, because then we could use our custom fallback-resolving logic. But from what I understood this is not a very critical point, the first point is much more important.

So my envisioned usage would have been as follows:

struct MyCustomFontProvider {
    fontdb: RefCell<Database>
}

impl FontProvider for MyCustomFontProvider {
    fn find_font(&self, font: &Font) -> Option<ID> {
        // A custom logic for selecting a font, and then loading the font
        // data in a vector (regardless of whether the data comes from locally or a remote server)
        let font: Vec<u8> = my_custom_logic_for_selecting_a_font();
        // Insert the font we selected and fetched into the fontdb, and then return the ID
        let font_id = self.fontdb.borrow_mut().add_font(font);
        Some(font_id)
    }

    fn find_fallback_font(&self, c: char, base_font_id: ID, used_fonts: &[ID]) -> Option<ID> {
        // Basically the same as find_font, but instead for choosing a fallback font.
    }

    fn fontdb(&self) -> &Database {
        // Here we give out a reference to the fontdb so that usvg can load the font data
        // via with_face_data and get all the necessary metrics, etc. Since we only use
        // IDs that have been fetched either via find_font or find_fallback_font, usvg
        // is guaranteed to only access fonts that have been loaded into the database
        &self.fontdb.borrow()
    }
}

However, I just realized that fontdb doesn't seem to return the assigned ID of a newly inserted font... It seems like the only way to get an ID is via the query interface, so I guess this won't work then...

LaurenzV commented 4 months ago

I'm starting to think that perhaps the real solution to the problem is to extend fontdb is to introduce a Source type that allows the user to define how the font is loaded/mapped. Then we probably wouldn't need any changes in usvg at all. Not sure how feasible it is to implement, though.

And I'm also not a 100% sure what is needed from Typst's side, so maybe it's best to wait for what @laurmaedje has to say.

laurmaedje commented 4 months ago

I'm starting to think that perhaps the real solution to the problem is to extend fontdb is to introduce a Source type that allows the user to define how the font is loaded/mapped.

As far as I know, fontdb needs the actual font data to be able to collect its metadata. So lazy loading wouldn't work. Similar to how Typst needs the FontBook.

However, I just realized that fontdb doesn't seem to return the assigned ID of a newly inserted font...

Looks like load_font_source does provide the IDs, though some extra effort might required to find the correct ID in the vector. Typst would use this method anyway as to not clone the bytes, but rather use Source::Binary.

I think the current interface should work for Typst and it looks reasonable to me. The only problem that stands out to me is the fontdb(&self) -> &Database because the way you've written it will probably not compile. The RefCell will return a Ref<Database> that cannot be directly cast into &Database. Solutions I can think of:

a fontdb-like with_database
an associated type that requires Deref<Target = Database>
let the other methods get &mut self, so that no interior mutability is necessary (not sure whether that works for usvg)
move the database out of the resolver and pass it into the resolving functions instead

RazrFalcon commented 4 months ago

However, I just realized that fontdb doesn't seem to return the assigned ID of a newly inserted font...

Because we're adding fonts and fontdb stores faces. So when we add a *.ttc - it would generate multiple IDs. Therefore we still have to call query even if we just added a new font. It would work just fine with the current API.

The real issue with your solution is that fontdb is mutable now, while it is not right now. RefCell would not work, because it's not Sync. We need Arc.

What we probably want is to add Arc<fontdb::Database> to usvg::Options, which would be deep-cloned when needed. Expensive for cases when we load system fonts, but cheap for cases like Typst. Then we can add href-like callbacks, but for fonts.

Will it work? I don't know. We just have to try it out.

To summarize: usvg needs a shared, thread-safe, mutable fontdb. What is the best to do it? I don't know.

RazrFalcon commented 4 months ago

I'm starting to think that perhaps the real solution to the problem is to extend fontdb is to introduce a Source type that allows the user to define how the font is loaded/mapped.

I would rather keep fontdb as a fancy Vec<Face> wrapper. It's already complicated enough.

LaurenzV commented 4 months ago

I would rather keep fontdb as a fancy Vec wrapper. It's already complicated enough.

~~If something like this would be the only addition necessary, would you accept it? https://github.com/RazrFalcon/fontdb/compare/master...LaurenzV:fontdb:custom~~

Nevermind. I will discuss it with @laurmaedje and will get back to this once we have a proper plan.

LaurenzV commented 4 months ago

@RazrFalcon continuing from before. My current approach would have been as follows: I think there are two additions to fontdb that could the make the whole thing easier:

Add a Source type that takes a callback instead of an Arc directly.
Add a way to store the coverage of a font in fontdb.

I've implemented a prototype here (the code for storing the coverage I "stole" from Typst): https://github.com/RazrFalcon/fontdb/compare/master...LaurenzV:fontdb:custom

If we have this, then I think all that would be needed is to change the fontback parser in usvg to use the coverage for finding a fallback font, instead of trying to load each font one after another to find a suitable one: https://github.com/RazrFalcon/resvg/commit/d92ab2067e84427048e35e2ad660e4addb4f69cd

I think this should work, although as I mentioned I haven't tested it in Typst yet. And I'm also not sure about the memory footprint of storing the coverage in that way.

RazrFalcon commented 4 months ago

That's a highly Typst-specific solution. I don't like it. fontdb should not be affected at all.

laurmaedje commented 4 months ago

Functionality

Regarding functionality, I think this PR mostly works as-is, the only problem is that the database(&self) -> &Database method is incompatible with interior mutability (because the Ref can't be returned). I have made a branch based on this PR that instead uses a with_database method. To make it work with dynamic dispatch (no generics available) I had to come up with a bit of a workaround, basically:

Code

```rust pub trait FontProvider { fn find_font(&self, font: &Font) -> Option; fn find_fallback_font(&self, c: char, base_font_id: ID, used_fonts: &[ID]) -> Option; fn with_database(&self, f: &mut dyn FnMut(&Database)); } pub(crate) trait FontProviderExt { fn with_face_data(&self, id: ID, p: P) -> Option where P: FnOnce(&[u8], u32) -> T; ... } impl FontProviderExt for F { fn with_face_data(&self, id: ID, p: P) -> Option where P: FnOnce(&[u8], u32) -> T, { let mut output = None; let mut p = Some(p); self.with_database(&mut |db| { if let Some(p) = p.take() { output = db.with_face_data(id, p); } }); output } ... } ```

Internally, only with_face_data is used and the FontProviderExt is just an implementation detail. The remaining changes on my branch are mostly just indent changing.

Design

The remaining question is whether the design of this PR is satisfactory to you.

But the user stores fontdb already. Nothing changes here. And the Options trait would have fontdb as an argument. Basically:
 pub type FontResolverFn = Box<dyn Fn(&Font, &fontdb::Database) -> Option<ID> + Send + Sync>;
 Wouldn't this work?
I think immutable access to the database doesn't suffice as it needs to be populated. I'm not sure how exactly you're envisioning Arc usage here, but I think a trait object is actually a fairly nice solution and it requires no deep cloning (for reference, here is the implementation in Typst based on my branch).

And then, there is the question of how to handle font fallback, which is also fairly simple in the trait object design.

I would also like to note that something nice about the design in this PR is that dyn FontProvider doesn't need to be 'static (which is actually necessary in Typst). So if some other route was taken, it'd be nice if that could have a lifetime (which the ImageHrefResolver doesn't currently support). This would likely require Options to have a lifetime.

RazrFalcon commented 4 months ago

I think immutable access to the database doesn't suffice as it needs to be populated.

Yes, there should be a mutable reference there. But your trait is immutable as well, no? find_font should be able to insert a font into the database.

laurmaedje commented 4 months ago

I used interior mutability for that, which requires the whole dance with with_database and dynamic dispatch. If usvg can just provide a mutable reference instead that'd be much better. I did briefly attempt that, but it looked like it would required larger changes to usvg, which I shied away from.

The trait methods could perhaps also be multiple Box<dyn Fn(..)> for font lookup and fallback, but it's probably harder to implement for users because the normal lookup and the fallback might want to share some state. And as mentioned above, for Typst to be able to implement it, it would require a lifetime on the Fn trait object.

RazrFalcon commented 4 months ago

I used interior mutability for that, which requires the whole dance with with_database and dynamic dispatch.

I do not understand how it would work. Can you provide an example?

Af for lifetimes, boxes and traits - it's not the important part. The important part is having a shared, mutable fontdb. Which is only possible with Arc. Which already requires some changes to the usvg. I guess we should store it in Options and not pass as an argument. This is how I envision it.

A reminder that embedded SVG fonts are also planned. Meaning that usvg should be able to modify insert/delete fonts from fontdb on it's own.

laurmaedje commented 4 months ago

I do not understand how it would work. Can you provide an example?

See my branch of resvg and my linked implementation in Typst. It's really just a RefCell in the FontProvider implementation. But this was mostly just to make it work with &self. If usvg can pass it mutably, all of that is moot.

The important part is having a shared, mutable fontdb. Which is only possible with Arc.

Do you mean just Arc or something like Arc<Mutex<..>>? Since just Arc wouldn't give mutability. I'm not sure what an Arc<Mutex<..>>-based design gives over just &mut Database, could you elaborate on that? Which different parts of the code need to hold on to it at the same time? Maybe I'm missing something regarding usvg internals.

If usvg would have an &mut Database in its Options and mutably pass it to the resolver, that would work for me. The downside would be that consumers that don't need the mutable aspect couldn't share a Database across multiple threads. But embedded SVG fonts would be a problem for this anyway, I think (especially with an Arc<Mutex<..>>-based design).

Another small missing piece is to let Database::push_face_info return the ID it generated, but it is trivial.

RazrFalcon commented 4 months ago

I genuinely never used RefCell outside Rc, therefore I wasn't familiar with this pattern. But I guess it might work.

Do you mean just Arc or something like Arc<Mutex<..>>? Since just Arc wouldn't give mutability.

Arc<Database> specifically, because each usvg::Tree::from_data must be able to modify it's own copy if needed.

Let's say we're parsing two SVG files from two threads using a single fontdb. If each file has embedded fonts then each usvg parser has to have its own deep-copy of fontdb to perform db modification and query. A shared DB would not work.

To allow this I suggest storing Arc<Database> in usvg::Options which would be deep-cloned when needed. Maybe there is a better solution - I don't know.

Basically, my solution is to treat the provided Database as the "base" database which would be modified by the parser in whatever way it wants. The parser would basically consume the provided db instead of sharing it like right now. And the reason I don't want to have internal db to begin with is that cloning a db with system fonts loaded is way faster then loading fonts in each parser call.

laurmaedje commented 4 months ago

I see now. That makes sense. I think Arc<Database> in Options could work out nicely and the deep-copy case would really only be hit when (a) sharing a db across multiple threads, (b) having embedded fonts in the SVG. And since usvg::Source is cheap to clone, it should not be that expensive. It would nicely account for the case of a simple prepared database, parallel parsing, and also accommodate Typst's case where the options would always start out with an empty database, which is filled via the callbacks.

I assume that then the parser would do Arc::make_mut on the database before calling the font lookup function with (&Font, &mut Database) -> Option<ID>, right?

RazrFalcon commented 4 months ago

@laurmaedje Yep. That's the general idea.

As for make_mut, I'm not sure this exact method would be used, but yes, something like that.

laurmaedje commented 4 months ago

@RazrFalcon I have a working implementation of the Arc-approch on this branch. I implemented it exactly like ImageHrefResolver. I also added lifetimes so that it's possible to implement these providers in non-'static scenarios. I think it works pretty well and actually simplifies things a bit. If this looks like you envisioned, then I would polish it up a bit and make a PR.

Here is small overview of the changes:

The Options now contain an Arc<Database> and a new FontResolver<'a>
The ImageHrefResolver is now ImageHrefResolver<'a> (since it might as well). With the default resolvers, you just get Options<'static>
The resolver functions take &mut Arc<Database>, so they can mutate the database with Arc::make_mut if necessary, but if they don't need to add a font, we don't force the database to be cloned. When polishing up, I would add more docs to explain to API users how they are supposed to deal with this Arc.
For each tree conversion, we have an Arc<Database> in the Cache, which is populated over the course of conversion. It is only cloned if fonts are actually added.
The database from the Cache is moved into the Tree after conversion and users can access it with tree.fontdb() to make sense of the IDs in their Spans.
Sub-SVGs start with a fresh database from the Options. The database in the Options is never mutated (it's &Options after all)
We can remove of all the cfg-ed fontdb arguments everywhere

PS: Usage in Typst looks like this (still rough around the edges).

RazrFalcon commented 4 months ago

Looks really good! Thanks! I will accept this PR.

The database from the Cache is moved into the Tree after conversion and users can access it with tree.fontdb() to make sense of the IDs in their Spans.

Haven't even considered it...

RazrFalcon commented 4 months ago

Deprecated in favor of #769

RazrFalcon / resvg

Give the user more flexibility over font selection #754

Functionality

Design