RazrFalcon / rustybuzz

A complete harfbuzz's shaping algorithm port to Rust
MIT License
544 stars 34 forks source link

Port OpenType Layout. #14

Closed laurmaedje closed 3 years ago

laurmaedje commented 3 years ago

This ports all of GSUB and GPOS and uses ttf_parser's APIs for GDEF.

Notable changes:

Note that a lot of the original GSUB/GPOS code was unused from the beginning (originally used for harfbuzz's OpenType API). I removed all of that in the first commit.

This PR depends on a few changes in ttf_parser. Basically, two helpers to read data at an offset, making the get method on Offsets16 public and DynArray. These changes can be found here. For now, I rewired ttf_parser in rustybuzz's Cargo.toml to that fork. I didn't open a PR there for now because I'm not sure on how you plan to integrate with ttf_parser since the crates.io version doesn't have a public parser module.

I apologize for the PR being this huge :/

RazrFalcon commented 3 years ago

Wow! Just wow! And the timing is perfect, since I've just finished tiny-skia, which means that the next release of resvg can be pure Rust (I will think what to do with AAT).

Harfbuzz did GDEF blocklisting for some fonts. Since I use ttf_parser's APIs for this, I couldn't port that.

You can do this using Face::table_data. Or not?

This PR depends on a few changes in ttf_parser.

I will merge the next branch to the master now and you could send a pull request with your changes.

I apologize for the PR being this huge

No problem. That was expected.

RazrFalcon commented 3 years ago

Everything seems ok.

You are using a few Rust tricks that I'm not usually using (like impl T and Self::new constructors), but this just a matter of preference. I don't mind. Overall, the code is pretty consistent and idiomatic. In the end, Rust is a pretty simple language, imo.

I guess we should decide what parts should be moved to ttf_parser and what should we keep here. Looks like you only need the parser module and most of the stuff from the ggg module is in the ot/common now. Am I right? Do you think there is a point or even a possibility to move GSUB/GPOS completely to ttf_parser, maybe in the future?

PS: how much are you familiar with text rendering/layout? Because you did a pretty serious work on rb and ttf_parser and it's a very specific domain. To put in perspective, the current rb + ttf_parser implementation took me like 6 month. Mainly because I had no idea about shaping at all.

RazrFalcon commented 3 years ago

I've made the parser module public. Let me know if you need any other code from the next branch. If not, just make the pull request with all your changes.

RazrFalcon commented 3 years ago

Also, update readme and changelog.

RazrFalcon commented 3 years ago

Does this mean the the only things left is hb-ot-shape.cc and AAT?

laurmaedje commented 3 years ago

Yes the ggg stuff in now in ot/layout/common. With regards to the directory structure, I'm not sure whether the extra nesting level ot/layout is really helpful or whether we should just move the files in layout directly into ot.

Moving GSUB/GPOS into ttf_parser is hard. The top-level code (that you already had implemented) could of course live in ttf_parser (at least once we can use NormalizedCoordinate in rustybuzz). The parsing code could also be moved, but I don't really see how we could move the code that lives in the Apply/WouldApply traits. That code is very specific to harfbuzz and needs tons of things we can't possible abstract over. So moving the parsing and the data structures would be okay, but that would also mean exposing all of these enums directly, which is not so nice and also probably not so great for compatibility reasons. Or we could design a separate small API for each kind of lookup and hide the enums in structs. But I'm not sure that's really worth it.

What is left now is

To your other question: I also had no idea about shaping not too long ago. But I'm currently trying to develop a typesetting project that I want to run in WASM and this endeavour somehow led me here. And I had a lot of time on my hands for the past couple weeks. Porting the individual GSUB/GPOS tables was pretty frustrating at times, but tying everything together in the last few days all the more rewarding.

laurmaedje commented 3 years ago

I need two other things from the next branch:

RazrFalcon commented 3 years ago

I'm not sure whether the extra nesting level ot/layout is really helpful

In my kerx branch, I have a tables root module with tables parsing. So you can do the same and move the logic itself to the ot directly.

Moving GSUB/GPOS into ttf_parser is hard.

Yeah... That why I was insisting on doing it on the rb side, since I've came to the same conclusion when I was trying to port GSUB/GPOS myself. This is not a priority, just a curiosity. Moreover, I plan to move most of the kern to the rb too. And leave only the basic OT kerning on the ttf_parser side.

What is left now is

Yes, sounds about right. AAT is the only tricky one, but still way simpler than OT.

But it don't really get how the whole kerning thing works with OpenType and AAT.

It should work out of the box in the kerx branch. At least kern is fully finished. Including AAT one.

but tying everything together in the last few days all the more rewarding

Well, I just glad that I don't have to do this anymore :smile:

I need two other things from the next branch

Done.

Offsets16 + OffsetsIter16

Since it's not used by ttf_parser, I think we should implement it in rb directly. It doesn't rely on any ttf_parser internals.

laurmaedje commented 3 years ago

If we put Offsets16 (and DynArray) into rustybuzz, then we should probably have a StreamExt trait so we can read them with normal methods on Stream, right?

And to the directory structure: Would you put the Apply and WouldApply implementations for the GSUB and GPOS tables alongside the parsing code into the tables module or somewhere in ot? I'm not sure whether it's good to split it up - for the same reason its in rustybuzz and not ttf-parser - because it's tightly coupled.

laurmaedje commented 3 years ago

And I'm wondering whether it makes sense to keep GDEF in ttf-parser or whether it should also live in rustybuzz.

RazrFalcon commented 3 years ago

StreamExt

Yep. That's the idea.

And to the directory structure

I would prefer to have parsing and shaping separated. Most of the parsing is already moved not only into a separate module, but into a separate crate. This was one of the main ideas behind this port. If a user doesn't need shaping, it can use just ttf_parser. It's not possible with hb. Basically, the tables module should not import Buffer and shaping code should not import Stream. I've looked at the code and it seems that Apply doesn't contain any low-level parsing code, so I think it should be split.

As for GDEF, it has a very simple query API, so it can stay in ttf_parser. Maybe someone would find it useful.Basically, rb should do only one job - shaping. Everything else should be moved into separate crates.

laurmaedje commented 3 years ago

Okay. Makes sense.

laurmaedje commented 3 years ago

I addressed the feedback. Splitting the parsing into a tables module was a good idea I think, it worked out well!

The only thing that's still not addressed is the GDEF blocklisting. It could probably somehow be done with table_data, but it's a bit complicated because while harfbuzz simply removes the table from its face, I would have to check for blocklisting at each access to a ttf-parser function that uses GDEF. There aren't too many of these calls, but it's not as nice as simply setting the table to None (which I obviously can't since it's private to ttf-parser).

laurmaedje commented 3 years ago

Just for the record, this does not need any changes to ttf-parser anymore, now.

RazrFalcon commented 3 years ago

I guess we can open an issue for GDEF blocklisting, to implement it later.

Is it fine by you if I squash commits?

Overall, you did an incredible work. And I still cannot believe that someone decided to dive into the nightmare called text shaping.

laurmaedje commented 3 years ago

I think there are a few big chunks which would fit together nicely. So you could maybe squash into:

RazrFalcon commented 3 years ago

I've decided to merge it as is. Your commit history is too good to remove it.

RazrFalcon commented 3 years ago

Looks like there is a memory leak. You can test it yourself using: env RUSTFLAGS="-Z sanitizer=address" cargo +nightly test

No idea why CI didn't caught it.

laurmaedje commented 3 years ago

Short question because I want to prevent duplicate work here. Are you currently/soon working on finishing the port? Because I could imagine doing some more work here, but I don't want either of us to waste their time.

RazrFalcon commented 3 years ago

No plans for the next few months.

laurmaedje commented 3 years ago

Okay, then I'll maybe continue.