RazrFalcon / rustybuzz

A complete harfbuzz's shaping algorithm port to Rust
MIT License
498 stars 34 forks

Replace `unicode-general-category` w/ `unicode-properties`. #71

Closed eddyb closed 11 months ago

eddyb commented 1 year ago

What prompted me to look at unicode-general-category was these build timings:

While ttf-parser prevents rustybuzz from starting its compilation earlier anyway, it still seemed weird that only one of several unicode-* deps was so much more involved than the others. (And within a large build, it is possible that such a serial dependency would have an impact.)

The description of unicode-general-category seemed to suggest it was used for performance reasons, and I spent some time optimizing away some low-hanging fruit (it wastes over half a second just by not using buffered IO, it generates Rust source code instead of binary blobs, and there are a few more things like that).
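To illustrate the buffered IO point, here is a minimal sketch of a generator that writes many small lines through a `BufWriter`. It is not the code of unicode-general-category: the `write_table` function, the `Range` type, and the output format are all hypothetical and exist only for this example.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical table entry: an inclusive code point range and a category name.
type Range = (u32, u32, &'static str);

/// Writes one line per range through a `BufWriter`, so the many small
/// `writeln!` calls are batched in memory instead of each one reaching
/// the underlying writer (for a `File`, that means one syscall per call).
fn write_table<W: Write>(out: W, ranges: &[Range]) -> io::Result<()> {
    let mut out = BufWriter::new(out);
    for (start, end, category) in ranges {
        writeln!(out, "{:#06X}..={:#06X} => {}", start, end, category)?;
    }
    // Flush explicitly so a write error is returned here instead of being
    // silently discarded when the `BufWriter` is dropped.
    out.flush()
}
```

In a real generator the `out` argument would typically be a `std::fs::File`; without the `BufWriter`, every `writeln!` on a `File` turns into at least one separate write to the operating system.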


But then I noticed something: the yeslogic org has a bunch of unicode-* crates, but most of them are prefixed with yeslogic- on crates.io (since they're alternatives to e.g. unicode-rs crates).

And there isn't any other obvious choice of crate for querying "general category", at least not on crates.io. Which leads me to believe the use of a yeslogic crate wasn't intentional but rather accidental/incidental (as it looked like the only option, and it didn't have a prefix that would make it seem out of place).


There are two blockers for this PR:

RazrFalcon commented 1 year ago

Hi! I genuinely do not remember why I chose unicode-general-category. It was 3 years ago. And yes, rustybuzz has an absurd amount of low-hanging fruit. I simply have no time to work on it and it's semi-abandoned. As long as the tests are passing, I'm fine with it.

Sadly, there are no alternatives (yes, I know about swash) and I hope someone else will keep maintaining it. Which is unlikely, because the domain is so absurdly complex and niche.

RazrFalcon commented 1 year ago

> Which leads me to believe the use of a yeslogic crate wasn't intentional but rather accidental/incidental (as it looked like the only option, and it didn't have a prefix that would make it seem out of place).

No, I knew it was by yeslogic.

Maybe at that time, early 2020, there were no stand-alone Unicode general category crates. And I did not want to blow up the binary size by including the whole Unicode table data.

The current rustybuzz status is: thank God it finally works... It took 8 months to port, so I was fine with any solution that got it to the finish line.

Manishearth commented 1 year ago

Btw, filed https://github.com/unicode-rs/unicode-properties/issues/1 about the names of the enums

Manishearth commented 11 months ago

It's now published, sorry for the delay

eddyb commented 11 months ago

> It's now published, sorry for the delay

@Manishearth no worries, I lost track of this as well, thanks for publishing!


@RazrFalcon should be good to go (and the PR is trivial now, thanks to the renaming).

RazrFalcon commented 11 months ago

Thanks!