RazrFalcon / rustybuzz

A complete harfbuzz's shaping algorithm port to Rust
MIT License

examples: add `shaped-text2svg` for generating SVGs from shaped Unicode text. #70

Open eddyb opened 1 year ago

eddyb commented 1 year ago

This is based on the ttf-parser example font2svg, and uses a combination of unicode-bidi and rustybuzz on top of it, to offer a relatively compact but (hopefully) complete usage example for rustybuzz.

While discussing such a self-contained "complete example" with @manishearth, he mentioned that it may be possible for rustybuzz to offer a "complete Unicode bidirectional shaping solution", to avoid having the user correctly use unicode-bidi etc.


Using Go Noto Universal 6.0's GoNotoCurrent.ttf, and the UDHR, I was able to get some examples (all images are chosen samples, with links above them for the full original version, due to GitHub limitations):

| Lang | shaped-text2svg output | diff w/ browser rendering |
|---|---|---|
| eng | full SVG | full HTML (`---` are misaligned; all languages hit this) |
| arb | full SVG | full HTML (Latin glyphs appear to misalign Arabic ones) |
| hin | full SVG | full HTML (no idea what's going on here, more investigation needed) |
| cmn_hans | full SVG | full HTML (`(III)` confirmed to shape differently in browser vs rustybuzz) |

A few notes about that diff in the last column:

TODO: try more languages, maybe emoji (hard to mix emoji & non-emoji w/o font fallback), try to improve diffing against browser rendering

eddyb commented 1 year ago

Update: I've narrowed down most of the weird differences caused by ASCII to `locl`: some differences go away if I set `font-feature-settings: "locl" 0;` in the browser and likewise disable `locl` in rustybuzz.
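To make the comparison concrete: both the CSS declaration and a shaper-side feature setting boil down to a four-byte OpenType tag plus an integer value. The following is a minimal std-only sketch of that mapping. `FeatureSetting` and `parse_css_feature` are hypothetical names used for illustration and are not part of rustybuzz's API, and the parser assumes a single `"tag" value` entry rather than a full comma-separated list.

```rust
/// Hypothetical stand-in for a shaper feature setting (not rustybuzz's own type).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct FeatureSetting {
    /// Four ASCII bytes packed big-endian, as OpenType tags are.
    pub tag: u32,
    /// 0 disables the feature, 1 enables it; larger values select alternates.
    pub value: u32,
}

/// Parses one CSS `font-feature-settings` entry such as `"locl" 0`.
/// Assumption: exactly one entry, optionally followed by `;`.
pub fn parse_css_feature(entry: &str) -> Option<FeatureSetting> {
    let entry = entry.trim().trim_end_matches(';');
    // The tag must be a quoted string of exactly four ASCII characters.
    let rest = entry.strip_prefix('"')?;
    let (name, rest) = rest.split_once('"')?;
    let b = name.as_bytes();
    if b.len() != 4 || !name.is_ascii() {
        return None;
    }
    // A missing value means "on" in CSS.
    let value = match rest.trim() {
        "" | "on" => 1,
        "off" => 0,
        number => number.parse::<u32>().ok()?,
    };
    Some(FeatureSetting {
        tag: u32::from_be_bytes([b[0], b[1], b[2], b[3]]),
        value,
    })
}
```

Under these assumptions, `"locl" 0` parses to the tag bytes `locl` with value 0, which is the pair one would hand to the shaper to turn the feature off.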

Another way to control this is the `lang` property in the browser: if I do `document.body.lang = "zh"` on the cmn_hans example, all the differences in the bulk of the text go away, and new differences appear in the English header at the top.

At this point I would have to port this example to use harfbuzz to be able to tell, but I suspect the default of leaving the language unset is simply different from what browsers do (which may be using additional heuristics?).

EDIT: given that I see no changes when I force en on either side, I think that's quite literally the default (or equivalent to it in whatever OpenType terms) and there's a behavior mismatch within it, without browsers doing anything more sophisticated.
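For context on why an unset language can matter at all: in OpenType, features such as `locl` are listed per language system inside each script table, and the script's default language system is used when no language is requested or the font has no matching entry. The sketch below models only that lookup. `ScriptTable` and `active_features` are hypothetical, simplified names (real GSUB/GPOS tables store feature indices, not strings), and any feature lists fed into it are invented for illustration.

```rust
/// Hypothetical, simplified model of one OpenType script table.
pub struct ScriptTable {
    /// Features of the script's default language system.
    pub default_features: Vec<&'static str>,
    /// (language system tag, features) pairs, e.g. ("ZHS ", [...]).
    pub lang_systems: Vec<(&'static str, Vec<&'static str>)>,
}

/// Returns the feature list for the requested language system tag,
/// falling back to the default language system when the tag is
/// absent or no language was requested.
pub fn active_features<'a>(
    script: &'a ScriptTable,
    lang_sys: Option<&str>,
) -> &'a [&'static str] {
    if let Some(wanted) = lang_sys {
        for (tag, features) in &script.lang_systems {
            if *tag == wanted {
                return features;
            }
        }
    }
    &script.default_features
}
```

With a font whose `ZHS ` language system adds `locl` on top of the default set, requesting `ZHS ` and requesting nothing give different feature lists, which is the kind of mismatch that two renderers with different defaults could expose.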

RazrFalcon commented 1 year ago

Oh wow, thanks! Wasn't expecting someone to dive into this. I was planning to write something like this myself, but didn't have time.

I'm not sure we need full browser compatibility in this demo/example. Even resvg has a far simpler implementation. And it's the reason rustybuzz exists.

As for language and bidi - harfbuzz/rustybuzz are pretty low-level libraries. You cannot use them directly. You do need a text layout library on top of them. Like pango on Linux.

Honestly, I'm not even sure we need bidi in this example. Either way, it's good enough for me already. And if you want to improve it a bit, I do not mind. But we should not try implementing a text layout library in a simple example.

Manishearth commented 1 year ago

I would recommend having bidi in the example because it's a useful illustration of all the parts needed to handle text right, and prevents people from using the library naively.

(And because bidi is weird and complicated and the integration of a bidi algorithm implementation with a shaping engine is not necessarily immediately obvious)
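As one illustration of why that integration is not obvious: once a bidi implementation has assigned embedding levels and the text is split into level runs, each run is shaped on its own, and the runs then have to be placed in visual order. Below is a minimal std-only sketch of that reordering step, following rule L2 of the Unicode Bidirectional Algorithm (UAX #9) applied to whole runs. `visual_run_order` is a hypothetical helper; a bidi crate such as unicode-bidi would normally handle this. It assumes the levels were already computed for a single line, and that the shaper emits the glyphs of right-to-left runs in visual order, as HarfBuzz does.

```rust
/// Given the embedding level of each level run on one line (in logical
/// order), returns the run indices in visual (left-to-right) order.
///
/// Rule L2: from the highest level down to the lowest odd level, reverse
/// every contiguous sequence of runs at that level or higher.
pub fn visual_run_order(levels: &[u8]) -> Vec<usize> {
    let mut order: Vec<usize> = (0..levels.len()).collect();

    let max_level = match levels.iter().copied().max() {
        Some(level) => level,
        None => return order, // no runs at all
    };
    let min_odd = match levels.iter().copied().filter(|l| l % 2 == 1).min() {
        Some(level) => level,
        None => return order, // purely left-to-right line
    };

    let mut level = max_level;
    while level >= min_odd {
        let mut i = 0;
        while i < order.len() {
            if levels[order[i]] >= level {
                // Find the end of this sequence and reverse it in place.
                let start = i;
                while i < order.len() && levels[order[i]] >= level {
                    i += 1;
                }
                order[start..i].reverse();
            } else {
                i += 1;
            }
        }
        level -= 1; // min_odd >= 1, so this cannot underflow
    }
    order
}
```

For levels `[1, 2, 1]` (a right-to-left paragraph with an embedded left-to-right run, such as a number), this yields the runs in the order 2, 1, 0; the glyphs of each shaped run are then laid out left to right in that order.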

RazrFalcon commented 1 year ago

@Manishearth Depending on your definition of text layout, one can have thousands of lines of code on top of rustybuzz. Sure, I don't really mind having bidi in this example, but it's still pretty far from a proper text layout.

I do have plans on writing an easy to use text layout/rustybuzz wrapper eventually, but time is not on my side.

> prevents people from using the library naively

Meanwhile I keep telling people to stop using rustybuzz... In the sense that it must not be used directly. You do need a higher-level wrapper for it.