RazrFalcon / rustybuzz

A complete harfbuzz's shaping algorithm port to Rust
MIT License
544 stars 36 forks source link

Draft implementation of wasm shaper #122

Closed asibahi closed 2 months ago

asibahi commented 3 months ago

Hello.

working on this

So I started a basic structure of the wasm runner and how to export functions to the wasm module. I am using the wasmtime crate.

The general workflow is this :

  1. look for Wasm table. If it is there : run it.
  2. the shape_with exported function should just always direct to shape_with_plan
  3. if Wasm table is not there or is malformed , use shape_with_plan. instead like normal.

I already hit a couple of things I do not understand:

  1. the API has a concept of scale, which rustybuzz (afaik) does not have. Some of the harfbuzz examples depend on scale. What would be the correct strategy to deal with this?
  2. I have absolutely no idea how to deal with pointers. Using wasmtime APIs to "link" functions with pointers is giving me compiler errors. You can see what the compiler points are by uncommenting the marked lines. Something about traits?

This is a draft PR hoping for some feedback and help.

Final edit I swear: The two places where I am looking at the functions and how they're used are :

  1. The Design document
  2. The harfbuzz-wasm crate. (yes that's the entire crate.)

The C++ implementation of the wasm api is here But C++ is black magic to me.

LaurenzV commented 3 months ago

the API has a concept of scale, which rustybuzz (afaik) does not have. Some of the harfbuzz examples depend on scale. What would be the correct strategy to deal with this?

How exactly do they depend on it? We indeed don't have scale, so no scale should be applied (or basically a "scale" of 1).

behdad commented 3 months ago

the API has a concept of scale, which rustybuzz (afaik) does not have. Some of the harfbuzz examples depend on scale. What would be the correct strategy to deal with this?

How exactly do they depend on it? We indeed don't have scale, so no scale should be applied (or basically a "scale" of 1).

Or rather, scale of UPEM in HB terms.

asibahi commented 3 months ago

the API has a concept of scale, which rustybuzz (afaik) does not have. Some of the harfbuzz examples depend on scale. What would be the correct strategy to deal with this?

How exactly do they depend on it? We indeed don't have scale, so no scale should be applied (or basically a "scale" of 1).

I haven't actually looked through the code , but this example uses optical size to render the glyphs.

behdad commented 3 months ago

This is nice, particularly because wasmtime seems to have a "fuel" concept to limit CPU usage:

https://github.com/bytecodealliance/wasmtime/blob/main/examples/fuel.rs

It generally looks like a better runtime than wasm-micro-runtime, which has several problems as a library.

asibahi commented 3 months ago

the API has a concept of scale, which rustybuzz (afaik) does not have. Some of the harfbuzz examples depend on scale. What would be the correct strategy to deal with this?

How exactly do they depend on it? We indeed don't have scale, so no scale should be applied (or basically a "scale" of 1).

Or rather, scale of UPEM in HB terms.

So I should just return the upem and be done with it? This leaves the issue of how to deal with pointers :D

behdad commented 3 months ago

the API has a concept of scale, which rustybuzz (afaik) does not have. Some of the harfbuzz examples depend on scale. What would be the correct strategy to deal with this?

How exactly do they depend on it? We indeed don't have scale, so no scale should be applied (or basically a "scale" of 1).

Or rather, scale of UPEM in HB terms.

So I should just return the upem and be done with it? This leaves the issue of how to deal with pointers :D

Yes. UPEM works I think. No idea re pointers I'm afraid.

RazrFalcon commented 3 months ago

the API has a concept of scale, which rustybuzz (afaik) does not have

Just ignore it.

I have absolutely no idea how to deal with pointers.

Me neither. I never worked with WASM.

asibahi commented 3 months ago

Gave more flesh to a few functions. Left plenty of todo!s tho. I'd appreciate someone looking at them.

For dealing with pointers and wasm memory I got thankfully some pointers (hehe) from @CryZe at the Rust Community Server.

asibahi commented 3 months ago

Hey

Positive things:

  1. (I think) I have a better understanding of how the wasm modules treats memory and pointers. Long story short, memory is a giant array of u8. And pointers allocated from the module are indices on that array.
  2. The file is easier to navigate, generally. The imported functions are now full fledged functions instead of closures written inline.
  3. I replaced most of transmute with bytemuck, as much as I can figure it out. Please tell me if I fucked up something obvious.
  4. I added a test for the Calculator example. It doesn't really work tho. See below.

So I am trying right now to make the Calculator example work. It only imports 5 functions of the set, and exports only the needed function shape.

As copied from the wat file:

(import "env" "buffer_copy_contents" (func (;0;) (type 0)))
(import "env" "buffer_set_contents" (func (;1;) (type 0)))
(import "env" "font_get_glyph" (func (;2;) (type 3)))
(import "env" "font_get_glyph_h_advance" (func (;3;) (type 0)))
(import "env" "debugprint" (func (;4;) (type 2)))

The last three functions are straightforward I guess and all the complexity is in getting the first two working. I added a test and everything. The good news is that the function isn't returning or crashing. The bad news is that it is not .. working. I can't tell right now where the issue is, but it is giving me the same input as output. And I am quite sure where the issue is.

I am thinking this is perhaps due to a misunderstanding, on my part, on how the hb_buffer_t works. My current assumption is that, after initializing UnicodeBuffer, I take the info and pos vectors out and pass them into the module. (that's what I am trying to do in buffer_copy_contents.) Then after everything, take the info and pos arrays produced by the vector and put them back into hb_buffer_t before wrapping it into GlyphBuffer. (That's what I am trying to do in buffer_set_contents. )

Reading the impls of hb_buffer_t, there seems to be some assumptions I am not sure I get. Are info and pos always the same length? (They should be, right?) Why does the buffer return pos as info sometimes?

When I change the clear() functions to the commented out code in line 396, changing nothing else, the output does change. Now I get the same number of glyphs I am getting, but all the data is zeroed out. When I tried replacing the transmutes with some bytemuck, the test crashed somewhere in buffer.rs.

PS. Sorry for the messy commit history. It is kind of a reflection of the thought process :smile:

behdad commented 3 months ago

Reading the impls of hb_buffer_t, there seems to be some assumptions I am not sure I get. Are info and pos always the same length? (They should be, right?)

Correct.

Why does the buffer return pos as info sometimes?

That's an internal optimization, that the pos array is used as a secondary info during the substitution process before positions are used. It's not relevant to the wasm from what I remember.

asibahi commented 3 months ago

Oh okay nvm the previous comment. I found a bug in font_get_glyph where I don't check if uvs equals 0 and convert it to a char anyway, and ttf-paerse's glyph_variation_index returns None when uvs is \0.

tl;dr : Calculator works now. Time to work on the other functions and examples.

RazrFalcon commented 3 months ago

Congrats!

asibahi commented 2 months ago

I tried to implement the outline function as much as I can glean from the code using it, and a simple impl of shape_with Also added a failing test for ArefRuqaa-Was.ttf.

~~The failure is due to a failure within the module, reaching an unreachable state. The same font works in the Wasm version of FontGoggles. ~~

The failure happens after the module calls font_copy_glyph_outline. So it ~~probably definitely has to do with it being given malformed data somehow. Here is the code dealing with it in harfbuzz-wasm. I understand the concept of on-curve and off-curve and that TTF omits on-curve points as they can be deduced from off-curve points and so on. So I am not sure what I am missing.~~

Figured it out. n_contours is actually not the ends of contours like the harfbuzz-wasm crate implies, but the start point of the next contour. The last contour's "end point" is discarded. also Quads and Cubes need more than one point pushed into points. Not sure how the point type maps to on-curve and off-curve but here we are.

However, test is failing for a different reason. I copied the expected result from FontGoggles, and .. it is not the output I am getting. This could be due to internal differences between HB and RB.

Could be a problem with shape_with ..

asibahi commented 2 months ago

Managed to get it to work by calling crate::shape instead of crate::shape_with_plan.

Two weird issues:

  1. The ArefRuqaa-Wasm module is breaking here (I assume panicking at index bounds ...) when the last letter of the sentence is .fina. So when I type أفشوا السلام بينكم as input, it fails. When I type أفشوا السلام بينكم. (note the period), it runs to completion. I tested with both wasmtime and wasmi.
  2. Output is not identical to Wasm FontGoggles. Clusters are not identical, and some x_advances are not. I do not know if this is due to different versions of harfbuzz or some bug elsewhere.
RazrFalcon commented 2 months ago

also Quads and Cubes need more than one point pushed into points.

Quad is 2 points, Cube is 3 points. All of which you can get from OutlineBuilder.

Not sure how the point type maps to on-curve and off-curve but here we are.

This is a glyf-specific thing. CFF doesn't have it.

In general, it seems like CGlyphOutline and the whole Wasm table is extremely harfbuzz-specific, meaning a regular text shaper will not be able to support it. @behdad is this expected?

asibahi commented 2 months ago

I think I got this as far as I can, tbh.

The Ruqaa test (with both inputs, see comments) is failing for reasons I cannot understand.. I will try and get an image out of the working output, as the failure there doesn't seem to me to be because of the wasm runtime itself. But I don't want to add image and ab_glyph as dev-dependencies, so the image thing will be out of crate.

One more thing: Typst uses an interpreting runtime, wasmi instead of wasmtime.

  1. They have very similar API so changing between them is trivial.
  2. Both are actively developed but wasmtime seems .. corporate maintained. (Bytecode alliance project)
  3. wasmtime brings Cranelift and all its baggage around. wasmi is much lighter as a dependency and can be compiled to no_std.

Another thing: Would you be open to changing the API, @behdad ? Especially when it comes to copying in binary data.

RazrFalcon commented 2 months ago

A smaller wasm dependency is for sure preferable.

tschneidereit commented 2 months ago

@asibahi this is very exciting! As a quick note, please don't hesitate to reach out to us in the Bytecode Alliance for any help we might be able to provide with Wasmtime. The Wasmtime maintainers are very responsive on our Zulip.

wasmtime brings Cranelift and all its baggage around. wasmi is much lighter as a dependency and can be compiled to no_std.

Incidentally, we're working on both of those! There's an RFC for adding an interpreter, and recently parts of the runtime gained support for no_std. The latter doesn't yet include Cranelift, but that can be fixed with some elbow grease.

We also have minimal embeddings getting the size of Wasmtime down to about 2MB right now, with a clear path towards the hundreds of kilobytes range if needed.

Having said all this, wasmi is an excellent interpreter, and I'm sure @Robbepop would be excited to help with integrating it, too!

Robbepop commented 2 months ago

Hi, I have just been made aware of this topic by @tschneidereit 's mention of me and haven't read everything (no time to do so rn). I am happy to help. Either solution, Wasmtime or Wasmi, should work fine since both mostly share the same API surface. It could even be possible to use both solutions simultaneously for their different strengths.

asibahi commented 2 months ago

Fixed the bug and now the Ruqaa Font tests works if we do not compare clusters.

That'll teach me to read the standard library's documentation.

asibahi commented 2 months ago

One thing I learned about recently:

Is scale from harfbuzz (the one expcted by this API) related to pixels_per_em? the skrifa crate seems to imply as much. If that's so we can perhaps just return the pixels_per_em if set, and if not, return UPEM?

Also re: shape_with , doesn’t rustybuzz technically use the AAT shaper as well as OT?

RazrFalcon commented 2 months ago

Is scale from harfbuzz (the one expcted by this API) related to pixels_per_em?

Once again, there is no scale in rustybuzz. It was removed during porting. It should always be 1.

Also re: shape_with , doesn’t rustybuzz technically use the AAT shaper as well as OT?

In harfbuzz terminology this is still the ot shaper. See https://github.com/harfbuzz/harfbuzz/blob/main/src/hb-shaper-list.hh

behdad commented 2 months ago

Is scale from harfbuzz (the one expcted by this API) related to pixels_per_em?

Once again, there is no scale in rustybuzz. It was removed during porting. It should always be 1.

It should be always units_per_em. And it's NOT related to pixels_per_em.

asibahi commented 2 months ago

So I think this is done.

The full defined API in the design document is not implemented, but only the ones used by the examples.

Edit: should you merge this could you make it a squish merge 😁

RazrFalcon commented 2 months ago

Thanks! Will be released soon. Will see how it will go.