Closed benoitkugler closed 1 year ago
I honestly have never worked with fixed-point representations before, but my understanding is that they are faster? I really don't know how much faster though. If harfbuzz is using floats, I imagine that we can get away with it too.
The general consensus seems to be that fixed point no longer offers significant advantages unless the hardware lacks an FPU.
I was under the assumption it was also for accuracy? (But 6 bits for the fractional part is probably too few).
Perhaps @nigeltao is the person to ask for input (or pointers) about the rationale for using fixed-point types in x/image/font.
Fyne uses a real float (float32) so that it is more manageable for developers using the library - fixed point precision is hard to work with in comparison.
If this library is to be purely internal (i.e. behind toolkit libs) I guess it does not matter.
I agree that fixed is expected to be faster, and text is notoriously slow so this may be worth the optimisations - or may find that it's negligible alongside the actual maths of text measuring...?
I guess we could devise a little benchmark with x/image/font.Drawer.DrawString("...") and see what gives?
One reason for fixed point is that computing a floating point expression can give different results on different hardware, even for the same lines of Go code. This fact complicates writing "compare to golden output" unit tests:
Another reason is that fixed point computation can still be noticeably faster than floating point, with or without SIMD:
It's not Go related, but Dolphin had a recent bug report (that became an interesting story) that came down to FMA (Fused Multiply-Add) and how, in floating point, +0.0 is different from -0.0.
Fixed point is slightly more difficult to work with, but many of the existing font & text libraries in Go already use it so go-text/shaping#1 used fixed.Int26_6.
Any performance gains should be considered carefully - this code is likely to be called at least once for measuring, and once for rendering pretty much every frame, and at 60 fps we only have ~16ms to play with.
Thinking about it a bit more, I noticed that Harfbuzz expresses the positions of the output glyphs in integer coordinates. It makes sense because they are scaled by a user-provided scale parameter, with the following formula:
outPosition = scale * fontUnit / faceUpem
(For instance, the width in font units of a glyph is typically around 500, with faceUpem = 1000 (or 2000).)
As a consequence, providing faceUpem as scale then amounts to expressing the position in font units. Given a dpi and a pointSize (12 for instance), providing pointSize * dpi / 72 as the scale amounts to expressing the position in pixels.
All that to say that we should maybe consider the same approach: express Advance(), Baseline(), Bounds() as integers (int32, say), as well as Input.Size().
This would solve the question of float representation since the bulk of the operations would then actually be performed on (true) integers.
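A small sketch of the scaling formula quoted above, using integer arithmetic only. The function name `scalePosition` and the 96 dpi value are assumptions made for illustration; this is not the Harfbuzz API, and it truncates where a real implementation may round.

```go
package main

import "fmt"

// scalePosition applies outPosition = scale * fontUnit / faceUpem.
// The multiplication is done in int64 to avoid overflow; the division
// truncates toward zero (a simplification for this sketch).
func scalePosition(fontUnit, scale, faceUpem int32) int32 {
	return int32(int64(scale) * int64(fontUnit) / int64(faceUpem))
}

func main() {
	const faceUpem, advance = 1000, 500 // typical values from the comment above

	// scale == faceUpem: the output is simply in font units.
	fmt.Println(scalePosition(advance, faceUpem, faceUpem))

	// scale == pointSize * dpi / 72: the output is in whole pixels.
	pixelScale := int32(12 * 96 / 72) // 12pt at an assumed 96 dpi = 16
	fmt.Println(scalePosition(advance, pixelScale, faceUpem))

	// A caller wanting more precision passes a larger scale (here 64x)
	// and divides back afterwards.
	fmt.Println(scalePosition(advance, pixelScale*64, faceUpem))
}
```

The last call shows how a toolkit could still obtain 1/64 pixel resolution from an integer-only API by choosing the scale itself.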
Scaling up and down may be OK for the calculations, but won't that be problematic when rendering the output? As far as I can see it would create text too large that then needs to be scaled back, which will create graphical artefacts.
@andydotxyz I could be wrong here, but I don't think that expressing Advance, Baseline, or Bounds as integers would impact rendering in any way. At the end of the day, all three of those values are referring to pixel coordinates. Advance is how far the text rendering dot advances when displaying this output. I don't think that the dot generally advances by partial pixels. Similarly, Bounds is (for raster toolkits) the dimensions of the output texture in pixels. It can't be partial pixels there either. I also can't envision how a fractional pixel baseline would be useful.
I think that all three of these factors are ultimately about positioning the text on screen, not about rendering it, which is why making them integers is probably safe. That being said, I'm a novice at all of this.
Using the fixed representations has been a huge pain and the source of several errors in go-text/shaping#5, but that's not a great argument for getting rid of it.
I don't think that the dot generally advances by partial pixels.
I am no expert on this, but if it is only ever whole pixels then why does golang.org/x/image/font use Int26_6 for this and other values?
That package also uses fixed.Rectangle26_6 for bounds, instead of int based rectangles.
I guess they are not using pixel based values, so should we be matching them for a better drop-in replacement instead? The question perhaps is whether shaping is about a font or about a rasterised output. The latter could be pixel based, but the former probably should not. In Fyne the number of pixels a font requires can change over the life of a window if it moves monitor, for example, or if the user changes scale parameters.
golang.org/x/image/font uses Int26_6 so that it can represent sub-pixel positioning (where the dot can advance by partial pixels).
http://agg.sourceforge.net/antigrain.com/research/font_rasterization/ is one article about SPP. There are undoubtedly others.
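As a rough illustration of why sub-pixel positions matter, the sketch below tracks the dot in 64ths of a pixel and compares it with a dot that advances by whole pixels. `Int26_6` here is a local stand-in with the same layout as fixed.Int26_6 (a 32-bit integer with 6 fractional bits), declared so the example needs only the standard library. The `Frac` helper and the advance value are assumptions made for illustration.

```go
package main

import "fmt"

// Int26_6 is a local stand-in mirroring the layout of fixed.Int26_6:
// a signed 32-bit value whose low 6 bits hold 64ths of a pixel.
type Int26_6 int32

// Floor returns the whole-pixel part.
func (x Int26_6) Floor() int { return int(x >> 6) }

// Frac returns the remaining 64ths of a pixel (hypothetical helper).
func (x Int26_6) Frac() int { return int(x & 63) }

// Round returns the nearest whole pixel.
func (x Int26_6) Round() int { return int((x + 32) >> 6) }

func main() {
	const advance Int26_6 = 427 // about 6.67 px, an illustrative value

	var dot Int26_6 // sub-pixel dot
	pixelDot := 0   // dot restricted to whole pixels

	for i := 0; i < 4; i++ {
		fmt.Printf("glyph %d: sub-pixel %d+%d/64 px, whole-pixel %d px\n",
			i, dot.Floor(), dot.Frac(), pixelDot)
		dot += advance
		pixelDot += advance.Round()
	}
}
```

Rounding each advance to a whole pixel makes the second dot drift away from the first by about a third of a pixel per glyph, which is the spacing error that sub-pixel positioning avoids.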
@nigeltao Thanks for lending your expertise here! That was an informative read.
However, as @benoitkugler points out above:
I noticed that Harfbuzz expresses the positions of the output glyphs in integer coordinates.
<snip>
[if] you need a higher precision, you just give a higher scale factor and you divide back afterwards.
Our text shaper does not emit fractional values for any of these parameters. If a given toolkit wants to invoke things with higher resolution to take advantage of this, it simply needs to scale the ppem appropriately. It seems silly to me for our output format to be in a non-integer unit when we know that the output data itself will always be an integer.
However, as @benoitkugler points out above:
I noticed that Harfbuzz expresses the positions of the output glyphs in integer coordinates.
If we follow this then the whole of go-text may become harfbuzz specific, at which point creating these abstractions seems unnecessary. I think we need to, as a group, decide if we are building the abstractions so the implementation is a hidden detail, or if we should just depend on the harfbuzz APIs, tie ourselves to that and save a lot of work.
Just a drive-by comment: I was considering using go-text APIs for star-tex (an attempt at a pure-Go TeX engine).
IIRC, TeX (at least when using type1 fonts and DVI output) is using something like Int12_20 for font metrics.
I'm late to the party, so everyone is fully entitled to ignore my opinion.
I'm not sure the early decision to move to a fractional unit system will support a larger vision of go-text/shaping.
If we follow this then the whole of go-text may become harfbuzz specific, at which point creating these abstractions seems unnecessary.
IMHO this mistakes a property of Harfbuzz as an implementation quirk instead of a more fundamental domain challenge. As I wrote today on the Slack channel:
Usually there are several layers of domains in typesetting. Shaping and rendering live in different domains/spaces: the outline-font is a creature of the design space, while a UI is in the space of rendering. Both have different views on what ‘precision’ means. Usually the design space operates with a (much) higher precision. That’s reflected by using an integer type, while render space has to deal with fractions (there are usually more design units than pixels, which makes quantizing necessary: font-hints, rounding, …)
The answer that go-text/shaping is an abstraction on top of Harfbuzz is valid, but unfortunately does not help. Opting for fractional values is still a move that influences later stages of typesetting pipelines in an unfortunate way.
What is currently taking shape (sorry for the pun) in go-text/shaping is probably correct for UIs: the quickest path along font->shaping->line wrap->render. That's what @nigeltao has demonstrated in an impressively clever way in the Go x/font section. And what you guys have accomplished is pretty cool.
A more general typesetting pipeline, however, will include a number of additional steps, which are better carried out in design space, i.e. with "infinitesimally small integer units". TeX uses scaled points, 1/65536 of a point. Fonts in the wild may well choose a design grid of 4000 units. Having to use fractions is an early descent into the harsh reality of limited pixel resolution. If it's of any use I can elaborate on this, but after all I'm more or less an idiot when it comes to graphical UIs (I'm more of a CLI and backend guy).
What you say makes sense, we basically have a choice between integer and scaling up and down, or float with the potential "harsh reality", though I don't fully understand what those harsh realities are.
IMHO this mistakes a property of Harfbuzz as an implementation quirk instead of a more fundamental domain challenge.
If taken in that context alone I suppose that is true, but elsewhere we saw that pango and others use float/fractional so that was what made me think it was implementation specific. And as noted at the top the Go packages that existed in x/image/font seemed to use Int26_6.
If we want to use implementation details in discussion of an abstraction like this we probably need to compare 2 or 3, and as you say they should come from the same domain.
One thing is surely true - we need to be fully int or fully fractional, what I was wanting most to get fixed was the inconsistency in some of the PRs.
@npillmayer Thanks for your input! Your point about the internals of a full shaping pipeline is interesting. We could probably adopt the following scheme: keep integer representation as long as it is possible, and convert to floats at the highest level of the pipeline, so that UI toolkits can consume it the easiest way. For now, it is somewhat the case, in the sense that we have one internal layer (Harfbuzz) and one exposed layer (Shaper). We should keep your advice in mind when adding more internal layers.
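A minimal sketch of why keeping design units for as long as possible can matter: it measures a run of identical glyphs once by converting each advance to 26.6 pixels before summing, and once by summing in font units and converting at the end. The helper `toFixed26_6` and all numbers (2048 units per em, 12 ppem, an advance of 1139 units) are hypothetical values picked for illustration.

```go
package main

import "fmt"

// toFixed26_6 converts a non-negative length in font design units to
// 26.6 fixed-point pixels (64ths of a pixel), rounding to nearest.
func toFixed26_6(units, ppem, upem int64) int64 {
	return (units*ppem*64 + upem/2) / upem
}

func main() {
	const (
		upem    = 2048 // design grid of the hypothetical font
		ppem    = 12   // pixels per em
		advance = 1139 // glyph advance in font units
		glyphs  = 1000 // number of glyphs in the run
	)

	// Early conversion: quantize every advance, then sum.
	var early int64
	for i := 0; i < glyphs; i++ {
		early += toFixed26_6(advance, ppem, upem)
	}

	// Late conversion: sum in design units, quantize once at the end.
	late := toFixed26_6(advance*glyphs, ppem, upem)

	fmt.Printf("early: %d/64 px, late: %d/64 px, drift: %d/64 px\n",
		early, late, late-early)
}
```

Each per-glyph conversion loses 1/8 of a 64th here, so after 1000 glyphs the two measurements differ by 125/64, nearly two pixels. That is negligible for a UI label but may matter for a typesetting pipeline working over long lines.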
Don't get me wrong: going Int26_6 all the way will certainly work. It may just be more inconvenient to do the layout work that way. As the layout-task for UI is simpler, consistent fractional values all the way may well be the right thing.
It would be a shame to prevent go-text from being used in typesetting contexts because of this API decision. I agree with @benoitkugler that perhaps the right thing is to maintain high fidelity until the GUI API boundary, and to potentially provide a different API surface for typesetting applications that want the granular control.
To be able to know what the right way forward is I think we will have to define the areas of responsibility of each repository in this project. We have also discussed whether they should all be merged to one, which I think makes this even more complicated.
Should we clarify the aim of each area and the types of code that will use them? From my perspective this was all about getting better text rendering so I am lost with all of the different layers and which parts of go-text would focus on other use-cases.
Not trying to intervene, but I put out a blog post which may or may not help define the context of go-text:
Thanks @npillmayer for the write-up. To borrow terms from there, I think go-text should aim for:
[]Input where each input is a homogeneous "item" with consistent style. We probably want to provide bidi splitting though, to transform a potentially mixed-direction []Input into a single-direction-per-element []Input.
Just a few thoughts/questions:
Authoring: if we accept an []Input that may, or may not, be single-direction runs then it seems we will have to parse the string to check. Would it not be better to specify that either it is or is not inclusive of this processing? Maybe you meant that we should internally handle the bi-di parsing? This is a change of scope from earlier discussions, though not something I am against.
Yeah, sorry I was unclear. Higher-level code will need to give us styled runs of text, each represented as an Input. It's up to us whether they're responsible for doing bidi during the creation of those runs or not. I'd tentatively suggest that we offer a function that accepts []Input that are not guaranteed to be single-direction, and that we apply the bidi algorithm to yield []Input where every element is single-direction. Toolkits can choose to invoke this helper (or not) as a preprocessing step before using the shaper.
I totally agree that we don't want to add logic to the shaper that tries to verify that each Input is single-direction.
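The shape of such a preprocessing helper could look roughly like the sketch below. Everything in it is hypothetical: `Input`, `Direction` and `SplitByDirection` are stand-ins, not the go-text types, and the direction test is a crude script check (Hebrew and Arabic letters count as right-to-left), not the Unicode bidi algorithm a real helper would apply.

```go
package main

import (
	"fmt"
	"unicode"
)

// Direction and Input are hypothetical stand-ins for this sketch.
type Direction int

const (
	LTR Direction = iota
	RTL
)

type Input struct {
	Text      []rune
	Direction Direction
}

// strongDirection reports the direction of r and whether r is "strong".
// Crude assumption: Hebrew and Arabic runes are RTL, other letters are LTR,
// and everything else (spaces, digits, punctuation) is neutral.
func strongDirection(r rune) (Direction, bool) {
	switch {
	case unicode.Is(unicode.Hebrew, r), unicode.Is(unicode.Arabic, r):
		return RTL, true
	case unicode.IsLetter(r):
		return LTR, true
	}
	return LTR, false
}

// SplitByDirection turns possibly mixed-direction inputs into inputs that
// each have a single direction. Neutral runes stay with the preceding run.
func SplitByDirection(inputs []Input) []Input {
	var out []Input
	for _, in := range inputs {
		start, cur, seenStrong := 0, in.Direction, false
		for i, r := range in.Text {
			dir, strong := strongDirection(r)
			if !strong {
				continue
			}
			if seenStrong && dir != cur {
				// Direction changed: close the current run here.
				out = append(out, Input{Text: in.Text[start:i], Direction: cur})
				start = i
			}
			cur, seenStrong = dir, true
		}
		if start < len(in.Text) {
			out = append(out, Input{Text: in.Text[start:], Direction: cur})
		}
	}
	return out
}

func main() {
	mixed := []Input{{Text: []rune("abc אבג def"), Direction: LTR}}
	for _, run := range SplitByDirection(mixed) {
		fmt.Printf("%q direction=%d\n", string(run.Text), run.Direction)
	}
}
```

Keeping this as a separate function means the shaper itself never has to verify direction: toolkits that already produce single-direction runs simply skip the call.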
Line breaking: We have an implementation for this in Fyne, but unfortunately it cannot be contributed because the original authors are not here to donate it to the public domain license.
Does your line breaking algorithm handle RTL text? I think we'll definitely need one that does, which is why I've been working on one. I've been meaning to PR it into here, but I haven't gotten to it yet.
Rasterisation: I don't understand quite why this is application or toolkit dependent - can you explain what you mean here please? I had thought that going from text vectors to pixels is a pretty standard operation - there is a golang.org package that seems to manage it without platform considerations?
This is common, sure. It's just that there's a rats-nest of complexity in sub-pixel hinting, stem widening, gamma correction, and other parameters that applications might want to make different choices on. I'm okay with this being in scope for go-text if everyone sees it as a common need, but it's an area in which it seems difficult to create the "one rasterizer that will serve all usecases". Maybe that's okay though. I don't have strong feelings on this. I suggested that it might be out of scope before in an effort to simplify the scope of the overall project.
Does your line breaking algorithm handle RTL text? I think we'll definitely need one that does, which is why I've been working on one. I've been meaning to PR it into here, but I haven't gotten to it yet.
No, not yet - this go-text project is our exploration into RTL land. We have not made commitments to have it delivered until a future release that is not yet scheduled.
I suggested that it might be out of scope before in an effort to simplify the scope of the overall project.
Given the aim of this project was to create a single place that Go projects could handle text in a graphical context I'm not sure it would be easy to say that rendering the pixels is out of scope. It could be a different repository/package but it really feels in-scope for go-text
Several graphical contexts don't need rasterization: SVG/PDF/PS output, web
Let's simply plan to provide a raster package for that purpose. I don't really know what we'll put in there, but it's definitely fine to offer it.
I think I would prefer that, thanks @whereswaldon. I see that Gio and some applications won't use a rasterizer, but I am pretty sure that many others could want one. Even if Fyne moves to vector fonts we will still want to rasterize in software for unit tests etc :).
I think this issue wandered quite far from the initial question, and I don't know if there are any outstanding todos from this conversation. We have a rendering package now, and we use fixed point values in the API. I'm going to close this for now.
I would like to discuss the choice of the actual representation for floats. I think there are two options: float32 or fixed.Int26_6. The x/image/font/sfnt package uses fixed.Int26_6 but I'm not sure I understand why. Do we really need a fixed-point representation?
From what I've understood, the C libraries Harfbuzz and Pango use regular float (or double), and it seems that Fyne also favors float32.
It is not a fundamental question, but using float32 would simplify the implementation of go-text/shaping and go-text/font.