Closed devongovett closed 6 years ago
Thanks @devongovett! I agree. Can we merge this so I can start working and refactoring on top of this?
@devongovett I didn't get why glyphs can be reordered. Can you explain why?
@devongovett Yes, I definitely need to understand better what string-index is and what it is for 😄
@diegomura fontkit performs a step in the text layout pipeline called "shaping". Latin-based scripts are simple: there is generally a 1:1 match between Unicode characters in a string and glyphs (their visual representations). However, even in those so-called simple scripts, this may not be true in all cases. Consider a character with an accent mark, which may map to a single glyph, or a ligature (e.g. ffi), where several characters map to one glyph. In these cases, characters and glyphs don't match 1:1.
In other complex scripts, this is taken to a whole other level. The exact glyph chosen for a letter may depend on context - e.g. beginning, middle, or end of a word in Arabic. In many Indic scripts, glyphs may be completely reordered based on syllable analysis. The unicode order is called the "logical order" and the glyph order is called the "visual order".
So that's a long way of answering your question about why glyphs can be reordered and why we need a mapping of string index to glyph index and vice versa. This is all done by fontkit already. Check out the code for the indic shaper for example - it does a lot of stuff. 😉
You can read a bit more about complex text layout here: https://en.wikipedia.org/wiki/Complex_text_layout and check out the links at the bottom for even more.
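To make the string-index idea concrete, here is a minimal sketch of the two-way mapping described above. It assumes the shaper hands back one string index per glyph; the sample data and the `buildGlyphIndices` helper are made up for illustration and are not real fontkit output or API.

```javascript
// Hypothetical shaping result for 'office': one entry per glyph, in visual order.
// stringIndices[i] is the index of the first character that glyph i represents.
const text = 'office';
const stringIndices = [0, 1, 4, 5]; // glyph 1 is an 'ffi' ligature covering characters 1..3

// Build the inverse table: for every character, which glyph draws it?
// (buildGlyphIndices is an illustrative helper, not part of fontkit.)
function buildGlyphIndices(stringIndices, stringLength) {
  const glyphIndices = new Array(stringLength).fill(-1);

  // Record the glyph that starts at each string index.
  stringIndices.forEach((stringIndex, glyphIndex) => {
    glyphIndices[stringIndex] = glyphIndex;
  });

  // Characters absorbed into a ligature reuse the glyph of the preceding character.
  for (let i = 1; i < stringLength; i++) {
    if (glyphIndices[i] === -1) glyphIndices[i] = glyphIndices[i - 1];
  }

  return glyphIndices;
}

// Ligature case: characters 1, 2 and 3 all map to glyph 1.
const ligatureMap = buildGlyphIndices(stringIndices, text.length);

// Reordering case: the glyph for character 1 is drawn before the glyph for character 0.
const reorderedMap = buildGlyphIndices([1, 0, 2], 3);
```

Both directions are plain array lookups once the tables exist: `stringIndices[glyphIndex]` goes from visual order to logical order, and `ligatureMap[stringIndex]` goes the other way.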
Hi @devongovett !
I've been debugging these changes a bit, trying to find out why they're not working. The problem seems to be in here: glyphString seems to be a valid instance, but when we call slice passing 0 and glyphString.length for the first lineFragment, the returned glyphString has '' as value.
I think I could isolate this issue in this snippet:
const g = new GlyphGenerator();
const glyphString = g.generateGlyphs(string); // string being a valid AttributedString
const s1 = glyphString.slice(15, glyphString.length);
const s2 = s1.slice(0, glyphString.length);
s1 has the same _glyphRuns as the original glyphString. Shouldn't we split them also? Otherwise s1 would have the glyph info of the string it was sliced from, which does not make sense.
s1 has 15 as start value and 55 as _end (the original glyphString length). Same again: if s1 is a brand new glyphString, shouldn't its start value be 0 and its _end the string length?
s2 is not slicing s1 from 0 as passed, but from another part of the string, which is wrong and I guess is the cause of the bug on the line I mentioned above.
I would really appreciate your guidance here. My first guess would be to fix the slice operation, but I want to be sure before starting. I will have some free time in the next few days to dedicate to this project, so I would also appreciate your response as soon as you can, of course 😉
Also, why does AttributedString have start and end? I find this a bit confusing. Is there any scenario in which we can have an AttributedString (e.g. with value Lorem ipsum, and the respective runs) that starts at index 10? What would that even mean?
Sorry for the trivial question, but I'm really trying to figure out what's going on here
@diegomura Feel free to change the way this all works, but I'll describe how I think I was implementing it. GlyphString stores a start and end offset, and bases all methods off of those rather than slicing the _glyphRuns. This way we don't need to modify the start/end offsets of all of the runs when we slice. When we slice, we just add the offsets provided to the stored ones. Again, feel free to change all this, but it might be slower to copy all of the runs so you can adjust their offsets on every slice.
AttributedString's slice works the same way - that's why it has start and end.
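The offset-based slicing described above can be sketched roughly as follows. `OffsetSlice` is an illustrative stand-in, not the actual GlyphString or AttributedString class, and the clamping behaviour is an assumption; negative offsets are not handled.

```javascript
// Illustrative stand-in for an offset-based slice; not the real textkit class.
class OffsetSlice {
  constructor(items, start = 0, end = items.length) {
    this.items = items; // shared backing array, never copied on slice
    this.start = start; // absolute offset into items
    this.end = end;     // absolute offset into items (exclusive)
  }

  get length() {
    return this.end - this.start;
  }

  // start and end are relative to this slice, so they are added to the
  // stored offset and clamped to this slice's own end.
  slice(start, end) {
    const absStart = Math.min(this.start + start, this.end);
    const absEnd = Math.min(this.start + end, this.end);
    return new OffsetSlice(this.items, absStart, Math.max(absStart, absEnd));
  }

  // Materialize the visible items only when they are actually needed.
  toArray() {
    return this.items.slice(this.start, this.end);
  }
}

// Mirror the snippet from the thread with 55 placeholder glyphs (0..54).
const glyphs = Array.from({ length: 55 }, (_, i) => i);
const whole = new OffsetSlice(glyphs);
const s1 = whole.slice(15, whole.length); // start 15, end 55
const s2 = s1.slice(0, whole.length);     // relative 0 becomes absolute 15; end clamps to 55
```

Under this design, `s1` keeping the original runs with a start of 15 and an end of 55 is expected rather than a bug, while `s2` should begin at the first glyph of `s1`. A slice that begins anywhere else points at the offset arithmetic in `slice`.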
Thanks for your answer @devongovett ! It makes total sense, even though it makes debugging a bit hard. Thanks also for letting me contribute to this project in the way I find most comfortable. I'm not sure how I'll keep this, but I'll tag you in each PR I open.
@diegomura Here are my uncommitted changes that weren't on master. Unfortunately, there is something not quite working - will require some debugging.
I believe I was in the process of changing GlyphString and GlyphRun to use proper string indices generated by fontkit. There is a fontkit branch called string-index which returns an array of the correct string indices for each glyph. This is necessary since glyphs can be reordered, and we need a way to map glyph indices back to string indices and vice versa. Also, since it is stored in a lookup table, it should be much faster for textkit to look up.
Also in here, the text renderer calls a new PDFKit API to render glyphs as text in the pdf. That can be found on the textkit branch of PDFKit.
Looks like a few other things are in here too, like changing a bunch of the scaling code to be in the text renderer instead of everywhere else. This might be the cause of some of the breakage.
Obviously there is still a lot of work to do to get all of this working, but I thought I'd put it up so we can get started from this baseline before refactoring the code again.