foliojs / textkit

Text layout framework

Devon's uncommitted changes #5

Closed devongovett closed 6 years ago

devongovett commented 6 years ago

@diegomura Here are my uncommitted changes that weren't on master. Unfortunately, there is something not quite working - will require some debugging.

I believe I was in the process of changing GlyphString and GlyphRun to use proper string indices generated by fontkit. There is a fontkit branch called string-index which returns an array of the correct string indices for each glyph. This is necessary since glyphs can be reordered, and we need a way to map glyph indices back to string indices and vice versa. Also, since it is stored in a lookup table, it should be much faster for textkit to look up.

Also in here, the text renderer calls a new PDFKit API to render glyphs as text in the pdf. That can be found on the textkit branch of PDFKit.

Looks like a few other things are in here too, like changing a bunch of the scaling code to be in the text renderer instead of everywhere else. This might be the cause of some of the breakage.

Obviously there is still a lot of work to do to get all of this working, but I thought I'd put it up so we can get started from this baseline before refactoring the code again.

diegomura commented 6 years ago

Thanks @devongovett! I agree. Can we merge this so I can start working and refactoring on top of this?

diegomura commented 6 years ago

@devongovett I didn't get why glyphs can be reordered. Can you explain why?

diegomura commented 6 years ago

@devongovett Yes, I definitely need to better understand what string-index is and what it is for 😄

devongovett commented 6 years ago

@diegomura fontkit performs a step in the text layout pipeline called "shaping". Latin-based scripts are simple: there is generally a 1:1 match between Unicode characters in a string and glyphs (their visual representations). However, even in those so-called simple scripts, this may not be true for all cases. Consider a letter followed by a combining accent mark, which may map to a single glyph, or a ligature (e.g. ffi), where several characters map to one glyph. In these cases, there is no 1:1 mapping between characters and glyphs.

In other, complex scripts, this is taken to a whole other level. The exact glyph chosen for a letter may depend on context - e.g. beginning, middle, or end of a word in Arabic. In many Indic scripts, glyphs may be completely reordered based on syllable analysis. The Unicode order is called the "logical order" and the glyph order is called the "visual order".

So that's a long way of answering your question about why glyphs can be reordered and why we need a mapping of string index to glyph index and vice versa. This is all done by fontkit already. Check out the code for the indic shaper for example - it does a lot of stuff. 😉
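To make the mapping concrete, here is a minimal sketch of a glyph-to-string lookup table and its inverse. The `stringIndices` values are made up for illustration and are not real fontkit output, and `buildGlyphIndices` is a hypothetical helper, not part of fontkit or textkit.

```javascript
// Hypothetical data: stringIndices[i] is the index in the original string of
// the first character that glyph i represents. Glyphs are in visual order.
// Glyph 1 stands for a ligature covering characters 1-2, and glyphs 2 and 3
// have been reordered relative to the string (characters 4 and 3).
const stringIndices = [0, 1, 4, 3, 5];

// Build the inverse table: for each string index, the glyph that covers it.
function buildGlyphIndices(stringIndices, stringLength) {
  const glyphIndices = new Array(stringLength).fill(-1);

  stringIndices.forEach((stringIndex, glyphIndex) => {
    glyphIndices[stringIndex] = glyphIndex;
  });

  // Characters without a glyph of their own (e.g. the later parts of a
  // ligature) inherit the glyph of the preceding character.
  for (let i = 1; i < stringLength; i++) {
    if (glyphIndices[i] === -1) glyphIndices[i] = glyphIndices[i - 1];
  }

  return glyphIndices;
}

const glyphIndices = buildGlyphIndices(stringIndices, 6);

// Both directions are now plain array lookups.
console.log(stringIndices[2]); // string index for glyph 2
console.log(glyphIndices[2]);  // glyph index for string index 2
```

Once both tables exist, each lookup is a constant-time array access, which is the speed benefit mentioned earlier in the thread.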

You can read a bit more about complex text layout here: https://en.wikipedia.org/wiki/Complex_text_layout and check out the links at the bottom for even more.

diegomura commented 6 years ago

Hi @devongovett !

I've been debugging these changes a bit, trying to find out why it's not working. The problem seems to be in here: glyphString seems to be a valid instance, but when we call slice, passing 0 and glyphString.length for the first lineFragment, the returned glyphString has '' as its value.

I think I could isolate this issue in this snippet:

const g = new GlyphGenerator();
const glyphString = g.generateGlyphs(string); // string being a valid AttributedString

const s1 = glyphString.slice(15, glyphString.length);
const s2 = s1.slice(0, glyphString.length);

I would really appreciate your guidance here. My first guess would be fixing the slice operation, but I want to be sure before starting on it. I will have some free time over the next few days to dedicate to this project, so I would also appreciate your response as soon as you can, of course 😉

diegomura commented 6 years ago

Also, why does AttributedString have start and end? I find this a bit confusing. Is there any scenario in which we can have an AttributedString (e.g. with the value Lorem ipsum, and the respective runs) that starts at index 10? What does that even mean?

Sorry for the trivial question, but I'm really trying to figure out what's going on here

devongovett commented 6 years ago

@diegomura

Feel free to change the way this all works, but I'll describe how I think I was implementing it. GlyphString stores a start and end offset, and bases all methods off of those rather than slicing the _glyphRuns. This way we don't need to modify the start/end offsets of all of the runs when we slice. When we slice, we just add the offsets provided to the stored ones. Again, feel free to change all this, but it might be slower to copy all of the runs so you can adjust their offsets on every slice.

AttributedString's slice works the same way - that's why it has start and end.
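A minimal sketch of this offset-based slicing, under stated assumptions: `OffsetView` is a hypothetical stand-in, not textkit's actual GlyphString or AttributedString, and the clamping in `slice` is one possible design choice rather than a diagnosis of the bug discussed above.

```javascript
// Hypothetical stand-in for an offset-based string; not textkit's real class.
class OffsetView {
  constructor(items, start = 0, end = items.length) {
    this.items = items; // shared backing array, never copied
    this.start = start; // absolute offset into items
    this.end = end;     // absolute offset into items (exclusive)
  }

  get length() {
    return this.end - this.start;
  }

  // start and end are relative to this view. They are clamped to the view's
  // own length, then added to the stored offsets. No runs are copied.
  slice(start, end) {
    const s = Math.min(Math.max(start, 0), this.length);
    const e = Math.min(Math.max(end, s), this.length);
    return new OffsetView(this.items, this.start + s, this.start + e);
  }

  toArray() {
    return this.items.slice(this.start, this.end);
  }
}

const glyphs = Array.from('Lorem ipsum dolor sit');
const whole = new OffsetView(glyphs);

// Same shape as the snippet earlier in the thread: a slice of a slice.
const s1 = whole.slice(15, whole.length);
const s2 = s1.slice(0, whole.length); // end is past s1's length, so it is clamped

console.log(s1.toArray().join(''));
console.log(s2.toArray().join(''));
```

Here a view "starting at index 10" simply means its stored start offset is 10 within the shared backing data. Slicing is cheap because only two numbers change, at the cost of every other method having to respect the offsets.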

diegomura commented 6 years ago

Thanks for your answer @devongovett ! It makes total sense, even though it makes the debugging a bit hard. Thanks also for letting me contribute to this project in the way I find most comfortable. I'm not sure how I'll keep this, but I'll tag you in each PR I open.