doy closed this issue 10 years ago
The parser has been converted to use flex, but it still has the problem of splitting escape sequences across reads. I'll need to move the `yylex` call into the libuv work queue function directly, and have that be the blocking function rather than the `read`.
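Roughly, that would mean running `yylex` on the libuv thread pool via `uv_queue_work`, with the scanner's `YY_INPUT` doing the blocking read itself. Just a sketch of the idea (the glue names are made up, and the next comment explains why it doesn't actually work):

```c
#include <uv.h>

extern int yylex(void);   /* flex scanner; YY_INPUT would do the blocking read */

/* Runs on a thread-pool thread, so yylex() is free to block. */
static void scan_work(uv_work_t *req)
{
    (void)req;
    while (yylex() != 0)
        ;                 /* tokens would be handed off to the main loop here */
}

/* Runs back on the main loop once the scanner hits EOF. */
static void scan_done(uv_work_t *req, int status)
{
    (void)req;
    (void)status;
}

void start_scanner(uv_loop_t *loop)
{
    static uv_work_t req;
    uv_queue_work(loop, &req, scan_work, scan_done);
}
```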
So that won't actually work: if you read large chunks at once, flex can't tell whether it should keep matching or not (given a string of text, it has no way of knowing whether to just return it or try reading again, which may block), and if you read single bytes at a time, it will break UTF-8. What we actually need to do here is parse out of a pre-read string like we were doing before, but without fallback rules for things like warning about incomplete escape sequences, and look at `yyleng` after `yylex` returns to see how much of the string we actually parsed. If it wasn't the whole string, push the remaining characters back into a buffer and try parsing them the next time we have things to read.
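One way the carry-over bookkeeping could look (sketch only, assuming the rules are arranged so that every complete escape sequence or text run returns a token, and a hypothetical catch-all rule returns something like `TOK_INCOMPLETE` for a trailing partial sequence instead of warning about it):

```c
/* Sketch only: lives in the user-code section of the .l file so the
 * flex-generated declarations (yylex, yyleng, yy_scan_bytes, ...) are
 * in scope.  Bounds checks elided. */
#include <string.h>

enum { TOK_INCOMPLETE = 256 };  /* hypothetical: returned by a rule that
                                   matches a trailing partial sequence */

static char   carry[4096];      /* unparsed tail from the previous read */
static size_t carry_len = 0;

void parse_chunk(const char *chunk, size_t chunk_len)
{
    static char buf[8192];
    size_t len = carry_len + chunk_len;

    /* Prepend whatever we couldn't parse last time. */
    memcpy(buf, carry, carry_len);
    memcpy(buf + carry_len, chunk, chunk_len);

    YY_BUFFER_STATE state = yy_scan_bytes(buf, len);
    size_t consumed = 0;
    int tok;
    while ((tok = yylex()) != 0 && tok != TOK_INCOMPLETE) {
        consumed += yyleng;      /* total bytes covered by complete tokens */
        /* ... hand tok / yytext off to the screen ... */
    }
    yy_delete_buffer(state);

    /* Anything left over (half an escape sequence, say) waits around
     * for the next read. */
    carry_len = len - consumed;
    memcpy(carry, buf + consumed, carry_len);
}
```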
This still leaves the issue of UTF-8 characters being split across reads, though. That might be solvable just by having the parser be aware of UTF-8 and only reading full characters. I'm not sure how cairo (or pango, or whatever) will handle being handed a codepoint for a character followed by a codepoint for a combining character in a second pass. There may not be anything we can do about that, though, since the initial codepoint is a valid printable thing on its own, and there's no way of telling whether a combining character will be coming next if it's not already in the buffer.
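Only handing the parser whole characters basically comes down to checking whether a chunk ends in the middle of a UTF-8 sequence and holding those bytes back until the next read, the same way as the unparsed tail above. A rough sketch of that check (hypothetical helper, not anything that exists in the tree):

```c
#include <stddef.h>

/* Returns how many bytes at the end of buf belong to an incomplete
 * UTF-8 sequence (0 if the buffer ends on a character boundary). */
size_t utf8_incomplete_tail(const unsigned char *buf, size_t len)
{
    size_t i = len;

    /* Walk back over at most three continuation bytes (10xxxxxx). */
    while (i > 0 && len - i < 3 && (buf[i - 1] & 0xC0) == 0x80)
        i--;
    if (i == 0)
        return 0;  /* nothing but continuation bytes; let the parser reject it */

    unsigned char lead = buf[i - 1];
    size_t have = len - i + 1;   /* bytes from the lead byte to the end */
    size_t need;

    if      ((lead & 0x80) == 0x00) need = 1;   /* ASCII */
    else if ((lead & 0xE0) == 0xC0) need = 2;
    else if ((lead & 0xF0) == 0xE0) need = 3;
    else if ((lead & 0xF8) == 0xF0) need = 4;
    else return 0;               /* invalid lead byte; not our problem here */

    return have < need ? have : 0;
}
```

The held-back bytes then get prepended to the next chunk before it goes to the parser.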
Codepoints and escape sequences are now handled properly when split across reads. Still not sure what to do about glyph clusters split across reads.
It looks like urxvt just applies the combining character to whatever character is to the left of the cursor, so this will probably be something that I need to wait on #45 for.
Actually, the parser itself is fine at this point - the combining character issue will be handled elsewhere.
Right now, I parse chunks at a time, which will break if an escape sequence is split across chunks, among many other things. I really need to rewrite this to use a char-at-a-time state machine, or something along those lines.
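For reference, the char-at-a-time version would be something like this: a state machine whose state persists between calls, so a sequence split across chunks just picks up where it left off. Heavily simplified sketch with made-up names (real VT parsing has many more states):

```c
#include <stddef.h>

enum state { GROUND, ESCAPE, CSI };

struct parser {
    enum state st;
};

static void handle_byte(struct parser *p, unsigned char c)
{
    switch (p->st) {
    case GROUND:
        if (c == 0x1B)
            p->st = ESCAPE;                 /* ESC starts a sequence */
        /* else: printable byte, would go straight to the screen */
        break;
    case ESCAPE:
        p->st = (c == '[') ? CSI : GROUND;  /* only CSI handled in this sketch */
        break;
    case CSI:
        if (c >= 0x40 && c <= 0x7E) {       /* final byte ends the sequence */
            /* dispatch the collected sequence here */
            p->st = GROUND;
        }
        break;
    }
}

void parse_bytes(struct parser *p, const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        handle_byte(p, buf[i]);             /* state survives chunk boundaries */
}
```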