First, I'm not sure we're going to allow them. While they might be useful, they can really screw up correct line breaking, and we don't have any other manual line-breaking primitives.
Even if we do decide to support them, we should instead use a visible sequence (such as \nbspace). Depending on invisible characters is really annoying (although we could still support UTF-8 encoded NB spaces for consistency).
Either way, let's refuse them in the Lexer for the time being.
I've been removing all current cases by hand. Still, it's really annoying to check and handle these spaces every time because of their invisible nature.
For reference, I've been removing them with
sed -i -e 's/\xc2\xa0/ /g' guide/**/*.{src,manu,txt}
and then manually diffing and checking the results.
However, if I remember correctly, there are other UTF-8 sequences for these spaces (or for similar invisible yet special characters)... We really should handle these cases in the Lexer.
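To make the "other sequences" concrete, here is a rough sketch of the kind of check the Lexer could run before tokenizing. It is written in Python only for illustration and is not the project's Lexer code: the function name `find_invisible_specials` and the deny-list are my own assumptions. The code points are real Unicode characters, but which of them we actually want to refuse is still open.

```python
import unicodedata

# Hypothetical deny-list: characters that render as blank (or as nothing)
# but are not a plain U+0020 space. UTF-8 byte sequences are in the comments.
INVISIBLE_SPECIALS = {
    "\u00a0",  # NO-BREAK SPACE             (c2 a0)
    "\u2007",  # FIGURE SPACE               (e2 80 87)
    "\u202f",  # NARROW NO-BREAK SPACE      (e2 80 af)
    "\u200b",  # ZERO WIDTH SPACE           (e2 80 8b)
    "\u2060",  # WORD JOINER                (e2 81 a0)
    "\ufeff",  # ZERO WIDTH NO-BREAK SPACE  (ef bb bf), also used as the BOM
}


def find_invisible_specials(text):
    """Return a list of (line, column, name) for each deny-listed character.

    Lines and columns are 1-based; columns count characters, not bytes.
    """
    hits = []
    for lineno, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in INVISIBLE_SPECIALS:
                name = unicodedata.name(ch, "U+%04X" % ord(ch))
                hits.append((lineno, col, name))
    return hits


if __name__ == "__main__":
    sample = "a\u00a0b\nplain line\n\u202fx"
    for lineno, col, name in find_invisible_specials(sample):
        print("line %d, col %d: %s" % (lineno, col, name))
```

Reporting the position and the character name, instead of silently replacing the character, would also remove the need to diff the results by hand after each cleanup.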