Closed by bengolds 2 years ago
At first glance, what happens is that the parser encounters characters that do not map to any known dictionary definition (i.e. the parser doesn't know whether they are supposed to be operators or something else) and stops parsing. It should signal that the parsing is incomplete, and if it doesn't, that is indeed a bug.
However, the input string is interpreted as valid LaTeX, and some characters may have unexpected results. For example, `%` is the "start of comment" character, i.e. anything after this character is ignored. Some other characters that have special meaning for LaTeX include `{`, `}`, `[`, `]`, `$` and `\`, so if this input is coming directly from a user (as a variable name, for example), it might be worthwhile to do a first sanitizing pass before calling `parse()`.
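As a rough illustration of such a sanitizing pass, here is a minimal sketch. `findLatexSpecials` and `stripLatexSpecials` are hypothetical helpers, not part of the library, and the character list is simply the one given above. Whether to reject, strip, or escape the characters depends on what the input is meant to be; stripping is only one possible policy for plain identifiers.

```javascript
// Hypothetical helpers, not library functions. The character list is the
// one mentioned in the comment above: % { } [ ] $ and \
const LATEX_SPECIALS = new Set(["%", "{", "}", "[", "]", "$", "\\"]);

// Report each special character and its position, so the caller can
// decide whether to reject the input or clean it up.
function findLatexSpecials(input) {
  const found = [];
  for (let i = 0; i < input.length; i++) {
    if (LATEX_SPECIALS.has(input[i])) {
      found.push({ char: input[i], index: i });
    }
  }
  return found;
}

// Drop the special characters entirely (one possible policy).
function stripLatexSpecials(input) {
  return [...input].filter((char) => !LATEX_SPECIALS.has(char)).join("");
}

console.log(findLatexSpecials("rate%"));   // [ { char: '%', index: 4 } ]
console.log(stripLatexSpecials("a{b}$c")); // abc
```

The result of `stripLatexSpecials` could then be passed to `parse()`, or the input could be rejected outright whenever `findLatexSpecials` returns a non-empty list.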
How should I expect to get the signal that parsing is incomplete? As part of the return value, or as a separate signal?
Right now, when a `syntax-error` error is returned, it (should) contain the portion that was not parsed. The current implementation is deficient, however. I will improve it as part of addressing this issue. My plan is that if the parsing runs into an unexpected operator, it would return something like this, assuming the input is `x@y`:
["Error", "x", "syntax-error", ["LatexForm", "@y"]]
The `LatexForm` expression indicates the fragment of LaTeX that could not be parsed.
The first argument of `Error` (`"x"`) is the part that could be parsed (it could also be a substitute value, depending on the severity of the failure).
When evaluated, the `Error` function returns this first argument. So the end result of evaluating this whole expression would be `x`, consistent with the "maximum effort" doctrine, but still preserving the information that an error did occur.
Note that you can have more than one `Error` expression, depending on how successful the parsing recovery was (i.e. if it recovers, it can fail again later). For example, `\frac{x@y}{a@b}`:
["Divide",
["Error", "x", "syntax-error", ["LatexForm", "@y"]],
["Error", "a", "syntax-error", ["LatexForm", "@b"]]
]
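Since the `Error` expressions can be nested anywhere in the result, a caller that wants to know whether parsing was incomplete would need to walk the tree. A minimal sketch, assuming the result is plain MathJSON arrays shaped as in the examples above (`collectParseErrors` is a hypothetical helper, not a library function):

```javascript
// Sketch only: walks plain MathJSON arrays and gathers every
// ["Error", parsed, code, ["LatexForm", latex]] node it finds.
function collectParseErrors(expr, errors = []) {
  if (!Array.isArray(expr)) return errors;
  const [head, ...args] = expr;
  if (head === "Error") {
    const [parsed, code, where] = args;
    const latex =
      Array.isArray(where) && where[0] === "LatexForm" ? where[1] : undefined;
    errors.push({ parsed, code, latex });
  }
  // Keep descending: a recovered sub-expression may contain further errors.
  for (const arg of args) collectParseErrors(arg, errors);
  return errors;
}

const expr = [
  "Divide",
  ["Error", "x", "syntax-error", ["LatexForm", "@y"]],
  ["Error", "a", "syntax-error", ["LatexForm", "@b"]],
];

console.log(collectParseErrors(expr).map((e) => e.latex)); // [ '@y', '@b' ]
```

An empty array would mean the whole input was parsed without running into an unexpected character.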
If you want to get rid of the errors, and just have a "cleaned up" expression, you simply evaluate it: `expr.evaluate()` -> `["Divide", "x", "a"]`. If you serialize it to LaTeX, without evaluating it first, the LaTeX will highlight the error:
\frac {x \texttt{\textcolor{red}{@y}} } {a \texttt{\textcolor{red}{@b}} }
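To make the "`Error` returns its first argument" rule concrete, here is a small sketch that applies only that rule to plain MathJSON arrays. `replaceErrorsWithFallback` is a hypothetical helper; it does not call the library, and a real `evaluate()` presumably does much more than this substitution.

```javascript
// Sketch only: mirrors the rule described above, where an Error
// expression is replaced by its first argument (the part that parsed).
function replaceErrorsWithFallback(expr) {
  if (!Array.isArray(expr)) return expr;
  if (expr[0] === "Error") return replaceErrorsWithFallback(expr[1]);
  return expr.map((item) => replaceErrorsWithFallback(item));
}

const withErrors = [
  "Divide",
  ["Error", "x", "syntax-error", ["LatexForm", "@y"]],
  ["Error", "a", "syntax-error", ["LatexForm", "@b"]],
];

console.log(JSON.stringify(replaceErrorsWithFallback(withErrors)));
// ["Divide","x","a"]
```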
[edited to clarify that the first argument of `Error` would be the portion of the parsing that was successful]
Sure enough! I think this can probably be closed, and if there are other deficiencies, we (speaking for @bengolds here) can open a new ticket 🙏
> c.parse("x@y").json
[ 'Error', 'x', "'syntax-error'", [ 'LatexForm', "'@y'" ] ]
> c.parse("x&y").json
[ 'Error', 'x', "'syntax-error'", [ 'LatexForm', "'&y'" ] ]
> c.parse("x$y").json
[ 'Error', 'x', "'syntax-error'", [ 'LatexForm', "'$y'" ] ]
> c.parse("x?y").json
[ 'Error', 'x', "'syntax-error'", [ 'LatexForm', "'?y'" ] ]
When encountering most special characters, the parser acts as if it hasn't run into any problem at all:
It'd be great if the parser returned an error like, "Unrecognized symbol: ?" that we could work with to display a better error to the user.
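With the `Error` shape shown in this thread, a message along those lines could be built on the caller's side. The sketch below is only an illustration: `describeSyntaxError` and `unquote` are hypothetical helpers, and it assumes that the first character of the unparsed `LatexForm` fragment is the offending symbol, which holds for the examples in this thread but may not hold in general (for a LaTeX command it would report `\`). It also assumes strings may appear wrapped in single quotes, as in the `.json` output in this thread.

```javascript
// MathJSON string literals may be wrapped in single quotes, e.g. "'@y'".
function unquote(value) {
  const quoted =
    typeof value === "string" &&
    value.length >= 2 &&
    value.startsWith("'") &&
    value.endsWith("'");
  return quoted ? value.slice(1, -1) : value;
}

// Hypothetical helper: turns an ["Error", ...] node into a user-facing
// message. Returns null when the node is not a syntax error.
function describeSyntaxError(errorExpr) {
  if (!Array.isArray(errorExpr) || errorExpr[0] !== "Error") return null;
  const [, , code, where] = errorExpr;
  if (unquote(code) !== "syntax-error") return null;
  const fragment =
    Array.isArray(where) && where[0] === "LatexForm" ? unquote(where[1]) : "";
  if (typeof fragment !== "string" || fragment.length === 0) {
    return "Syntax error";
  }
  // Assumption: the first character of the fragment is what stopped the parser.
  return `Unrecognized symbol: ${Array.from(fragment)[0]}`;
}

console.log(
  describeSyntaxError(["Error", "x", "'syntax-error'", ["LatexForm", "'?y'"]])
);
// Unrecognized symbol: ?
```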