Spaces aren't preserved by `preserveLatex`

cortex-js / compute-engine

An engine for symbolic manipulation and numeric evaluation of math formulas expressed with MathJSON

https://cortexjs.io

MIT License


Spaces aren't preserved by `preserveLatex` #41

Status: Closed (bengolds closed this issue 2 years ago)

bengolds commented 2 years ago

Version 0.44

Steps to reproduce

Set jsonSerializationOptions.metadata = ['latex']
Parse an expression with extra spaces (e.g. parse('1 +   x'))

What I'd expect

I'd expect the latex field to exactly match the input, including all of the spaces in the original expression; i.e.: {fn: ["Add", {num: "1", latex: "1"}, "x"], latex: "1 +   x"}

What happens instead

Multiple contiguous spaces are collapsed into one: {fn: ["Add", {num: "1", latex: "1"}, "x"], latex: "1 + x"}

bengolds commented 2 years ago

(note -- this isn't urgent, since we can work around this by replacing multiple spaces with one)
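A minimal sketch of that workaround, assuming the goal is to compare the caller's input against the latex metadata the parser returns. The helper name collapseSpaces is hypothetical and is not part of the Compute Engine API.

```typescript
// Hypothetical helper: collapse each run of two or more spaces into one,
// so the caller's input can be compared with the parser's `latex` field.
function collapseSpaces(latex: string): string {
  return latex.replace(/ {2,}/g, " ");
}

const input = "1 +   x";
const comparable = collapseSpaces(input);
console.log(JSON.stringify(comparable));
```

This only handles plain spaces; comments and line breaks would need their own handling.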

arnog commented 2 years ago

This is a little tricky...

Before being parsed, the LaTeX string goes through a tokenization phase, during which consecutive spaces are collapsed into one, comments (% foo) are removed, and multiple lines are merged into one. By the time parsing begins, that info is gone...
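To illustrate why that information cannot be recovered later, here is a rough sketch of the three normalizations described above. It is not the Compute Engine tokenizer: the function name normalizeLatex and the regular expressions are assumptions made for illustration, and real TeX tokenization has more rules than this.

```typescript
// Illustrative only: mimics the three normalizations described in the thread.
function normalizeLatex(source: string): string {
  return source
    .split(/\r?\n/)
    // Remove a comment: everything from a `%` that is not preceded by a
    // backslash up to the end of the line (so `\%` is kept).
    .map((line) => line.replace(/(^|[^\\])%.*$/, "$1"))
    // Merge multiple lines into one.
    .join(" ")
    // Collapse consecutive whitespace into a single space.
    .replace(/\s+/g, " ");
}

const normalized = normalizeLatex("1 +   x % foo\n+ 2");
console.log(JSON.stringify(normalized));
```

Many different inputs map to the same normalized string, so a parser that only sees the normalized form has no way to reproduce the original spacing or comments.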

Would you expect the comment and new lines to be preserved as well? And if so, what should they be attached to?

Hmmm... actually, there are other primitive commands that need to be handled during the tokenization phase ("the gullet" in TeX parlance), such as \bgroup, \obeyspaces, \relax and a bunch more. These also are not currently round-tripped...

I need to think about this.

bengolds commented 2 years ago

This is definitely a strange one; my expectation would be:

If I parse a string, I'd get back an AST.
The latex value attached to the top level is exactly what was passed to parse, comments, spaces, and all.
The latex value of the component nodes could have comments/spacing/anything else modified or stripped.
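The expectation above could be sketched as a small post-processing step. This is an assumption about one possible approach, not how the library implements it: the ParsedNode shape is loosely modeled on the JSON shown earlier in this thread, and attachVerbatimLatex is a hypothetical helper.

```typescript
// Hypothetical node shape, loosely modeled on the JSON in this thread.
interface ParsedNode {
  fn?: unknown[];
  num?: string;
  sym?: string;
  latex?: string;
}

// Return a copy of the root whose `latex` is the verbatim source string.
// Child nodes keep whatever normalized `latex` the parser gave them.
function attachVerbatimLatex(root: ParsedNode, source: string): ParsedNode {
  return { ...root, latex: source };
}

const source = "1 +   x % a comment";
const parsed: ParsedNode = {
  fn: ["Add", { num: "1", latex: "1" }, "x"],
  latex: "1 + x",
};
const result = attachVerbatimLatex(parsed, source);
console.log(JSON.stringify(result.latex));
```

Only the top-level node needs the original string, which sidesteps the question of which child node a given space or comment belongs to.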

bengolds commented 2 years ago

Or, perhaps there's a distinction here between the parser returning "normalized LaTeX" and the "original LaTeX".

arnog commented 2 years ago

Ah, OK. Then if only the top level is "verbatim", that is easy...

bengolds commented 2 years ago

That'd be my expectation -- does that seem reasonable to you?

It would have to be a judgement call as to where to assign the spaces/comments for any child nodes anyway.