Khazuar closed this issue 10 months ago.
Have you tried using mf.$text('json')?
I gave the JSON output a try and found that it didn't parse a lot of things correctly, e.g. a+b\in\mathbb{N} (see the fiddle). Is this the same as the MASTON syntax I tried some time ago?
In general, I need to parse differently depending on the mathematical context the user is in, and I need to add new and sometimes complicated notations when necessary. Since this is a very specialized need, it's probably best if I do the parsing myself. I need reliable input for that, though. This doesn't have to be LaTeX, of course, but I suspect the internal representation of the mathfield is even more complicated ;)
The intent of MathJSON/MASTON is to produce a structure that is both stable (i.e. independent of the rendering) and easy to parse, which seems to be your use case. I would rather fix the problems that currently exist in MathJSON/MASTON (like the one you mention, which is clearly a bug).
It's not just easy to parse, it's already parsed, and that's my problem with MASTON. I'd rather have something like an array of tokens (["a", "plus", "b", "elementof", "mathbbStart", "N", "mathbbEnd"]) that doesn't try to guess the syntactic structure of the input yet. One case I'd find hard to solve is \frac{d}{dx} a + b: if d is a known variable, this is "((d / (d x)) a) + b". I doubt this can be solved well by a generic solution. I also wouldn't want to wait for a mathlive update every time I need to change or add something there.
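A flat token stream of this kind can be produced from a LaTeX string without guessing any structure. The sketch below is a minimal, hypothetical tokenizer (tokenizeLatex is not a mathlive API); it assumes plain math-mode LaTeX and only distinguishes control words, control symbols, braces and single characters.

```typescript
// Hypothetical helper (not part of mathlive): split a LaTeX string into a
// flat list of tokens without inferring any mathematical structure.
// Control words (\in, \mathbb), control symbols (\{, \,), braces and single
// characters each become one token; whitespace is dropped.
function tokenizeLatex(latex: string): string[] {
  const tokens: string[] = [];
  let i = 0;
  while (i < latex.length) {
    const ch = latex[i];
    if (/\s/.test(ch)) {
      i++; // whitespace is not significant in math mode
      continue;
    }
    if (ch === "\\") {
      // Control word: a backslash followed by one or more letters.
      const match = /^\\[a-zA-Z]+/.exec(latex.slice(i));
      if (match) {
        tokens.push(match[0]);
        i += match[0].length;
      } else {
        // Control symbol: a backslash followed by a single non-letter.
        tokens.push(latex.slice(i, i + 2));
        i += 2;
      }
      continue;
    }
    tokens.push(ch);
    i++;
  }
  return tokens;
}

console.log(tokenizeLatex("a+b\\in\\mathbb{N}"));
// [ 'a', '+', 'b', '\\in', '\\mathbb', '{', 'N', '}' ]
```

Deciding what \frac{d}{dx} means is then left entirely to a context-aware parser that consumes these tokens.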
Or maybe something that only represents the syntactic structure of the symbols, with no meaning or only an optional one:
[
  { text: "a" },
  { superscript: [ "2" ] },
  { text: "+", meaning: "add" },
  { text: "b" },
  { superscript: [ "2" ] },
  { text: "\\in", meaning: "elementof" },
  {
    text: "\\mathbb{N}",
    meaning: "naturalNumbers",
    innerToken: [
      { text: "N" }
    ]
  }
]
Something that is very generic and versatile, but robust, easy to implement in mathlive and doesn't need to be changed often.
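To make the proposal concrete, here is a hypothetical TypeScript typing of that token structure together with a serializer back to LaTeX. MathToken, serializeItems and serializeToken are illustrative names, not mathlive APIs; the sketch assumes script tokens attach to the token before them and that text already holds the full LaTeX of a token (so innerToken is only informational).

```typescript
// Hypothetical shape of the proposed token structure.
interface MathToken {
  text?: string; // LaTeX source of the token, e.g. "\\in"
  meaning?: string; // optional semantic hint, e.g. "elementof"
  superscript?: Array<string | MathToken>;
  subscript?: Array<string | MathToken>;
  innerToken?: MathToken[]; // nested content, e.g. the N in \mathbb{N}
}

// Serialize a list of tokens (or raw strings) back to LaTeX.
function serializeItems(items: Array<string | MathToken>): string {
  return items
    .map((item) => (typeof item === "string" ? item : serializeToken(item)))
    .join("");
}

// A token emits its text, then any subscript and superscript groups.
// innerToken is not emitted because text already contains that content.
function serializeToken(token: MathToken): string {
  let out = token.text ?? "";
  if (token.subscript) out += `_{${serializeItems(token.subscript)}}`;
  if (token.superscript) out += `^{${serializeItems(token.superscript)}}`;
  return out;
}

const example: MathToken[] = [
  { text: "a" },
  { superscript: ["2"] },
  { text: "+", meaning: "add" },
  { text: "b" },
  { superscript: ["2"] },
  { text: "\\in", meaning: "elementof" },
  { text: "\\mathbb{N}", meaning: "naturalNumbers", innerToken: [{ text: "N" }] },
];

console.log(serializeItems(example)); // a^{2}+b^{2}\in\mathbb{N}
```

The round trip shows how close such a structure stays to the LaTeX it came from: the extra value lies mainly in the optional meaning fields.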
OK, I get it. Fair point about the possible ambiguity of parsing in some cases.
(Although this ambiguity could be resolved with a semantic pass on the MathJSON output, i.e. it would return something like "(d / (d x)) (a + b)", and you could then transform it into something else based on the context.)
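A context-dependent semantic pass of that kind could look roughly like this. It is a hypothetical sketch (readFracApplied and the derivative(...) output notation are made up for illustration): it only handles the \frac{d}{dx} pattern applied to a single operand, and it treats "d" as a quotient factor whenever a variable named d is known in the current context.

```typescript
// Hypothetical semantic pass: the same input \frac{d}{dx} a is read
// differently depending on whether "d" is a known variable in the context.
function readFracApplied(
  numerator: string,
  denominator: string,
  operand: string,
  knownVariables: ReadonlySet<string>,
): string {
  // Leibniz-style "d / d<variable>" with no variable named d in scope.
  const leibniz = numerator === "d" && /^d[a-zA-Z]$/.test(denominator);
  if (leibniz && !knownVariables.has("d")) {
    return `derivative(${operand}, ${denominator.slice(1)})`;
  }
  // Otherwise an ordinary quotient multiplied by the operand, with the
  // denominator read as an implicit product of its single-letter factors.
  return `((${numerator} / (${denominator.split("").join(" ")})) ${operand})`;
}

console.log(readFracApplied("d", "dx", "a", new Set<string>(["d"]))); // ((d / (d x)) a)
console.log(readFracApplied("d", "dx", "a", new Set<string>())); // derivative(a, x)
```

The point of the design is that the parser output stays neutral and the context (here, the set of known variables) is supplied by the application, not baked into the editor.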
Another approach is what you're currently doing, working with a LaTeX string, and that might be the best option, especially if your users can enter arbitrary LaTeX.
The structure you suggest would not be much of an improvement over LaTeX, and it would have the same problems you pointed out at the beginning.
Yes, so right now I first normalize all the changes mathlive makes to the input LaTeX, using replacements and a lexer. But it feels like this is something mathlive could do itself, since it's applying all these changes in the first place. Hence my original idea of a simplified LaTeX output.
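One replacement-based normalization step might look like the following. This is a minimal, hypothetical sketch (unwrapCommands is not a mathlive function): it removes presentation-only wrappers such as \mathop{=} so that a=b and a\mathop{=}b reach the parser in the same form. It assumes the wrapped argument contains no nested braces; anything more complex is where a proper lexer pays off.

```typescript
// Hypothetical normalization step: replace \<command>{arg} with arg for each
// listed command. Only arguments without nested braces are handled.
function unwrapCommands(latex: string, commands: string[] = ["mathop"]): string {
  // Builds a regex like /\\(?:mathop)\{([^{}]*)\}/g from the command list.
  const pattern = new RegExp(`\\\\(?:${commands.join("|")})\\{([^{}]*)\\}`, "g");
  return latex.replace(pattern, "$1");
}

console.log(unwrapCommands("a\\mathop{=}b")); // a=b
console.log(unwrapCommands("\\mathrm{d}x", ["mathrm"])); // dx
```

Which commands are safe to unwrap is an application decision: unwrapping \mathrm{sin}, for example, would turn a function name into three separate letters.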
I'm going to think about this some more. This is not as easy as it sounds :) By the time the output is requested, the original input is long gone and not easy to recover. I think the best path might be a version of MathJSON that doesn't apply any transformation rules (i.e. that would return just the "tokens").
I like that idea best. It can be very useful to build in heuristics for semantics, but there are times when it will be wrong, particularly for specialized areas. Having a way to get at the expression before the semantics are inferred seems like a good idea.
(Sent by email in reply to Arno Gourdol's comment of Wed, Oct 30, 2019, 2:41 PM.)
Is there a fix for this yet? In my use case we parse the output of the mathfield into a function for analysis, and the drawback is that the LaTeX syntax gets changed.
There are two ways to handle this:
1. mf.latex will return the "verbatim" LaTeX, i.e. exactly as it was entered. However, if the content is changed via editing operations, there is no verbatim context and the result will be a serialized version of the content.
2. You can use ce.parse(latex, {canonical: false}) for this.
Requirements
In our application, users need to input math formulas quite a lot; these are then parsed into semantic ASTs and used for further analysis and checking. This process needs to be highly robust and adaptable, and we need to be able to customize it for different users. Users must be able to enter both symbols and entire formulas via the virtual keyboard (mixed with the physical keyboard), and advanced users must be able to type LaTeX commands.
Current approach
Right now, we take the $latex() output of the mathfield and parse it using a family of EBNF grammars, which generally works rather well. I tried the MASTON output initially, but found it too limited.
Problem
In a lot of cases, the mathfields behave a little too "smart", and their contents are not what one would expect:
- \mathop or \mathrm wrappers are added.
- a_b^c becomes a^c_b. That can be compensated for in the parser, but it is annoying, and this order feels less "semantic".
- \mathbb{R}\mathbb{R} becomes \mathbb{RR}.
I see how these optimizations increase the overall quality of the LaTeX and make it look better, but they make it harder to work with in a parser (or other automation). Users enter a=b and the field converts it to a\mathop{=}b, which looks nicer because of the improved spacing. But the parser needs to understand both a=b and a\mathop{=}b.
It also seems that these improvements are added over time, so with every new version of mathlive we need to check our application rather thoroughly.
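Compensating for the a^c_b ordering before parsing can be done with a single rewrite. The function below is a hypothetical sketch (normalizeScriptOrder is not a mathlive API); it assumes each script is either a single plain character or a brace group without nested braces, so something like a^\alpha_b is left unchanged.

```typescript
// Hypothetical normalization: rewrite "^sup_sub" as "_sub^sup" so the parser
// only has to accept one script order. A script is either a brace group
// without nested braces or a single character that is not a brace,
// backslash or whitespace.
function normalizeScriptOrder(latex: string): string {
  const script = String.raw`(\{[^{}]*\}|[^{}\\\s])`;
  const pattern = new RegExp(String.raw`\^` + script + "_" + script, "g");
  return latex.replace(pattern, "_$2^$1");
}

console.log(normalizeScriptOrder("a^c_b")); // a_b^c
console.log(normalizeScriptOrder("x^{2}_{i}+y")); // x_{i}^{2}+y
```

Input that is already in a_b^c order is not matched, so the rewrite can safely be applied to every value read from the field.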
Suggestion
It would be great if there were a way to get the content of the mathfield in a simplified format that is optimized for parsing. This could be another accessor like $simpleLatex(), an option for $latex(), or something entirely different that lets us work with the content more easily in an automated way. This doesn't need to solve all the problems listed above (there's no reason to write \mathbb{R}\mathbb{R} anyway), but it would be nice if it could provide a more stable and reliable interface.
I am also very open to suggestions on how we could change our approach on our side.