arnog / mathlive

A web component for easy math input
https://cortexjs.io/mathlive
MIT License

Feature request: Simple latex content #293

Closed Khazuar closed 10 months ago

Khazuar commented 4 years ago

Requirements

In our application we need the user to input math formulas quite a lot, which are then parsed to semantic ASTs and used in further analysis and checking. This process needs to be highly robust and adaptable, and we need to be able to customize it for different users. We need both the ability to enter symbols and entire formulas via the virtual keyboard (mixed with the normal keyboard) and latex commands for advanced users.

Current approach

Right now, we're using the $latex()-output of the mathfield and parse this using a family of EBNF grammars, which works rather well in general:

I was trying out the MASTON-output initially, but found it too limited.

Problem

In a lot of cases, the mathfields behave a little too "smart" and their contents are not what one would expect:

I see how these optimizations increase the overall quality of the latex and make it better visually, but they make it harder to work with in a parser (or other automations). Users enter a=b and the field converts it to a\mathop{=}b, which looks nicer because of the improved spacing. But the parser then needs to understand both a=b and a\mathop{=}b.

It also seemed these improvements are added over time, so with every new version of mathlive we need to check our application rather thoroughly.

Suggestion

It would be great if there was a way to get the content of the mathfield in a simplified format that is optimized for getting parsed. This could be another accessor like $simpleLatex(), an option for $latex() or something entirely different, that lets us work with the content more easily in an automated way. This doesn't need to solve all the problems I was listing above (there's no reason to write \mathbb{R}\mathbb{R} anyway), but it would be nice if this could prove to be the more stable and reliable interface.

I am also very open to suggestions on how we could change our approach from our side.

arnog commented 4 years ago

Have you tried using mf.$text('json')?

Khazuar commented 4 years ago

I gave the json-output a go and found that it didn't parse a lot of things correctly, e.g. a+b\in\mathbb{N} (see the fiddle). This is the same as the MASTON-syntax I tried some time ago, right? In general I need to be able to parse depending on the math-context the user is in and I need to add new and sometimes complicated notations when necessary. Since this is a very special need, it's probably best if I do the parsing myself. I need reliable input for that though. This doesn't need to be latex of course, but I suspect the internal representation of the mathfield is even more complicated ;)

arnog commented 4 years ago

The intent of MathJSON/MASTON is to produce a structure that is both stable (i.e. independent of the rendering) and easy to parse, which seems to be your use case. I would rather fix the problems that exist in MathJSON/MASTON right now (like the one you mention, that's clearly a bug).

Khazuar commented 4 years ago

It's not just easy to parse, it's already parsed and that's my problem with MASTON. I'd rather have something like an array of tokens (["a", "plus", "b", "elementof", "mathbbStart", "N", "mathbbEnd"]) which doesn't try to guess the syntactic structure of the input yet. One case I'd find hard to solve is \frac{d}{dx} a + b:

I doubt this can be solved well by a generic solution. I also wouldn't want to wait for a mathlive update every time I needed to change or add something there.
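
As a rough illustration of the flat token array described above, here is a minimal TypeScript sketch. It is not part of mathlive: `tokenize` and `SYMBOL_NAMES` are hypothetical names, the symbol table only covers the two symbols from the example, and the "Start"/"End" naming simply follows the array shown in this comment.

```typescript
// Hypothetical lookup table: names for a handful of symbols.
// Anything not listed passes through unchanged.
const SYMBOL_NAMES: Record<string, string> = {
  "+": "plus",
  "\\in": "elementof",
};

// Turn a LaTeX string into a flat list of tokens without guessing
// any syntactic structure (no operator precedence, no grouping into trees).
function tokenize(latex: string): string[] {
  const tokens: string[] = [];
  // Stack of open group names, so "}" knows which "...End" token to emit.
  const openGroups: string[] = [];
  // A lexeme is either a command (\name) or any single non-space character.
  const raw: string[] = latex.match(/\\[a-zA-Z]+|\S/g) ?? [];

  for (let i = 0; i < raw.length; i++) {
    const t = raw[i] as string;
    if (t.startsWith("\\") && t.length > 1 && raw[i + 1] === "{") {
      // Command with a braced argument: \mathbb{ becomes "mathbbStart".
      const name = t.slice(1);
      tokens.push(`${name}Start`);
      openGroups.push(name);
      i++; // consume the "{"
    } else if (t === "{") {
      tokens.push("groupStart");
      openGroups.push("group");
    } else if (t === "}") {
      const name = openGroups.pop() ?? "group";
      tokens.push(`${name}End`);
    } else {
      tokens.push(SYMBOL_NAMES[t] ?? t);
    }
  }
  return tokens;
}

console.log(tokenize("a+b\\in\\mathbb{N}"));
```

With this sketch, the decision of how \frac{d}{dx} a + b should bind stays entirely with the consumer, since the output is only a sequence of tokens.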

Khazuar commented 4 years ago

Or maybe something that only represents the syntactic structure of the symbols, without or with only optional meaning:

[
  { text: "a" },
  { superscript: [ "2" ] },
  { text: "+", meaning: "add" },
  { text: "b" },
  { superscript: [ "2" ] },
  { text: "\\in", meaning: "elementof" },
  { 
    text: "\\mathbb{N}", 
    meaning: "naturalNumbers", 
    innerToken: [
      { text: "N" }
    ]
  }
]

Something that is very generic and versatile, but robust, easy to implement in mathlive and doesn't need to be changed often.
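
To make the shape of that structure concrete, here is a hedged TypeScript sketch of how it could be typed and consumed. The `SyntaxToken` type and the `toLatex` helper are assumptions made for illustration, not an existing mathlive API; the field names (`text`, `meaning`, `superscript`, `innerToken`) are taken from the example above, and rendering a superscript as `^{...}` is an assumed convention.

```typescript
// Hypothetical type for the proposed structure: a token either carries
// text (with optional meaning and inner tokens) or is a superscript.
type SyntaxToken =
  | { text: string; meaning?: string; innerToken?: SyntaxToken[] }
  | { superscript: Array<SyntaxToken | string> };

// Serialize the structure back to a plain LaTeX string.
function toLatex(tokens: Array<SyntaxToken | string>): string {
  return tokens
    .map((token) => {
      if (typeof token === "string") return token;
      if ("superscript" in token) return `^{${toLatex(token.superscript)}}`;
      return token.text;
    })
    .join("");
}

const example: SyntaxToken[] = [
  { text: "a" },
  { superscript: ["2"] },
  { text: "+", meaning: "add" },
  { text: "b" },
  { superscript: ["2"] },
  { text: "\\in", meaning: "elementof" },
  { text: "\\mathbb{N}", meaning: "naturalNumbers", innerToken: [{ text: "N" }] },
];

console.log(toLatex(example));
```

Because `meaning` is optional, a consumer can ignore it and apply its own interpretation to `text`, which is the point of the proposal.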

arnog commented 4 years ago

OK, I get it. Fair point about the possible ambiguity of parsing in some cases.

(Although this ambiguity could be resolved with a semantic pass on the MathJSON output, i.e. it would return something like "(d / (d x)) (a + b)" and you could then transform it into something else based on the context)

Another approach is what you're currently doing, which is dealing with a Latex string and it might be the best, especially if your users can enter arbitrary Latex.

The structure you suggest would not be much of an improvement over Latex, and would have the same problems you point out at the beginning:

Khazuar commented 4 years ago

Yes, so right now I'm normalizing all the things mathlive does to the input latex first using replacements and a lexer. But it feels like this is something that mathlive could do itself, since it's applying all these changes in the first place. Hence my original idea with the simplified latex output.
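
A minimal sketch of what such a replacement-based normalization step might look like, in TypeScript. This is an illustration only: `normalizeLatex` and `REWRITES` are hypothetical names, and the single rule shown covers just the a\mathop{=}b case mentioned earlier in this thread. A real rule list would have to be built and maintained against the mathlive version in use.

```typescript
// Hypothetical rule list: each entry undoes one rewrite the mathfield applies.
const REWRITES: Array<[RegExp, string]> = [
  // a\mathop{=}b -> a=b. Only single-character operators are unwrapped,
  // so something like \mathop{\lim} is left untouched.
  [/\\mathop\{([^{}\\])\}/g, "$1"],
];

// Apply every rewrite in order and return the normalized LaTeX.
function normalizeLatex(latex: string): string {
  let result = latex;
  for (const [pattern, replacement] of REWRITES) {
    result = result.replace(pattern, replacement);
  }
  return result;
}

console.log(normalizeLatex("a\\mathop{=}b"));
```

The weakness of this approach is the one described at the top of the issue: every new rewrite introduced by the mathfield needs a matching rule here.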

arnog commented 4 years ago

I'm going to think about this some more. This is not as easy as it sounds :) By the time the output is requested, the original input is long gone and not easy to recover. I think the best path might be a version of MathJSON that doesn't apply any transformation rules (i.e. that would return just the "tokens").

NSoiffer commented 4 years ago

I like that idea best. It can be very useful to build in heuristics for semantics, but there are times when it will be wrong, particularly for specialized areas. Having a way to get at the expression before the semantics are inferred seems like a good idea.



Joewings-jw commented 2 years ago

Is there a fix for this yet? In my use case we're parsing the output from the mathfield into a function for analysis, and the drawback is that the syntax of the latex is changed.

arnog commented 10 months ago

There are two ways to handle this: