asciidoctor / asciimath

Asciimath parser
MIT License

Use objects instead of Hashes and Arrays to represent the AST #30

Closed pepijnve closed 4 years ago

pepijnve commented 4 years ago

To enable more complex interpretation of the AST, it can be helpful to inspect the rest of the tree while generating output for one node. For instance, to know whether something is a function application or just an identifier, it helps to be able to check sibling nodes.

As an example

+ f
+-+ (
  + x
  + ) 

is the application of function f to value x, and it might be desirable to render an invisible function application character (U+2061) between f and (.

On the other hand, when rendering

+ sin
+-+ (
  + f
  + ) 

f is not a function.

In order to distinguish between the two we need to be able to 'see' that in the first case f is followed by a paren expression while in the second case it's the only element inside a paren expression and is not followed by some other identifier.

My hope is that proper AST objects that provide tree traversal methods will enable this type of more sophisticated reasoning in the code.
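
A minimal sketch of what such traversal could look like. All class and method names here (`Node`, `Identifier`, `Container`, `Sequence`, `Paren`, `next_sibling`, `function_application?`) are hypothetical and are not taken from the asciimath gem; the rule it encodes ("an identifier directly followed by a paren expression is a function application") is only the heuristic described above, not a settled design.

```ruby
# Hypothetical AST node classes; the names are illustrative only.
class Node
  attr_accessor :parent

  # The node that follows this one under the same parent, or nil.
  def next_sibling
    return nil unless parent
    siblings = parent.children
    position = siblings.index { |node| node.equal?(self) }
    siblings[position + 1]
  end
end

class Identifier < Node
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

# A node that owns child nodes and registers itself as their parent.
class Container < Node
  attr_reader :children

  def initialize(*children)
    @children = children
    children.each { |child| child.parent = self }
  end
end

class Sequence < Container; end
class Paren < Container; end

# Assumed heuristic: an identifier directly followed by a paren
# expression is treated as a function application.
def function_application?(node)
  node.is_a?(Identifier) && node.next_sibling.is_a?(Paren)
end

# First example: f followed by (x)
f = Identifier.new('f')
first = Sequence.new(f, Paren.new(Identifier.new('x')))

# Second example: sin followed by (f)
inner_f = Identifier.new('f')
second = Sequence.new(Identifier.new('sin'), Paren.new(inner_f))

function_application?(f)        # true: f is followed by a paren expression
function_application?(inner_f)  # false: f is the only element inside the parens
```

With plain Hashes and Arrays a node has no reference back to its parent, so this kind of sibling lookup would have to be threaded through the renderer by hand.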

pepijnve commented 4 years ago

@GarkGarcia @davidfarmer this is a followup to #25. I would love to hear your input on this.

If we can come up with some rules defining when you treat an identifier as a function and when you treat it as a plain identifier I would love to add that to the renderers.

davidfarmer commented 4 years ago

Well, I saw this coming, which is why (in the issue I closed) I wrote:

> I have a dormant project to specify and parse an asciimath-like syntax that retains semantic information. When it comes alive again, I'll let you know.

The problem I was faced with is exactly what you are encountering: how to deduce semantic information. How can you tell that f(x+1) is function application but n (n+1) is multiplication?

AsciiMath specifies that f and g are functions. I do not think that is adequate, and based on what you wrote I think you agree. If you really restrict the areas of math that you are considering, then it can be workable to have f and g always be functions. But that is a very limited use case.

Trying to disambiguate "a times b" from "a cross b", and the various meanings of |a|, and the various meanings of (a,b), led me to question whether there was any hope of staying faithful to asciimath.

That is a long preamble to a simple idea. I claim it is obvious which of the following is function application, and which is implied multiplication (note that the specific letters are irrelevant):

Example 1: j(a + b)
Example 2: p (c + d)

Assuming you figure out the puzzle and decide that what I wrote is reasonable and unambiguous, you then have to decide whether you want to expect users of your system to follow that rule.

Without rules like that, your system has to guess at the meaning.

pepijnve commented 4 years ago

The days when I did mathematics on a daily basis are a long time ago, so I’m not sure anymore which convention means what. At the risk of sounding a bit thick, I think you mean we could consider making whitespace meaningful during output generation.

Seems perfectly reasonable to me. This is something we could easily make a mode on the output generator. Retaining whitespace and grouping parentheses in the AST was something I was already planning anyway, to enable roundtripping the AST back to the exact input text.

davidfarmer commented 4 years ago

Yes, spaces have meaning. If you mean f times the quantity x + h, then you write "f (x + h)", with a space between the f and the (x + h). If you mean "f of x + h", then you don't put a space. The spaces around the "+" are irrelevant.
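
The rule can be sketched in a few lines. The `classify` helper below is hypothetical and only looks at a leading identifier followed by an opening parenthesis; it is not part of AsciiMath or of the asciimath gem.

```ruby
# Hypothetical helper illustrating the whitespace rule:
# no space before "(" means function application,
# any whitespace before "(" means implied multiplication.
def classify(source)
  match = source.match(/\A([A-Za-z]+)(\s*)\(/)
  return :unknown unless match

  match[2].empty? ? :function_application : :implied_multiplication
end

classify('f(x + h)')   # => :function_application
classify('f (x + h)')  # => :implied_multiplication
classify('j(a + b)')   # => :function_application
classify('p (c + d)')  # => :implied_multiplication
```

Note that the decision depends only on the whitespace between the identifier and the parenthesis, never on which letter is used.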

We started writing a parser for this, but it was not as simple as the parser you can write from the AsciiMath grammar. (I was not the one who worked on the parser; I have only a superficial understanding of parsers.)

Once you decide spaces have meaning, the expression "dtheta" is not misinterpreted as "dt h eta". If you meant that, you would have put in the spaces. But now we have departed from AsciiMath.
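
To illustrate how "dtheta" can end up as "dt h eta" when whitespace carries no meaning, here is a toy tokenizer. It assumes a greedy longest-match strategy and a three-entry symbol table (`TOY_SYMBOLS`); both are simplifications chosen for this example and are not the actual AsciiMath implementation.

```ruby
# Toy subset of a symbol table; an assumption for this sketch only.
TOY_SYMBOLS = %w[theta eta dt].freeze

# Greedy longest-match tokenizer that ignores whitespace.
# Unknown characters become single-character tokens.
def tokenize(source)
  tokens = []
  rest = source.lstrip
  until rest.empty?
    symbol = TOY_SYMBOLS.select { |s| rest.start_with?(s) }.max_by(&:length)
    token = symbol || rest[0]
    tokens << token
    rest = rest[token.length..-1].lstrip
  end
  tokens
end

tokenize('dtheta')    # => ["dt", "h", "eta"]
tokenize('dt h eta')  # => ["dt", "h", "eta"]
```

Both inputs produce the same tokens, which is exactly the ambiguity that goes away once a missing space is treated as significant.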



GarkGarcia commented 4 years ago

> @GarkGarcia @davidfarmer this is a followup to #25. I would love to hear your input on this.
>
> If we can come up with some rules defining when you treat an identifier as a function and when you treat it as a plain identifier I would love to add that to the renderers.

Makes a lot of sense to me. I mean, Ruby objects are mostly glorified Hashes as far as I can tell, but creating classes to represent the AST could make the code a lot more self-documenting.

> AsciiMath specifies that f and g are functions. I do not think that is adequate, and based on what you wrote I think you agree. If you really restrict the areas of math that you are considering, then it can be workable to have f and g always be functions. But that is a very limited use case.

I agree, this is pretty limited and arbitrary.

> That is a long preamble to a simple idea. I claim it is obvious which of the following is function application, and which is implied multiplication (note that the specific letters are irrelevant):
>
> Example 1: j(a + b)
> Example 2: p (c + d)

Seems like quite a reasonable compromise to me. We would be deviating from the standard AsciiMath syntax (which I don't mind in this case). With regard to the distinction between function application and implicit multiplication, I don't think this would impact the LaTeX output, would it?

> Yes, spaces have meaning.

I agree, but I don't think we should generalize this idea. In my opinion, we should only make whitespace relevant in specific cases (such as the one described by @davidfarmer).

GarkGarcia commented 4 years ago

If we were to make whitespace generally meaningful, we should consider it some more.

pepijnve commented 4 years ago

@GarkGarcia I've introduced the class-based AST and updated the HTML and MathML backends already. That code is easier to read now. I still need to update AST.adoc, but I think it's sufficiently self-explanatory if you look at the code. I added new Group and Infix classes. Those were necessary to be able to roundtrip grouping parentheses and frac x y vs x / y correctly.
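
As a rough illustration of why separate node types help with roundtripping, here is a sketch with hypothetical classes. Only the names `Group` and `Infix` come from the comment above; `Ident`, `Binary`, the constructor signatures, and the `to_source` method are invented for this example and may differ from the code on the branch. The sketch normalizes whitespace rather than preserving it exactly.

```ruby
# Hypothetical node classes; constructor shapes are assumptions.
class Ident
  def initialize(text)
    @text = text
  end

  def to_source
    @text
  end
end

# Prefix binary operator, e.g. "frac x y".
class Binary
  def initialize(operator, left, right)
    @operator = operator
    @left = left
    @right = right
  end

  def to_source
    "#{@operator} #{@left.to_source} #{@right.to_source}"
  end
end

# Infix operator, e.g. "x / y".
class Infix
  def initialize(left, operator, right)
    @left = left
    @operator = operator
    @right = right
  end

  def to_source
    "#{@left.to_source} #{@operator} #{@right.to_source}"
  end
end

# Grouping parentheses that must survive a roundtrip.
class Group
  def initialize(open, expression, close)
    @open = open
    @expression = expression
    @close = close
  end

  def to_source
    "#{@open}#{@expression.to_source}#{@close}"
  end
end

x = Ident.new('x')
y = Ident.new('y')

Binary.new('frac', x, y).to_source                  # => "frac x y"
Infix.new(x, '/', y).to_source                      # => "x / y"
Group.new('(', Infix.new(x, '/', y), ')').to_source # => "(x / y)"
```

Both `frac x y` and `x / y` describe the same fraction, so a renderer can treat them alike while the distinct node types still record which form the author typed.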

GarkGarcia commented 4 years ago

> @GarkGarcia I've introduced the class-based AST and updated the HTML and MathML backends already. That code is easier to read now. I still need to update AST.adoc, but I think it's sufficiently self-explanatory if you look at the code. I added new Group and Infix classes. Those were necessary to be able to roundtrip grouping parentheses and frac x y vs x / y correctly.

Great! Were the changes merged to master? I pulled asciidoctor/asciimath into my local repository, but there weren't any changes...

GarkGarcia commented 4 years ago

Ohh, I see. I'll pull obj_ast.

pepijnve commented 4 years ago

Sorry, should have mentioned that. I prefer to try to keep master in a working, releasable state. Anything that's work-in-progress and breaks existing code is on feature branches until it's sufficiently stable to merge.

GarkGarcia commented 4 years ago

I've integrated the new, class-based AST into the LaTeX renderer. As soon as the obj_ast branch is merged to master I'll close this issue.