brucemiller / LaTeXML

LaTeXML: a TeX and LaTeX to XML/HTML/ePub/MathML translator.
http://dlmf.nist.gov/LaTeXML/

Handling Integrals in content markup #322

Open kohlhase opened 12 years ago

kohlhase commented 12 years ago

[Originally Ticket 1646]

I did not get an answer to my e-mail, so here is the ticket.

I experimented with finding the Schauder Theorems in MathWebSearch and ran the following through the daemon:

$\int_D \left|f(x) g(x)\right| dx \leq R$ 

and got

<m:apply>
  <m:leq/>
  <m:apply>
    <m:apply>
      <m:csymbol cd="ambiguous">subscript</m:csymbol>
      <m:int/>
      <ci>D</ci>
    </m:apply>
    <m:apply>
      <m:apply>
        <m:abs/>
        <m:apply>
          <m:times/>
          <ci>f</ci>
          <ci>x</ci>
          <ci>g</ci>
          <ci>x</ci>
        </m:apply>
      </m:apply>
      <m:ci>d</m:ci>
      <ci>x</ci>
    </m:apply>
  </m:apply>
  <ci>R</ci>
</m:apply> 

(the presentation MathML looked fine) instead of

<m:apply>
  <m:leq/>
  <m:bind>
    <m:apply>
      <m:int/>
      <ci>D</ci>
    </m:apply>
    <m:bvar><ci>x</ci></m:bvar>
    <m:apply>
      <m:abs/>
      <m:apply>
        <m:times/>
        <m:apply>
          <ci>f</ci>
          <ci>x</ci>
        </m:apply>
        <m:apply>
          <ci>g</ci>
          <ci>x</ci>
        </m:apply>
      </m:apply>
    </m:apply>
  </m:bind>
  <ci>R</ci>
</m:apply> 

I can understand that the subscript giving the domain D is not handled correctly, but that the int does not result in a binding operator is disturbing, and similarly, that the bound variable has not been detected. Equally disturbing (but to be expected) is that f(x)g(x) is treated as a four-argument multiplication.
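For reference, the expected structure above can be assembled programmatically. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the helper names `m`, `apply` and `bind` are hypothetical and are not part of LaTeXML, and every element is put in the `m:` namespace for uniformity.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
ET.register_namespace("m", MATHML_NS)


def m(tag, text=None, children=()):
    """Create a MathML element in the m: namespace (hypothetical helper)."""
    elem = ET.Element(f"{{{MATHML_NS}}}{tag}")
    elem.text = text
    elem.extend(children)
    return elem


def apply(*children):
    """Plain application: operator first, then its arguments."""
    return m("apply", children=children)


def bind(operator, bound_vars, body):
    """Binding: operator, one bvar per bound variable, then the body."""
    bvars = [m("bvar", children=[m("ci", v)]) for v in bound_vars]
    return m("bind", children=[operator, *bvars, body])


# f(x) and g(x) as two applications, not a four-argument product.
f_of_x = apply(m("ci", "f"), m("ci", "x"))
g_of_x = apply(m("ci", "g"), m("ci", "x"))
integrand = apply(m("abs"), apply(m("times"), f_of_x, g_of_x))

# The integral over D binds x; the whole thing is compared with R.
integral = bind(apply(m("int"), m("ci", "D")), ["x"], integrand)
expression = apply(m("leq"), integral, m("ci", "R"))

print(ET.tostring(expression, encoding="unicode"))
```

The point of the sketch is only the shape of the tree: the integral is a binder with an explicit bound variable, and the product has two factors.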

I wonder what Deyan's grammar makes of this example.

brucemiller commented 12 years ago

Some of it is grammar; more of it is semantic analysis, I think. Firstly, the current grammar pretty much only recognizes a reasonable extent for the integral. That is, it has a good guess for where it stops. [Hint: It doesn't stop when it finds the differential! :> ]

In fact, it doesn't really recognize the differential(s) at all, and that is really the crux of it: it doesn't know what to bind.

This would be a valuable enhancement: to dig into the integrand and infer something about any differentials and thus the vars to be bound. Again, using a grammar may or may not be helpful.
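As a rough illustration of what "digging into the integrand" could mean, here is a minimal Python sketch that scans a flat token list for differentials. It is not LaTeXML code: the function name `find_differentials`, the token representation, and the rule "a `d` immediately followed by a single-letter-style identifier is a differential" are all assumptions, and the rule will misfire whenever `d` is an ordinary variable.

```python
def find_differentials(tokens, diff_symbol="d"):
    """Split a flat integrand into (bound_vars, remaining_tokens).

    Assumption: a `diff_symbol` token immediately followed by an
    alphabetic identifier is a differential. The differential may sit
    at the start, in the middle, or at the end of the integrand.
    """
    bound_vars = []
    remaining = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok == diff_symbol and nxt is not None and nxt.isalpha():
            bound_vars.append(nxt)  # the variable acted on by the differential
            i += 2                  # skip both "d" and the variable
        else:
            remaining.append(tok)
            i += 1
    return bound_vars, remaining


# \int_D |f(x) g(x)| dx  (differential at the end)
trailing = ["|", "f", "(", "x", ")", "g", "(", "x", ")", "|", "d", "x"]
print(find_differentials(trailing))

# \int dx dy f(x, y)  (differentials at the beginning)
leading = ["d", "x", "d", "y", "f", "(", "x", ",", "y", ")"]
print(find_differentials(leading))
```

A flat scan like this cannot see structure such as fractions, which is one reason a pass over the parsed tree may be the better place for the analysis.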

kohlhase commented 12 years ago

Deyan, what does your grammar do here?

dginev commented 12 years ago

I am on the same page as Bruce when it comes to BIGOPs, since we managed to get synced during my NIST visit.

Jim Pittman's examples of integrals where the "d" binder is at the beginning, or middle, or within some fraction, confirm Bruce's claims above that figuring out the binding interplay should be a semantic analysis step and not a grammatical one, if we are coming from an underspecified perspective.

Alternatively we could claim it is syntax, but then we need specific subgrammars for each and every math subfield.

I am not producing any binders at the moment and neither did my MSc thesis grammar.

brucemiller commented 12 years ago

Rethinking this, I think there are both grammatical and semantic-analysis angles. Firstly, I think the current parsing for recognizing the integrand is about as good as I can come up with. However, while parsing the integrand, some extra roles or rules should be enabled. In particular, "d" (at least) will need to be recognized as possibly (see below) a prefix operator. The expression with diff d will parse differently from a variable d.

Note that "d" might be used as both a variable and a differential; an author who did that might distinguish them with fonts (upright "d" is more likely a diff), but conversely, authors who don't have that ambiguity seldom take care with which font they use for diff!

After it's parsed, you can find all the things acted on by diffs, except derivatives (see below), and those are the variables to be bound over; sorta workable, maybe?

Of course diffs that are parts of derivatives don't go into the bound vars, and it may be non-trivial in code to sort them out. Moreover, everything I said about "d" can be said about poor-man's derivatives, but of course there's no integral to limit the scope. An exercise for another day? Or some sort of meta-rule; Hey there's a suspicious number of "d"s in here!
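The post-parse step described above could look roughly like the following Python sketch. It assumes the parser has already produced prefix-operator nodes for differentials; the tuple encoding (`("diff", "x")`, `("divide", num, den)`) and the heuristic "a quotient of two differentials is a derivative" are illustrative assumptions, not LaTeXML's actual tree format or rules.

```python
def is_diff(node):
    """True for a differential node such as ("diff", "x")."""
    return isinstance(node, tuple) and len(node) == 2 and node[0] == "diff"


def is_derivative(node):
    """Heuristic: a quotient of two differentials, like dy/dx."""
    return (
        isinstance(node, tuple)
        and len(node) == 3
        and node[0] == "divide"
        and is_diff(node[1])
        and is_diff(node[2])
    )


def bound_variables(node):
    """Collect variables acted on by differentials, skipping derivatives."""
    if is_derivative(node):
        return []  # diffs that are part of a derivative bind nothing
    if is_diff(node):
        return [node[1]] if isinstance(node[1], str) else []
    if isinstance(node, tuple):
        found = []
        for child in node[1:]:  # node[0] is the operator name
            for var in bound_variables(child):
                if var not in found:
                    found.append(var)
        return found
    return []  # a leaf: plain identifier or number


# \int \frac{dy}{dx} dx : only the trailing dx binds
with_derivative = ("times", ("divide", ("diff", "y"), ("diff", "x")), ("diff", "x"))
print(bound_variables(with_derivative))

# \int \frac{dx}{x} : a differential inside a fraction still binds
in_fraction = ("divide", ("diff", "x"), "x")
print(bound_variables(in_fraction))
```

Sorting out derivatives written in other ways (for example with the differential applied to a whole expression) would need more than this two-differential test.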

kohlhase commented 12 years ago

Replying to comment 3 @dginev:

> I am not producing any binders at the moment and neither did my MSc thesis grammar.

but your grammar should, in fact getting binders right is one of the big things a math grammar should do, since binders are extremely conspicuous features of mathematical formulae.

dginev commented 12 years ago

Replying to comment 5 @kohlhase:

> Replying to comment 3 @dginev:
>
> > I am not producing any binders at the moment and neither did my MSc thesis grammar.
>
> but your grammar should, in fact getting binders right is one of the big things a math grammar should do, since binders are extremely conspicuous features of mathematical formulae.

It should, I just noted that it doesn't yet and I have never reached that point. And I consider the "d" turning into a differential prefix binder in some contexts to be a domain-specific phenomenon. And so are the different notations for universals, or the different notations for lambda abstraction in lambda calculus, etc.

Since I never went into supporting specific domains, those things were never prioritized.

kohlhase commented 12 years ago

Replying to comment 6 @dginev:

> Replying to comment 5 @kohlhase:
>
> > Replying to comment 3 @dginev:
> >
> > > I am not producing any binders at the moment and neither did my MSc thesis grammar.
> >
> > but your grammar should, in fact getting binders right is one of the big things a math grammar should do, since binders are extremely conspicuous features of mathematical formulae.
>
> It should, I just noted that it doesn't yet and I have never reached that point. And I consider the "d" turning into a differential prefix binder in some contexts to be a domain-specific phenomenon. And so are the different notations for universals, or the different notations for lambda abstraction in lambda calculus, etc.
>
> Since I never went into supporting specific domains, those things were never prioritized.

Let me paraphrase what I was saying (and disagree with you): Integrals, Sums, Products, Limits, big unions/intersections, ... (really all operators supported in content MathML) are so ubiquitous in Math that you cannot call them domain-specific. **They need to be prioritized in the content grammar.**

dginev commented 12 years ago

There is no contradiction between ubiquitous and domain-specific. Vertical bar or fences are also ubiquitous, but flexibly get both new syntactic roles and denotations in different domains and in different communities.

What you are claiming is that the big operators are ubiquitous and unambiguous, as they only denote the operation in their original domain (calculus for integrals and limits, set theory for union and intersection, discrete math for products and sums and so on) and are never assigned a new syntactic or semantic interpretation.

That might just be true. And I suspect they are low-hanging fruit to be picked next, once we get into modeling particular domain operators.

dginev commented 11 years ago

This should probably wait until the 0.8 release is out and I am back in grammar land.

brucemiller commented 8 years ago

Ah, here it is; I wanted to tickle this issue as I was thinking about it in the context of #734.

Since we now go overboard with the handy XMDual, the first task would be to extend what the grammar does with SUMOP, BIGOP, INTOP: namely, produce an XMDual whose presentation side is the parsed material as it is currently produced. The new content side would use the scripts found on the operator to generate a notion of binding and domain.

Basically there are two cases. For sum/prod-like BIGOPs, you may expect to find a subscript like i=0 or i\in S etc.; in combination with a possible superscript this could yield a bound variable and domain. For integral-type operators, the scripts would indicate the domain, but you'd still have to hunt through the integrand for variables.
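The two cases can be sketched as a small Python function. This is only an illustration of the idea, not LaTeXML's grammar or API: the function name `analyze_scripts`, the `kind` labels, and the tuple encoding of scripts (`("eq", "i", "0")` for i=0, `("in", "i", "S")` for i\in S) are all assumptions, and `None` marks whatever is still unknown.

```python
def analyze_scripts(kind, subscript=None, superscript=None):
    """Guess a bound variable and domain from an operator's scripts.

    kind: "sum" for sum/prod-like operators, "int" for integral-like ones.
    Returns {"bvar": ..., "domain": ...}; None means "not determined here".
    """
    if kind == "sum":
        if isinstance(subscript, tuple) and subscript[0] == "eq":
            # i=0 with an optional upper limit: variable plus a range
            _, var, lower = subscript
            return {"bvar": var, "domain": ("interval", lower, superscript)}
        if isinstance(subscript, tuple) and subscript[0] == "in":
            # i \in S: variable plus the set it ranges over
            _, var, domain = subscript
            return {"bvar": var, "domain": domain}
        return {"bvar": None, "domain": subscript}
    if kind == "int":
        # Scripts give only the domain; the bound variable has to be
        # found by looking for differentials in the integrand.
        if superscript is not None:
            domain = ("interval", subscript, superscript)
        else:
            domain = subscript
        return {"bvar": None, "domain": domain}
    raise ValueError(f"unknown operator kind: {kind!r}")


print(analyze_scripts("sum", ("eq", "i", "0"), "n"))   # \sum_{i=0}^{n}
print(analyze_scripts("sum", ("in", "i", "S")))        # \sum_{i \in S}
print(analyze_scripts("int", "D"))                     # \int_D
print(analyze_scripts("int", "a", "b"))                # \int_a^b
```

For the integral cases the `bvar` stays `None` on purpose: the scripts alone do not say which variable is bound.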

That of course still requires some semantic analysis, but we can take the first step in the parser by constructing some sort of "underspecified"

   apply($operator, apply(underspecified:var_and_limits, $subscript, $superscript), $body);

kohlhase commented 8 years ago

I am thrilled to see that this issue is getting some more love and attention. I am not sure what the "first step" gives us and entails. Is there anything I can do to help? Collect examples maybe?

dginev commented 6 years ago

Very timely reply almost 2 years later - examples would definitely provide acceleration to coming up with a solution, especially from real world integrals (maybe some from DLMF itself?)

I expect some further discussion may be taking place soon... In fact I may be the one providing examples :+1:

kohlhase commented 6 years ago

What do you need in terms of examples? Only LaTeX?