asciidoctor / asciimath

Asciimath parser
MIT License

Create interfaces for extensions and customization #46

Closed GarkGarcia closed 4 years ago

GarkGarcia commented 4 years ago

The changes we've been working on recently should make it pretty easy to make the parser and the renderers extensible. The code is already there; we just need to create some facilities for users to use it, and document it.

My proposal is the following:

  1. Make SymbolsTable render-agnostic. The idea is that SymbolsTable would have a column for MathML, a column for LaTeX and a column for HTML, as well as a column for the AsciiMath expression and a column for the Ruby symbol that represents it:

    AsciiMath | Symbol | MathML | HTML | LaTeX
    aleph     | :aleph |        |      | \aleph
    ...       | ...    | ...    | ...  | ...

    If a cell is left empty, the renderers would use a (render-specific) default strategy to render the symbol. This would allow users to create extensions by using custom symbol tables.

    Want the parser and the renderers to handle a custom symbol of yours? Simply create a row for it in your symbols table.

  2. Create an optional parameter for AsciiMath.parse that represents the symbols table that should be used by the parser. Its default value should be the symbols table currently used by the parser.

  3. Create an optional parameter that represents the ColorTable that should be used by the parser. Its default value should be the HTML standard color names.

  4. Create an optional parameter in MarkupBuilder.initialize that represents which SymbolsTable should be used when rendering the markup. Its default value should be the default SymbolsTable used by each renderer.

  5. Create an optional parameter in MarkupBuilder.initialize that represents a map between RGB values and color names.
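
As a rough illustration of point 1 (not code from the gem), a render-agnostic row with empty cells and a per-renderer fallback could look like the sketch below. `Row`, `SYMBOLS`, and `render_cell` are hypothetical names, and falling back to the AsciiMath text is just one possible default strategy.

```ruby
# Hypothetical row type: one column per renderer; nil means "cell left empty".
Row = Struct.new(:asciimath, :symbol, :mathml, :html, :latex, keyword_init: true)

SYMBOLS = [
  Row.new(asciimath: 'aleph', symbol: :aleph, latex: '\aleph'),
  Row.new(asciimath: 'foo', symbol: :foo) # custom symbol with no renderer cells
].freeze

# Illustrative fallback only: emit the AsciiMath text when a cell is empty.
def render_cell(row, format)
  row[format] || row.asciimath
end

puts render_cell(SYMBOLS[0], :latex)  # cell is filled in
puts render_cell(SYMBOLS[0], :mathml) # cell is empty, so the fallback is used
```

A real renderer would presumably pick a smarter default per output format; the point here is only that an empty cell is a lookup miss that each renderer handles itself.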

@pepijnve What do you think?

davidfarmer commented 4 years ago

Consider \int x^2 dx .

That is not actually how most people write it, because there is not enough space between the x^2 and the dx. So it is common for the LaTeX source to be written \int x^2 \, dx .

But, some people think the "d" in "dx" should be upright, not italics. Some people make a "\d" macro for that.

Is it the decision of the author, or the viewer (or renderer) how to typeset the dx in an integral?

That is, if your table outputs it one way, but someone wants it the other way, are you planning to accommodate that?

pepijnve commented 4 years ago

I think we did most of the above already @GarkGarcia. MathML and HTML are effectively the same thing, so for all practical purposes we can treat them as such. The HTML backend is kind of an experiment someone contributed that isn't 100% complete, so I wouldn't emphasise that one too much just yet.

  1. The table in AST.adoc is actually a 'join' of the default parser symbol table, the default MathML symbol table and the LaTeX symbol table. I did it that way to make it as easy as possible to consult it as a reference.
  2. (and 3.) ::AsciiMath.parse already takes optional parser and color tables. See https://github.com/asciidoctor/asciimath/blob/master/lib/asciimath/parser.rb#L769
  4. I built this in already, but the parameter wiring is missing. See https://github.com/asciidoctor/asciimath/blob/master/lib/asciimath/mathml.rb#L8
  5. Not sure how useful that would be. The MathML and HTML backends just render hex RGB values, which is fine. Color names might be more elegant when you look at the source, but I doubt anyone really cares.

pepijnve commented 4 years ago

> That is, if your table outputs it one way, but someone wants it the other way, are you planning to accommodate that?

Both I think. The idea is to make the tables configurable. Tweaking the parser table is tricky since the semantics change, but tweaking the rendering table should be easy.

The parser symbol table determines how the parser interprets the asciimath input. If you put 'dx' => :dx in there, you'll get a :dx symbol in the AST. If you do not have that entry you'll get d and x identifiers instead. Just 'd' => :d will get you symbol :d and identifier x.

On the rendering side the symbol table determines what you actually output. As a silly example :dx => '\mymacro' would get you \mymacro in your output.
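
To make the three parser-table cases above concrete, here is a toy longest-match lookup. It is not the gem's tokeniser; `toy_tokens` and the token shapes are made up for illustration, and only the table entries come from the comment above.

```ruby
# Toy longest-match lookup, NOT the gem's tokeniser: at each position, try the
# longest table key first and fall back to a single-character identifier.
def toy_tokens(input, table)
  keys = table.keys.reject(&:empty?).sort_by { |k| -k.length }
  tokens = []
  pos = 0
  while pos < input.length
    key = keys.find { |k| input[pos, k.length] == k }
    if key
      tokens << [:symbol, table[key]]
      pos += key.length
    else
      tokens << [:identifier, input[pos]]
      pos += 1
    end
  end
  tokens
end

p toy_tokens('dx', { 'dx' => :dx }) # a single :dx symbol
p toy_tokens('dx', {})              # identifiers d and x
p toy_tokens('dx', { 'd' => :d })   # symbol :d, then identifier x

# Rendering side: the table maps the AST symbol to the text that is output.
render_table = { dx: '\mymacro' }
puts render_table.fetch(:dx)
```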

I've intentionally left the symbol table configuration bits out of this gem. Some people might want to load that from CSV, YAML, JSON, ...; others might want to just hardcode it in Ruby code. I don't think a little parser library like this one should impose one choice.

GarkGarcia commented 4 years ago

> I think we did most of the above already @GarkGarcia

Great! I guess we're only missing the parameter wiring then? Also, the LaTeX renderer does not use a SymbolsTable.

It wouldn't be that hard a change to implement, but I haven't figured out what the second argument of SymbolsTable.add is supposed to represent. It looks like it's something essential to the parser, but I don't understand why it is necessary for the renderers.

> 5. Not sure how useful that would be. The MathML and HTML backends just render hex RGB values, which is fine. Color names might be more elegant when you look at the source, but I doubt anyone really cares.

Fair enough. As you know, I've been working hard so that the LaTeX renderer produces the most idiomatic and readable code. I still believe this is a relevant issue, but we could fix it later.

GarkGarcia commented 4 years ago

> I've intentionally left the symbol table configuration bits out of this gem. Some people might want to load that from CSV, YAML, JSON, ...; others might want to just hardcode it in Ruby code. I don't think a little parser library like this one should impose one choice.

I agree, it's better to keep things as simple as possible in here. I see an opportunity to create a CLI utility to handle that kind of thing. It could be very useful for command-line scripting.

The idea is to create a richer client for the library. The library's command-line interface is very useful for debugging, but it's a bit limited overall.

pepijnve commented 4 years ago

> I haven't figured out what the second argument of SymbolsTable.add

SymbolTableBuilder is a little utility class that helps build a frozen Hash. The basic signature for add is add(*keys, value, type). For each key in the keys Array, a Hash entry will be created with the value {:value => value, :type => type}. The precise semantics of the keys, values and types are not specified; that depends on the concrete usage.

For the parser table you have entries like

b.add('ii', :italic, :unary)
b.add('->>', 'twoheadrightarrow', :twoheadrightarrow, :symbol)

which results in

{
  'ii' => {:value => :italic, :type => :unary},
  '->>' => {:value => :twoheadrightarrow, :type => :symbol},
  'twoheadrightarrow' => {:value => :twoheadrightarrow, :type => :symbol},
}

The hash keys are the strings the tokeniser recognises. The :type informs the parser about what type of node it should create. Is the thing I just parsed a symbol, a unary operator, ... The :value is the value that gets stored in the AST node.
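
For readers who want to see the add(*keys, value, type) shape in action, here is a minimal stand-in that reproduces the Hash shown above. `SketchTableBuilder` and its `build` method are assumptions made for this sketch; only the argument order and the resulting entry layout are taken from the description, and the gem's own class may differ.

```ruby
# Minimal stand-in for the builder described above. The class name and the
# build method are assumptions; only add(*keys, value, type) is from the thread.
class SketchTableBuilder
  def initialize
    @table = {}
  end

  # The last two arguments are the value and the type; the rest are keys.
  def add(*args)
    *keys, value, type = args
    raise ArgumentError, 'need at least one key, a value and a type' if keys.empty?

    entry = { value: value, type: type }.freeze
    keys.each { |key| @table[key] = entry }
    self
  end

  # Hand out a frozen copy so callers cannot mutate the finished table.
  def build
    @table.dup.freeze
  end
end

b = SketchTableBuilder.new
b.add('ii', :italic, :unary)
b.add('->>', 'twoheadrightarrow', :twoheadrightarrow, :symbol)
parser_table = b.build

p parser_table['->>']
p parser_table.frozen?
```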

The MathML renderer creates its table like this

b.add(:dx, 'dx', :identifier)
b.add(:and, 'and', :text)
b.add(:minus, "\u2212", :operator)

resulting in

{
  :dx => {:value => 'dx', :type => :identifier},
  :and => {:value => 'and', :type => :text},
  :minus => {:value => "\u2212", :type => :operator},
}

Here the hash keys are the Symbol values corresponding to the :values from the parser table. The :value is the text that's going to be written in the output. The :type determines the MathML tag (or HTML CSS class) that gets used.

In the end you don't have to use this SymbolTableBuilder; the parser and MathML backend just expect something Hash-like where the values are Hashes with a certain set of keys. SymbolTableBuilder just makes it easier to create that Hash.
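
Since any Hash-like object with :value and :type entries will do, a rendering table can also be assembled outside the gem, for example from YAML. The sketch below uses only Ruby's standard library; the YAML layout and the variable names are hypothetical, and how the table is then handed to a renderer is left out because it depends on the parameter wiring discussed in this issue.

```ruby
require 'yaml'

# Hypothetical YAML layout: symbol name => value/type pair.
yaml_source = <<~'YAML'
  dx:
    value: '\mymacro'
    type: identifier
  minus:
    value: "\u2212"
    type: operator
YAML

# Convert the string-keyed YAML data into the {:value => ..., :type => ...}
# entries described above, keyed by Ruby symbols.
render_overrides = {}
YAML.safe_load(yaml_source).each do |name, entry|
  render_overrides[name.to_sym] = {
    value: entry.fetch('value'),
    type: entry.fetch('type').to_sym
  }.freeze
end
render_overrides.freeze

p render_overrides[:dx]
p render_overrides[:minus][:type]
```

The same conversion would work for JSON or CSV input; only the loading line changes.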

GarkGarcia commented 4 years ago

> In the end you don't have to use this SymbolTableBuilder; the parser and MathML backend just expect something Hash-like where the values are Hashes with a certain set of keys. SymbolTableBuilder just makes it easier to create that Hash.

Ohh, I see. Makes sense.

I made LatexBuilder::SYMBOL_TABLE public and created an additional (optional) parameter in LatexBuilder::initialize so that users can pass custom symbol tables to the renderer. @pepijnve Could you take a look at #47?

GarkGarcia commented 4 years ago

I also renamed LatexBuilder::SYMBOLS_TABLE and MarkupBuilder::DEFAULT_DISPLAY_SYMBOL_TABLE to Whatever::SYMBOL_TABLE for consistency.

GarkGarcia commented 4 years ago

Fixed in https://github.com/asciidoctor/asciimath/commit/341543593c43096300ab72e30f8dfac33593bb4c.