charlesroddie / MathAtom

Structural representations of visual mathematics expressions for use in .Net rendering libraries
MIT License
7 stars 0 forks source link

Rewrote the LaTeX parser to be more flexible #10

Open Happypig375 opened 5 years ago

Happypig375 commented 5 years ago

If no more review comments, then I'm merging.

charlesroddie commented 5 years ago

Only had time for a very brief review. Can look again this weekend.

charlesroddie commented 5 years ago

This comment here as it may get some discussion: toAtom gives a Result<MathAtom, string> Should it give this or a MathAtom option?

The former will have more performant processing of error cases, since they are not exceptional. But it will need more work to maintain, as results are harder to compose. It will also be slower in the happy case, especially if computation expressions are needed to deal with the complexity of Results.

The alternative is an option, where failure to find things gives a None, and where invalid syntax gives an exception.

I think invalid syntax can be treated as exceptional, so we can make our lives easier by avoiding Result. Any other opinions?

charlesroddie commented 5 years ago

In general it's good that you have got something working but a lot of simplification and explanation will be needed. I wasn't able to discover the strategy used for parsing LaTeX because it was hidden behind functions with highly obscure type signatures.

Names should be informative and their should be xml docs but it's only possible to name and describe objects once they have been simplified enough to be comprehensible.

I will try to do some constructive work in this direction this weekend.

charlesroddie commented 5 years ago

Possibly useful? https://github.com/charlesroddie/StyleGuide

Happypig375 commented 5 years ago

But it will need more work to maintain, as results are harder to compose.

Result.map and Result.bind can be used.

It will also be slower in the happy case, especially if computation expressions are needed to deal with the complexity of Results.

I read about the slowness of computation expressions. I wouldn't use them for Results.

In general it's good that you have got something working but a lot of simplification and explanation will be needed. I wasn't able to discover the strategy used for parsing LaTeX because it was hidden behind functions with highly obscure type signatures.

Will simplify the code a bit.

charlesroddie commented 5 years ago

I have thought more about Results and also considered a simple issue in another repo. The alternatives are:

I believe exceptions is better, because the inner functions are simplified to have types which return 'a instead of types which return Option<'a> or Result<'a,'e>. Of course in .Net exceptions should only be used for exceptional data.

A parsing function routine which for example looks for a certain pattern should return None if it fails to find the pattern; these are not exceptional.

A parsing function which is used when a certain pattern is known to be implied should throw an exception on failure. For example after seeing \color the algorithm is allowed to find { and } and fail if it doesn't find them.

The global algorithm can catch failures and either return Error("invalid LaTeX") or fail with an InvalidLatexException.

charlesroddie commented 5 years ago

I've resolved the AliasDictionary issues (simplification and documentation). Can move to LaTeX.fs now. Would appreciate documentation or simplification explanation of the methods so I can get started with this @Happypig375 . Thanks.

Happypig375 commented 5 years ago

Now AliasDictionary, as a collection type, does not implement normal collection interfaces. Like MathCollection, it will surprise the user when trying to use AliasDictionary in functions taking e.g. IEnumerable<T>.

charlesroddie commented 5 years ago

Why does a user need to interact with an alias dictionary? If there is a need for extension in future it can be done but it is comprehensible now, as simple as possible for what it is doing, documented, and definitely provides the functionality we are using at the moment (or else the project wouldn't compile).

Happypig375 commented 5 years ago

For example, LaTeXDefaultMaps.Commands.Where(command => command.Key.StartsWith("a")) does not compile because AliasDictionary does not implement IEnumerable<T>.

charlesroddie commented 5 years ago

I removed as much as possible while keeping the main project buildable. Ok do do one or two extra things like expose the keys or remove private on the dictionaries as needed in future. But we shouldn't add them until then. It's important to keep good discipline. I think the file went from 100 lines to 50. Maybe in future it goes back up to 55 if we need something extra.

Happypig375 commented 5 years ago

While I agree on YANGI in general, I think all collection types should at least act like collections, have collection methods, and implement collection interfaces.

Also, why make Delimiter a DU which limits its possible values? It also kind of repeats information in LaTeXDefaultMaps.Delimiters.

charlesroddie commented 5 years ago

I think all collection types should at least act like collections, have collection methods, and implement collection interfaces

Not all interfaces; otherwise it gets too large to maintain (as the previous code was). We should consider this an internal type, and add these things as needed. The correct API if we do add them is not obvious because there are two dictionaries. For example member __.Aliases = d.Keys is probably OK (I have added this back as it should be useful for parsing) but should we have member __.Values = d.Values or member __.Values = e.Keys; these actually have different, and rather obscure, types. The original code always chose the dictionary from aliases to values when doing the various additional properties but that is not a clear decision. Maybe casting both dictionaries to IReadOnlyDictionary might give us everything we could possibly need in future. But no need to spend time on that (writing this conversion, code review, documentation) at the moment. I have limited time for code review and documentation and would rather spend this on important stuff like the parser than on methods that don't end up getting used.

Also, why make Delimiter a DU which limits its possible values

There are a limited number of delimiters so representing these as a limited number of values is better. Easier to understand code with a finite number of values than infinite number of possible values of which most are invalid. There is the option of adding an Delimiter.UserDefined of string case if absolutely necessary (much) later.

charlesroddie commented 5 years ago

You had enumerated all of the LaTeX delimiters: image

charlesroddie commented 5 years ago

This idea of yours is interesting:

let Commands =
    [   ["frac"], Fraction (Argument 1, Argument 2, Center, Center, ValueNone)
        ["sqrt"], Radical (Argument_Optional (1, Row []), Argument 1)
    |> AliasDictionary

While we may not be able to do exactly this (since it requires usage of an undisplayable internal Latex-specific Argument atom), we could adjust it like this:

[<Struct>][<RequireQualifiedAccess>]
type Command =
    | Frac
    /// A square root or nth root
    | Sqrt
    | Color

let Commands =
    [   ["frac"], Command.Frac
        ["sqrt"], Command.Sqrt
        ["color"], Command.Color]
    |> AliasDictionary

let parseColor:string -> System.Drawing.Color = failwith "not implemented"

let rec parseCommand(command:Command,
                     optionals:string[],
                     args:string[],
                     parseAtom:string->MathAtom) =
    let require (n:int) =
        if args.Length <> n
        then failwith <| (string n) + " inputs required by " + (Commands.GetPrimaryAlias command)
    let maxOptional (n:int) =
        if optionals.Length > n
        then failwith <| "At most " + (string n) + " optional inputs allowed in " + (Commands.GetPrimaryAlias command)
    let getOptional n =
        if n < optionals.Length then Some(optionals.[n]) else None
    match command with
    | Command.Frac ->
        require 2
        Fraction(parseAtom args.[0], parseAtom args.[1], ValueNone, ValueNone, ValueNone)
    | Command.Sqrt ->
        require 1
        maxOptional 1
        Radical(getOptional 0 |> Option.map parseAtom, parseAtom args.[0])
    | Command.Color ->
        require 2
        Colored(parseAtom args.[1], parseColor args.[0])

This also works with things to be parsed that are not Atoms (e.g. Color as shown).

Happypig375 commented 5 years ago

It gets complicated when we are going to implement "\binom" or "\TeX" which give atoms with more complicated structures. These two commands are both present in CSharpMath.

["binom"], Delimited (Delimiter.LeftBracket, Frac (Argument 1, Argument 2, Center, Center, ValueNone), Delimiter.RightBracket)
["TeX"], Styled (Row [Ordinary "T", Space -.1667, Offsetted (Ordinary "E", 0, -4.5), Space -.125, Ordinary "X"], FontStyles.Roman)
Happypig375 commented 5 years ago

Regarding Result vs Exception: https://github.com/charlesroddie/MathAtom/blob/a0eac56fcefa9594ab2aa57d73b3f50075ff3c03/MathDisplay.Tests.Benchmarks/Result_ExceptionBenchmark.fs#L43-L48 While Result is 2.7 times slower than Exception when the result is Ok, Exception is 1055.5 times slower than Result when the result is Error.

Happypig375 commented 5 years ago

This will probably be very noticeable when using the Example app which displays user-inputted valid/invalid LaTeX real-time.

charlesroddie commented 5 years ago

It gets complicated when we are going to implement "\binom" or "\TeX" which give atoms with more complicated structures. These two commands are both present in CSharpMath.

Those examples easy with the draft scheme I gave.

    | Command.Binom ->
        require 2
        Delimited (Delimiter.LeftBracket, Frac (parseAtom args.[0], parseAtom args.[1], Center, Center, ValueNone), Delimiter.RightBracket)
    | Command.TeX ->
        require 0
        Styled (Row [Ordinary "T", Space -.1667, Offsetted (Ordinary "E", 0, -4.5), Space -.125, Ordinary "X"], FontStyles.Roman)

require may not be the best way to do it. The best structure may depend on the answer to an important question:

Is it possible to tokenize a LaTeX string (split it into a list of commands, delimiters, ordinaries...) before implementing any logic?

Happypig375 commented 5 years ago

Now we need to define new Commands if we are going to add new ones. That goes against the point of a Commands dictionary, which is to reduce duplicate information.

Is it possible to tokenize a LaTeX string (split it into a list of commands, delimiters, ordinaries...) before implementing any logic?

That is the purpose of LaTeX.ToAtom instead of drawing them directly.

charlesroddie commented 5 years ago

That goes against the point of a Commands dictionary A commands dictionary containing code is no longer a two-way dictionary. (The current code in the PR at this point has a two way dictionary but the second direction won't be usable or useful.)

The purpose of AliasDictionary is to convert multiple alias strings into the things they represent and to give the primary alias of a represented object.

A one-way dictionary is OK from commands to behavior, which is defined on all commands, is better known/represented as a function.

That is the purpose of LaTeX.ToAtom

This does more than lexing.

Happypig375 commented 5 years ago

A one-way dictionary is OK from commands to behavior, which is defined on all commands, is better known/represented as a function.

What I'm trying to recreate with Commands is two-way pattern matching.

This does more than lexing.

It's just some sort of pattern matching that converts between LaTeX strings and MathAtoms.

charlesroddie commented 5 years ago

Some miscommunication. I am asking whether lexing of LaTeX is possible.

Happypig375 commented 5 years ago

Lexing itself is possible, just that I don't see a reason for the extra step.

charlesroddie commented 5 years ago

Two way pattern matching: I'm skeptical that will work. I think we need to get this repo into a clean state with comprehensible code and any attempts like that can branch off as experiments. Don't want to tackle extremely difficult stuff before we have good basics in place.

charlesroddie commented 5 years ago

I think I will need to implement a simple parser.

charlesroddie commented 5 years ago

Serialization of some simple mathatoms into latex will have some utility. And some sharing of data, especially names and aliases, could be shared between parser and serializer. But it's important not to couple parsing to a complex system to create a difficult coupling between parser and serializer and mathatom itself. That will prevent getting master into any sort of reasonable shape in the near future.

charlesroddie commented 5 years ago

I would like this pr to go into master which means keeping it clean. I will write a parser but am travelling for next 2 weeks (including HK, could meet there). Anything to do with Serialization (beyond aliases and a list of commands) shouldn't go in this pr. Once we have a good pr to merge then master will be in a good state to branch off from with feature branches.

Happypig375 commented 5 years ago

When will you be in HK?