Open Happypig375 opened 5 years ago
Only had time for a very brief review. Can look again this weekend.
This comment here as it may get some discussion:
toAtom
gives a Result<MathAtom, string>
Should it give this or a MathAtom option
?
The former will have more performant processing of error cases, since they are not exceptional. But it will need more work to maintain, as results are harder to compose. It will also be slower in the happy case, especially if computation expressions are needed to deal with the complexity of Results.
The alternative is an option, where failure to find things gives a None, and where invalid syntax gives an exception.
I think invalid syntax can be treated as exceptional, so we can make our lives easier by avoiding Result. Any other opinions?
In general it's good that you have got something working but a lot of simplification and explanation will be needed. I wasn't able to discover the strategy used for parsing LaTeX because it was hidden behind functions with highly obscure type signatures.
Names should be informative and their should be xml docs but it's only possible to name and describe objects once they have been simplified enough to be comprehensible.
I will try to do some constructive work in this direction this weekend.
Possibly useful? https://github.com/charlesroddie/StyleGuide
But it will need more work to maintain, as results are harder to compose.
Result.map
and Result.bind
can be used.
It will also be slower in the happy case, especially if computation expressions are needed to deal with the complexity of Results.
I read about the slowness of computation expressions. I wouldn't use them for Results.
In general it's good that you have got something working but a lot of simplification and explanation will be needed. I wasn't able to discover the strategy used for parsing LaTeX because it was hidden behind functions with highly obscure type signatures.
Will simplify the code a bit.
I have thought more about Results and also considered a simple issue in another repo. The alternatives are:
I believe exceptions is better, because the inner functions are simplified to have types which return 'a
instead of types which return Option<'a>
or Result<'a,'e>
. Of course in .Net exceptions should only be used for exceptional data.
A parsing function routine which for example looks for a certain pattern should return None if it fails to find the pattern; these are not exceptional.
A parsing function which is used when a certain pattern is known to be implied should throw an exception on failure. For example after seeing \color
the algorithm is allowed to find {
and }
and fail if it doesn't find them.
The global algorithm can catch failures and either return Error("invalid LaTeX") or fail with an InvalidLatexException.
I've resolved the AliasDictionary issues (simplification and documentation). Can move to LaTeX.fs now. Would appreciate documentation or simplification explanation of the methods so I can get started with this @Happypig375 . Thanks.
Now AliasDictionary
, as a collection type, does not implement normal collection interfaces. Like MathCollection
, it will surprise the user when trying to use AliasDictionary
in functions taking e.g. IEnumerable<T>
.
Why does a user need to interact with an alias dictionary? If there is a need for extension in future it can be done but it is comprehensible now, as simple as possible for what it is doing, documented, and definitely provides the functionality we are using at the moment (or else the project wouldn't compile).
For example, LaTeXDefaultMaps.Commands.Where(command => command.Key.StartsWith("a"))
does not compile because AliasDictionary
does not implement IEnumerable<T>
.
I removed as much as possible while keeping the main project buildable. Ok do do one or two extra things like expose the keys or remove private on the dictionaries as needed in future. But we shouldn't add them until then. It's important to keep good discipline. I think the file went from 100 lines to 50. Maybe in future it goes back up to 55 if we need something extra.
While I agree on YANGI in general, I think all collection types should at least act like collections, have collection methods, and implement collection interfaces.
Also, why make Delimiter a DU which limits its possible values? It also kind of repeats information in LaTeXDefaultMaps.Delimiters
.
I think all collection types should at least act like collections, have collection methods, and implement collection interfaces
Not all interfaces; otherwise it gets too large to maintain (as the previous code was). We should consider this an internal type, and add these things as needed. The correct API if we do add them is not obvious because there are two dictionaries. For example member __.Aliases = d.Keys
is probably OK (I have added this back as it should be useful for parsing) but should we have member __.Values = d.Values
or member __.Values = e.Keys
; these actually have different, and rather obscure, types. The original code always chose the dictionary from aliases to values when doing the various additional properties but that is not a clear decision. Maybe casting both dictionaries to IReadOnlyDictionary might give us everything we could possibly need in future. But no need to spend time on that (writing this conversion, code review, documentation) at the moment. I have limited time for code review and documentation and would rather spend this on important stuff like the parser than on methods that don't end up getting used.
Also, why make Delimiter a DU which limits its possible values
There are a limited number of delimiters so representing these as a limited number of values is better. Easier to understand code with a finite number of values than infinite number of possible values of which most are invalid. There is the option of adding an Delimiter.UserDefined of string
case if absolutely necessary (much) later.
You had enumerated all of the LaTeX delimiters:
This idea of yours is interesting:
let Commands =
[ ["frac"], Fraction (Argument 1, Argument 2, Center, Center, ValueNone)
["sqrt"], Radical (Argument_Optional (1, Row []), Argument 1)
|> AliasDictionary
While we may not be able to do exactly this (since it requires usage of an undisplayable internal Latex-specific Argument atom), we could adjust it like this:
[<Struct>][<RequireQualifiedAccess>]
type Command =
| Frac
/// A square root or nth root
| Sqrt
| Color
let Commands =
[ ["frac"], Command.Frac
["sqrt"], Command.Sqrt
["color"], Command.Color]
|> AliasDictionary
let parseColor:string -> System.Drawing.Color = failwith "not implemented"
let rec parseCommand(command:Command,
optionals:string[],
args:string[],
parseAtom:string->MathAtom) =
let require (n:int) =
if args.Length <> n
then failwith <| (string n) + " inputs required by " + (Commands.GetPrimaryAlias command)
let maxOptional (n:int) =
if optionals.Length > n
then failwith <| "At most " + (string n) + " optional inputs allowed in " + (Commands.GetPrimaryAlias command)
let getOptional n =
if n < optionals.Length then Some(optionals.[n]) else None
match command with
| Command.Frac ->
require 2
Fraction(parseAtom args.[0], parseAtom args.[1], ValueNone, ValueNone, ValueNone)
| Command.Sqrt ->
require 1
maxOptional 1
Radical(getOptional 0 |> Option.map parseAtom, parseAtom args.[0])
| Command.Color ->
require 2
Colored(parseAtom args.[1], parseColor args.[0])
This also works with things to be parsed that are not Atoms (e.g. Color as shown).
It gets complicated when we are going to implement "\binom" or "\TeX" which give atoms with more complicated structures. These two commands are both present in CSharpMath.
["binom"], Delimited (Delimiter.LeftBracket, Frac (Argument 1, Argument 2, Center, Center, ValueNone), Delimiter.RightBracket)
["TeX"], Styled (Row [Ordinary "T", Space -.1667, Offsetted (Ordinary "E", 0, -4.5), Space -.125, Ordinary "X"], FontStyles.Roman)
Regarding Result vs Exception: https://github.com/charlesroddie/MathAtom/blob/a0eac56fcefa9594ab2aa57d73b3f50075ff3c03/MathDisplay.Tests.Benchmarks/Result_ExceptionBenchmark.fs#L43-L48 While Result is 2.7 times slower than Exception when the result is Ok, Exception is 1055.5 times slower than Result when the result is Error.
This will probably be very noticeable when using the Example app which displays user-inputted valid/invalid LaTeX real-time.
It gets complicated when we are going to implement "\binom" or "\TeX" which give atoms with more complicated structures. These two commands are both present in CSharpMath.
Those examples easy with the draft scheme I gave.
| Command.Binom ->
require 2
Delimited (Delimiter.LeftBracket, Frac (parseAtom args.[0], parseAtom args.[1], Center, Center, ValueNone), Delimiter.RightBracket)
| Command.TeX ->
require 0
Styled (Row [Ordinary "T", Space -.1667, Offsetted (Ordinary "E", 0, -4.5), Space -.125, Ordinary "X"], FontStyles.Roman)
require
may not be the best way to do it. The best structure may depend on the answer to an important question:
Is it possible to tokenize a LaTeX string (split it into a list of commands, delimiters, ordinaries...) before implementing any logic?
Now we need to define new Command
s if we are going to add new ones. That goes against the point of a Commands
dictionary, which is to reduce duplicate information.
Is it possible to tokenize a LaTeX string (split it into a list of commands, delimiters, ordinaries...) before implementing any logic?
That is the purpose of LaTeX.ToAtom
instead of drawing them directly.
That goes against the point of a Commands dictionary A commands dictionary containing code is no longer a two-way dictionary. (The current code in the PR at this point has a two way dictionary but the second direction won't be usable or useful.)
The purpose of AliasDictionary is to convert multiple alias strings into the things they represent and to give the primary alias of a represented object.
A one-way dictionary is OK from commands to behavior, which is defined on all commands, is better known/represented as a function.
That is the purpose of LaTeX.ToAtom
This does more than lexing.
A one-way dictionary is OK from commands to behavior, which is defined on all commands, is better known/represented as a function.
What I'm trying to recreate with Commands
is two-way pattern matching.
This does more than lexing.
It's just some sort of pattern matching that converts between LaTeX strings and MathAtoms.
Some miscommunication. I am asking whether lexing of LaTeX is possible.
Lexing itself is possible, just that I don't see a reason for the extra step.
Two way pattern matching: I'm skeptical that will work. I think we need to get this repo into a clean state with comprehensible code and any attempts like that can branch off as experiments. Don't want to tackle extremely difficult stuff before we have good basics in place.
I think I will need to implement a simple parser.
Serialization of some simple mathatoms into latex will have some utility. And some sharing of data, especially names and aliases, could be shared between parser and serializer. But it's important not to couple parsing to a complex system to create a difficult coupling between parser and serializer and mathatom itself. That will prevent getting master into any sort of reasonable shape in the near future.
I would like this pr to go into master which means keeping it clean. I will write a parser but am travelling for next 2 weeks (including HK, could meet there). Anything to do with Serialization (beyond aliases and a list of commands) shouldn't go in this pr. Once we have a good pr to merge then master will be in a good state to branch off from with feature branches.
When will you be in HK?
If no more review comments, then I'm merging.