fictiveworks / CalyxSharp

Generative text processing for C# and Unity applications
Other
0 stars 0 forks source link

Unfriendly error message when an expansion is not found i.e. due to typo #21

Closed bentorkington closed 2 years ago

bentorkington commented 2 years ago

If an expansion fails to resolve, the resulting exception message is (everybody's favourite) "Object reference not set to an instance of an object". We should throw a more instructive exception here:

Registry registry = new Registry();
registry.DefineRule("start", new[] { "{greekLetters}" }); // plural
registry.DefineRule("greekLetter", new[] { "alpha" }); // the mistake: not plural

Expansion exp = registry.Evaluate("start"); // throws NullReferenceException
maetl commented 2 years ago

The expected behaviour is to throw a Calyx::Errors::UndefinedRule in strict mode. In non strict mode, the expansion is filled in with an empty string atom (it would probably be nice to add a marker into the result tree that denotes the terminal was synthesised to fill in the gap).

Example with stack trace awareness in Ruby:

require "calyx"

registry = Calyx::Registry.new
registry.rule("start", "{greekletters}")
registry.rule("greekletter", "alpha")

registry.evaluate

Despite the stack trace accordioning out from line 7 (registry.evaluate) the exception prints that the mistake was made on line 4 (registry.rule("start", "{greekletters}")).

/Users/maetl/.gem/ruby/3.1.2/gems/calyx-0.22.0/lib/calyx/registry.rb:102:in `expand': undefined rule :greekletters in err.rb:4:`registry.rule("start", "{greekletters}")` (Calyx::Errors::UndefinedRule)
    from /Users/maetl/.gem/ruby/3.1.2/gems/calyx-0.22.0/lib/calyx/syntax/non_terminal.rb:21:in `evaluate'
    from /Users/maetl/.gem/ruby/3.1.2/gems/calyx-0.22.0/lib/calyx/syntax/concat.rb:52:in `block in evaluate'
    from /Users/maetl/.gem/ruby/3.1.2/gems/calyx-0.22.0/lib/calyx/syntax/concat.rb:51:in `each'
    from /Users/maetl/.gem/ruby/3.1.2/gems/calyx-0.22.0/lib/calyx/syntax/concat.rb:51:in `reduce'
    from /Users/maetl/.gem/ruby/3.1.2/gems/calyx-0.22.0/lib/calyx/syntax/concat.rb:51:in `evaluate'
    from /Users/maetl/.gem/ruby/3.1.2/gems/calyx-0.22.0/lib/calyx/syntax/choices.rb:49:in `evaluate'
    from /Users/maetl/.gem/ruby/3.1.2/gems/calyx-0.22.0/lib/calyx/rule.rb:39:in `evaluate'
    from /Users/maetl/.gem/ruby/3.1.2/gems/calyx-0.22.0/lib/calyx/registry.rb:204:in `evaluate'
    from err.rb:7:in `<main>'

This is possible because Registry#define_rule in the Ruby version has dynamic awareness of the calling context, so it can get the line number where the missing rule was declared at initialisation time.

maetl commented 2 years ago

The JavaScript version does not have awareness of the declaration. For the above test case, it throws a generic error with the following message string:

Error: UndefinedRule: greekletters
maetl commented 2 years ago

So the Ruby implementation is high quality and uses a lot of runtime trace information. The JavaScript implementation has the logic correctly coded in but does not go to any effort to be trace aware or provide a custom exception type. The C# implementation just explodes with the infamous null pointer exception.

maetl commented 2 years ago

Aside from having a useful error message that helps direct people to fix the problem, throwing Calyx.Errors.UndefinedRule is important because in strict mode it gives authors the opportunity to trap the specific exception in a catch block.

bentorkington commented 2 years ago

The expected behaviour is to throw a Calyx::Errors::UndefinedRule in strict mode. In non strict mode, the expansion is filled in with an empty string atom (it would probably be nice to add a marker into the result tree that denotes the terminal was synthesised to fill in the gap).

Okay, in strict mode the UndefinedRule exception is thrown at Evaluation time as expected. In non-strict mode the NullRef happens inside CollectAtoms() only once we call Flatten()

An exception of type 'System.NullReferenceException' occurred in CalyxSharp.dll but was not handled in user code: 'Object reference not set to an instance of an object.'
   at Calyx.Expansion.CollectAtoms(StringBuilder concat) in /Users/ben/repos/clients/maetl/CalyxSharp/CalyxSharp/Expansion.cs:line 56
   at Calyx.Expansion.CollectAtoms(StringBuilder concat) in /Users/ben/repos/clients/maetl/CalyxSharp/CalyxSharp/Expansion.cs:line 57
   at Calyx.Expansion.CollectAtoms(StringBuilder concat) in /Users/ben/repos/clients/maetl/CalyxSharp/CalyxSharp/Expansion.cs:line 57
   at Calyx.Expansion.CollectAtoms(StringBuilder concat) in /Users/ben/repos/clients/maetl/CalyxSharp/CalyxSharp/Expansion.cs:line 57
   at Calyx.Expansion.Flatten() in /Users/ben/repos/clients/maetl/CalyxSharp/CalyxSharp/Expansion.cs:line 47
   at Calyx.Test.RegistryTest.UndefinedRuleThrows() in /Users/ben/repos/clients/maetl/CalyxSharp/CalyxSharpTests/RegistryTest.cs:line 94
        foreach (Expansion exps in Tail) {
       // Tail is null
Screenshot 2022-10-05 at 18 03 54
maetl commented 2 years ago

I will fix this, thanks for figuring it out.

bentorkington commented 2 years ago

While chasing it down I ended up taking a look at the Exp enum, which has almost 100 references throughout the code but it turns out that apart from CollectAtoms() (and even then only to distinguish between Atom/non-Atom Expansions, which looks like a candidate for refactoring with the Strategy Pattern), it's only used by tests.

Is Expansion.Symbol intended to be a public API? Would subclasses of an abstract Expansion be nicer?

maetl commented 2 years ago

Yeah this is 100% a consequence of moving from the Lisp-style s-expression pattern to a standard interpreter tree structure.

In Lisp, where there is no distinction between code and data, you often see trees represented as lists with the head being a symbol, then the tail being nested lists that form the descendant branches. Lisp functions and even whole programs are also represented in this format.

In most OO languages with subtype and parametric polymorphism, a more normal thing to do would be to have the types of each branch and leaf node represented as discrete classes that inherit from a base or implement a common interface, then have left/right refs or a children[] array pointing to the descendant nodes.

Mea culpa.

maetl commented 2 years ago

eg: this is what the non-strict mode result tree looks like from the example above as a Ruby array literal:

[:start, [:choice, [:concat, [[:atom, ""], [:greekletters, [:atom, ""]]]]]]
maetl commented 2 years ago

Obvs, I couldn’t figure out how to do this in C# without using a tuple structure which isn’t supported in Unity, but the current setup is just me coding quickly, wanting to get the library working without doing any in-depth research into C#’s type system.

Is it idiomatic in C# to extend ArrayList and implement a custom interface for the expr node? As that might be another way of doing this that isn’t so weird.

public class Expansion: ArrayList, IExpansion
{
    public Expansion(Exp head, IEnumerable<IExpansion> tail)
    {
        Symbol = head;

        foreach (var exp in tail) {
            Add(exp);
        }
    }
}
bentorkington commented 2 years ago

Is it idiomatic in C# to extend ArrayList and implement a custom interface for the expr node

Not that I've seen. Discussions on whether it's sensible tend to rage on, but in practice I can't think of any significant codebases of mine or anyone else's that subclass the Collection types.

I don't have an issue with Lisp-iness / recursion of the data structure, that all seems natural. My only concern was the existence of that enum playing some part in the polymorphism. Fortunately it only turned to do so in a single if() statement in CollectAtoms().

maetl commented 2 years ago

A really well written Stack Overflow question—if a little bit handwringy—and I don’t want to pick on the author, but I found this absolutely hilarious.

Surely Microsoft wouldn't tell me to do extra work for no reason

Surely.

I’m pretty convinced that using lists directly isn’t the best way to do things now anyway, I think the best pattern for this is either having pure tuples or objects representing each list as a head/tail structure, so fairly close to what we currently have at the low level.

More value for the project will come from defining the structure of that tree and how different grammar constructs are represented as a result that is useful to users of the library than the implementation details of the expansion results. Thanks for the advice on this.