Closed bentorkington closed 2 years ago
The expected behaviour is to throw a Calyx::Errors::UndefinedRule
in strict mode. In non strict mode, the expansion is filled in with an empty string atom (it would probably be nice to add a marker into the result tree that denotes the terminal was synthesised to fill in the gap).
Example with stack trace awareness in Ruby:
require "calyx"
registry = Calyx::Registry.new
registry.rule("start", "{greekletters}")
registry.rule("greekletter", "alpha")
registry.evaluate
Despite the stack trace accordioning out from line 7 (registry.evaluate
) the exception prints that the mistake was made on line 4 (registry.rule("start", "{greekletters}")
).
/Users/maetl/.gem/ruby/3.1.2/gems/calyx-0.22.0/lib/calyx/registry.rb:102:in `expand': undefined rule :greekletters in err.rb:4:`registry.rule("start", "{greekletters}")` (Calyx::Errors::UndefinedRule)
from /Users/maetl/.gem/ruby/3.1.2/gems/calyx-0.22.0/lib/calyx/syntax/non_terminal.rb:21:in `evaluate'
from /Users/maetl/.gem/ruby/3.1.2/gems/calyx-0.22.0/lib/calyx/syntax/concat.rb:52:in `block in evaluate'
from /Users/maetl/.gem/ruby/3.1.2/gems/calyx-0.22.0/lib/calyx/syntax/concat.rb:51:in `each'
from /Users/maetl/.gem/ruby/3.1.2/gems/calyx-0.22.0/lib/calyx/syntax/concat.rb:51:in `reduce'
from /Users/maetl/.gem/ruby/3.1.2/gems/calyx-0.22.0/lib/calyx/syntax/concat.rb:51:in `evaluate'
from /Users/maetl/.gem/ruby/3.1.2/gems/calyx-0.22.0/lib/calyx/syntax/choices.rb:49:in `evaluate'
from /Users/maetl/.gem/ruby/3.1.2/gems/calyx-0.22.0/lib/calyx/rule.rb:39:in `evaluate'
from /Users/maetl/.gem/ruby/3.1.2/gems/calyx-0.22.0/lib/calyx/registry.rb:204:in `evaluate'
from err.rb:7:in `<main>'
This is possible because Registry#define_rule
in the Ruby version has dynamic awareness of the calling context, so it can get the line number where the missing rule was declared at initialisation time.
The JavaScript version does not have awareness of the declaration. For the above test case, it throws a generic error with the following message string:
Error: UndefinedRule: greekletters
So the Ruby implementation is high quality and uses a lot of runtime trace information. The JavaScript implementation has the logic correctly coded in but does not go to any effort to be trace aware or provide a custom exception type. The C# implementation just explodes with the infamous null pointer exception.
Aside from having a useful error message that helps direct people to fix the problem, throwing Calyx.Errors.UndefinedRule
is important because in strict mode it gives authors the opportunity to trap the specific exception in a catch block.
The expected behaviour is to throw a
Calyx::Errors::UndefinedRule
in strict mode. In non strict mode, the expansion is filled in with an empty string atom (it would probably be nice to add a marker into the result tree that denotes the terminal was synthesised to fill in the gap).
Okay, in strict mode the UndefinedRule exception is thrown at Evaluation time as expected. In non-strict mode the NullRef happens inside CollectAtoms() only once we call Flatten()
An exception of type 'System.NullReferenceException' occurred in CalyxSharp.dll but was not handled in user code: 'Object reference not set to an instance of an object.'
at Calyx.Expansion.CollectAtoms(StringBuilder concat) in /Users/ben/repos/clients/maetl/CalyxSharp/CalyxSharp/Expansion.cs:line 56
at Calyx.Expansion.CollectAtoms(StringBuilder concat) in /Users/ben/repos/clients/maetl/CalyxSharp/CalyxSharp/Expansion.cs:line 57
at Calyx.Expansion.CollectAtoms(StringBuilder concat) in /Users/ben/repos/clients/maetl/CalyxSharp/CalyxSharp/Expansion.cs:line 57
at Calyx.Expansion.CollectAtoms(StringBuilder concat) in /Users/ben/repos/clients/maetl/CalyxSharp/CalyxSharp/Expansion.cs:line 57
at Calyx.Expansion.Flatten() in /Users/ben/repos/clients/maetl/CalyxSharp/CalyxSharp/Expansion.cs:line 47
at Calyx.Test.RegistryTest.UndefinedRuleThrows() in /Users/ben/repos/clients/maetl/CalyxSharp/CalyxSharpTests/RegistryTest.cs:line 94
foreach (Expansion exps in Tail) {
// Tail is null
I will fix this, thanks for figuring it out.
While chasing it down I ended up taking a look at the Exp enum, which has almost 100 references throughout the code but it turns out that apart from CollectAtoms()
(and even then only to distinguish between Atom/non-Atom Expansions, which looks like a candidate for refactoring with the Strategy Pattern), it's only used by tests.
Is Expansion.Symbol
intended to be a public API? Would subclasses of an abstract Expansion
be nicer?
Yeah this is 100% a consequence of moving from the Lisp-style s-expression pattern to a standard interpreter tree structure.
In Lisp, where there is no distinction between code and data, you often see trees represented as lists with the head being a symbol, then the tail being nested lists that form the descendant branches. Lisp functions and even whole programs are also represented in this format.
In most OO languages with subtype and parametric polymorphism, a more normal thing to do would be to have the types of each branch and leaf node represented as discrete classes that inherit from a base or implement a common interface, then have left/right refs or a children[] array pointing to the descendant nodes.
Mea culpa.
eg: this is what the non-strict mode result tree looks like from the example above as a Ruby array literal:
[:start, [:choice, [:concat, [[:atom, ""], [:greekletters, [:atom, ""]]]]]]
Obvs, I couldn’t figure out how to do this in C# without using a tuple structure which isn’t supported in Unity, but the current setup is just me coding quickly, wanting to get the library working without doing any in-depth research into C#’s type system.
Is it idiomatic in C# to extend ArrayList and implement a custom interface for the expr node? As that might be another way of doing this that isn’t so weird.
public class Expansion: ArrayList, IExpansion
{
public Expansion(Exp head, IEnumerable<IExpansion> tail)
{
Symbol = head;
foreach (var exp in tail) {
Add(exp);
}
}
}
Is it idiomatic in C# to extend ArrayList and implement a custom interface for the expr node
Not that I've seen. Discussions on whether it's sensible tend to rage on, but in practice I can't think of any significant codebases of mine or anyone else's that subclass the Collection types.
I don't have an issue with Lisp-iness / recursion of the data structure, that all seems natural. My only concern was the existence of that enum playing some part in the polymorphism. Fortunately it only turned to do so in a single if() statement in CollectAtoms()
.
A really well written Stack Overflow question—if a little bit handwringy—and I don’t want to pick on the author, but I found this absolutely hilarious.
Surely Microsoft wouldn't tell me to do extra work for no reason
Surely.
I’m pretty convinced that using lists directly isn’t the best way to do things now anyway, I think the best pattern for this is either having pure tuples or objects representing each list as a head
/tail
structure, so fairly close to what we currently have at the low level.
More value for the project will come from defining the structure of that tree and how different grammar constructs are represented as a result that is useful to users of the library than the implementation details of the expansion results. Thanks for the advice on this.
If an expansion fails to resolve, the resulting exception message is (everybody's favourite) "Object reference not set to an instance of an object". We should throw a more instructive exception here: