weird parsing bug - NUMBER(NUMBER) failing at runtime instead of compile time.

Dunbaratu commented 5 years ago

This came up watching @lucaelin's stream just now. Luca left off the asterisk for a multiplication, it was supposed to be like this:

print 2*(2). // actual syntax was way more complex, but this is enough to cause the problem.

But she had at typo to just use bare parens for multiplication (like you can do in math notation but not most programming languages), like so:

print 2(2). // no asterisk

Now, this is an error, and kOS did give an error. But the problem is WHEN it gave the error was "too late". It should give the error at COMPILE time as this shouldn't have been valid syntax in the first place. But it waited until RUN time to give an error, because the parser thought there was a 'valid' meaning for that, which there isn't supposed to be. This meant the script was well into the mission before the error was thrown, when this kind of error should have been caught when it first was run.

The error it gives is this:

kOS: At interpreter, line 15
print 2  ( 2 ).
           ^

(Filename: C:/buildslave/unity/build/artifacts/generated/common/runtime/DebugBindings.gen.cpp Line: 51)

kOS.Safe.Exceptions.KOSUndefinedIdentifierException: Undefined Variable Name ''.

The reason it has the runtime error instead of a syntax error is that it believed a "valid" interpretation of this syntax ends up trying to call a "function" called "" (null string) with arglist = (2). As if it was doing:

PRINT nullstring_identifier_name (2).

The relevant opcodes it generates are these:

interpreter            15:1   0116 @0096   push _KOSArgMarker_ 
interpreter            15:10  0117 @0097   push _KOSArgMarker_ 
interpreter            15:12  0118 @0098   push 2 
interpreter            15:12  0119 @0099   call  <<--INSTRUCTION POINTER--
interpreter            15:12  0120 @0100   call print() 
interpreter            15:12  0121 @0101   pop 
interpreter            15:0   0122 @0102   popscope 1

That line where the "<<--INSTRUCTION POINTER--`" is, that's a "call" to a function named "" (nullstring).

evandisoft commented 5 years ago

I think I see, in the ".tpg" file, what it is doing. It is resolving that expression to a "suffixterm".

Then there's the rule suffixterm->atom suffixterm_trailer* The first '2' in the expression is the "atom". Then for suffixterm_trailer->(function_trailer | array_trailer) it picks function_trailer because that starts with a bracketopen. Then in Compiler.cs, it's expecting an identifier token instead of a number token, and so GetIdentifierText returns the Empty string

string firstIdentifier = "";
bool isUserFunc = false;
if (nodeIndex == 0)
{
     firstIdentifier = GetIdentifierText(suffixTerm);

private string GetIdentifierText(ParseNode node)
        {
            //Prevent recursing through parenthesized sub-expressions:
            if (node.Token.Type == TokenType.expr)
                return string.Empty;

            if (node.Token.Type == TokenType.IDENTIFIER || node.Token.Type == TokenType.FILEIDENT)
            {
                return node.Token.Text;
            }
            foreach (ParseNode child in node.Nodes)
            {
                string identifier = GetIdentifierText(child);
                if (identifier != string.Empty)
                    return identifier;
            }

            return string.Empty;
        }

Throwing an error in GetIdentifierText might work, or it might effect other things negatively. The identifier that is returned from it is later passed to the "VisitActualFunction" function in Compiler.cs as the "functionIdentifier". In "VisitActualFunction" it becomes "directName".

Maybe you could throw an error in "VisitActualFunction" if the function "isDirect" and has empty string as the "directName". I believe that direct requires it to have a non-empty directname and I believe that this is being considered a "direct" call.

I didn't check any of this by putting in any logging statements, but rather, just looking at the code so I haven't technically verified any of this. Just looks to me like that is the bug.

evandisoft commented 5 years ago

EDIT: Ok, I've been told how to get TinyPG. I will not work on this particular bug right now but I may eventually do so. It would be best to fix the grammar for this rather than to add checks in the compiler.

evandisoft commented 5 years ago

I'm testing these with

wait 2.
print(expr).

This is a more general issue. for expr Try 2:asdf waits 2 seconds and then causes GET suffix 'ASDF' not found on object 2 2[3] waits 2 seconds and then causes can't iterate over object of type ScalarIntValue true[3] waits 2 seconds and then causes error. "blah"() waits 2 seconds and then causes error. undefined variable name '' But asdf.ghjk() waits 2 seconds and then cause an error undefined variable name asdf.ghjk

So having sci_number, TRUEFALSE, FILEIDENT and STRING as possible matches for atom, given where atom is in suffixterm, doesn't seem to work very well.

If this is changed at the grammar level, I imagine it may require significant changes at the compiler level.

evandisoft commented 5 years ago

This might work:

Or if you want numbers/truefalse to be able to have suffixes: suffix -> (atom | suffixterm) (suffix_trailer)*; atom -> sci_number | TRUEFALSE | FILEIDENT | STRING;

Also, given how FILEIDENT is desired to be handled as a string, it is a bit ambiguous to the user. lock a to asdf.asdf[0](). lock a to asdf. asdf[0](). do two different things.

A possible fix might be to allow users to optionally terminate with ';' instead of '.'. Then we could mark '.' (if used as a EOI) as deprecated.

I haven't set up tinypg because I think that any changes to this will lead to lots of changes in the compiler. It may be the case that any such change should be part of a larger refactoring of the compiler.

I also think that perhaps some of the names of these terms should be changed. It's confusing that a sci_number can be a suffix.

CodeLeopard commented 3 years ago

I think I found a case of this happening but where it does not crash:

Local myLex is Lexicon().
Local myList is List("Some String, to be used as Key", "A second String, to be used as Key").

//Set myLex:myList[0] to "Error". // Case 1
Set myLex:(myList[0]) to "Uses Empty string as key instead". // Case 2
Set myLex[myList[1]] to "What it should have been.".

Print("myLexicon " + myLex).

This is the result:

RunPath("0:/WIP/BR/BugReport10.ks").
myLexicon LEXICON of 2 items:
"": "Uses Empty string as key instead"
"A second String, to be used as Key": "What it should have been."
Program ended.

KSP-KOS / KOS

weird parsing bug - NUMBER(NUMBER) failing at runtime instead of compile time. #2360