List with a single comma string value sometimes turns uppercase strings to lower

Dunbaratu commented 9 years ago

This came originally from a more complex example encountered by @tdw89. I've managed to trim it down to this as a minimal example that causes it:

The following program operates as you'd expect:

local L1 is list("a","b","c","d", "e", "f").
local L2 is list("A","B","C","D", "E", "F").
print L1[1].
print L2[1].

It prints:

b
B

Like it should.

But now, edit the first line so that the 'f' is a comma instead, like so:

local L1 is list("a","b","c","d", "e", ",").
local L2 is list("A","B","C","D", "E", "F").
print L1[1].
print L2[1].

Now it prints this instead:

b
b

Behind the scenes, it really is genuinely doing a push b instead of a push B when building the args to the LIST() constructor. Something about that single comma confuses everything so it starts treating the uppercase strings as lowercase after that.

But weirdly, it is very touchy and specific. If you alter that example too much, removing terms from it, or making it not be a list anymore, the problem does not surface.

Dunbaratu commented 9 years ago

Okay wow. It's NOT a bug anywhere inside the C# code at ALL.

Here's my proof. I added this line to the mod's code

        private void VisitString(ParseNode node)
        {
            NodeStartHousekeeping(node);

            Console.WriteLine("eraseme: Just parsed a string literal that was: " + node.Token.Text ); // <--- new line I inserted

            AddOpcode(new OpcodePush(node.Token.Text.Trim('"')));
        }

And it's showing that the conversion to lowercase is in fact happening INSIDE the TinyPG parser for some weird reason. By the time the compiler sees it, the "A" and "B" and "C" and so on have already become "a", "b" and "c" - long before our own code even sees them.

Dunbaratu commented 9 years ago

The TinyPG compiler was told to convert keywords to all lowercase. I suspect it's incorrectly triggering this behavior on literal strings occasionally.

Dunbaratu commented 9 years ago

Very weird workaround.

Swap the order of the two lines, so the uppercase version is declared before the lowercase version and whatever stupid thing the parser generator is doing goes away:

local L2 is list("A","B","C","D", "E", "F").
local L1 is list("a","b","c","d", "e", ",").
print L1[1].
print L2[1].

The above works correctly. ?!?

@tdw89 that's a workaround you can try until we fix this - swap the order of your two lists. Put the uppercase version first and call it the shift=0 version and the lowercase one the shift=1 version.

erendrake commented 9 years ago

The important bit of this problem is the spacing.

local L1 is list("a","b","c","d", "e", ",").
local L2 is list("A","B","C","D", "E", "F").
print L1[1]. //b
print L2[1]. //b

if you change the spacing

local L1 is list("a", "b", "c", "d", "e", ",").
local L2 is list("A", "B", "C", "D", "E", "F").
print L1[1]. //b
print L2[1]. //B

and the reason that the order mattered is that kOS.Safe.Compilation.Script.ExtractStrings() finds "," before "b" and when we mangle the line local L2 is list("A","B","C","D", "E", "F"). it replaces all of the "," with the token name and then doesnt match B. leaving us with the output

local L1 is list([s1],[s2],[s3],[s4], [s5], [s6]).
local L2 is list("A[s6]B[s6]C[s6]D", [s11], [s12]).
print L1[1].
print L2[1].

everything is then lowered

local l1 is list([s1],[s2],[s3],[s4], [s5], [s6]).
local l2 is list("a[s6]b[s6]c[s6]d", [s11], [s12]).
print l1[1].
print l2[1].

and the string tokens restored

local l1 is list("a","b","c","d", "e", ",").
local l2 is list("a","b","c","d", "E", "F").
print l1[1].
print l2[1].

going to sleep now :)

erendrake commented 9 years ago

@Dunbaratu i very much wanted to just type

print l1[4]. //e
print l2[4]. //E

and imagine your brain melting from the

tim-and-eric-mind-blown

Dunbaratu commented 9 years ago

Okay I just had a look at this ugly ugly monstrosity. I wanted to sleep but this is bugging me.

So, essentially, instead of letting the compiler handle case insensitivity of keywords itself, instead it just goes through everything and lowercases it all BEFORE the compiler sees it, and attempts to protect the string literals from that mangling, and it's in that code that it failed. It doesn't protect the string literals very well because its using a much more crude regex than the actual compiler is to find them.

I say the fix is to nuke Script.MakeLowerCase entirely. Kill that monstrously bad design. instead replace it with regexes in kRISC.tpg that explicitly add the ignore-case directive (?i) to the regex strings where desired: As in doing

PRINT        -> @"(?i)\bprint\b";
AT           -> @"(?i)\bat\b";
ON           -> @"(?i)\bon\b";

and so on all through the case insensitive tokens, and the identifiers.

The point being, this case insensitivity is part of the definition of kerboscript syntax. Not something to hardcode into the script compiler for all possible language bindings like it is now.

KSP-KOS / KOS

List with a single comma string value sometimes turns uppercase strings to lower #925

Very weird workaround.