Closed Dunbaratu closed 9 years ago
Okay wow. It's NOT a bug anywhere inside the C# code at ALL.
Here's my proof. I added this line to the mod's code:
private void VisitString(ParseNode node)
{
    NodeStartHousekeeping(node);
    Console.WriteLine("eraseme: Just parsed a string literal that was: " + node.Token.Text ); // <--- new line I inserted
    AddOpcode(new OpcodePush(node.Token.Text.Trim('"')));
}
And it's showing that the conversion to lowercase is in fact happening INSIDE the TinyPG parser for some weird reason. By the time the compiler sees it, the "A" and "B" and "C" and so on have already become "a", "b" and "c" - long before our own code even sees them.
The TinyPG compiler was told to convert keywords to all lowercase. I suspect it's incorrectly triggering this behavior on literal strings occasionally.
Swap the order of the two lines, so the uppercase version is declared before the lowercase version and whatever stupid thing the parser generator is doing goes away:
local L2 is list("A","B","C","D", "E", "F").
local L1 is list("a","b","c","d", "e", ",").
print L1[1].
print L2[1].
The above works correctly. ?!?
@tdw89 that's a workaround you can try until we fix this - swap the order of your two lists. Put the uppercase version first and call it the shift=0 version and the lowercase one the shift=1 version.
The important bit of this problem is the spacing.
local L1 is list("a","b","c","d", "e", ",").
local L2 is list("A","B","C","D", "E", "F").
print L1[1]. //b
print L2[1]. //b
if you change the spacing
local L1 is list("a", "b", "c", "d", "e", ",").
local L2 is list("A", "B", "C", "D", "E", "F").
print L1[1]. //b
print L2[1]. //B
and the reason that the order mattered is that kOS.Safe.Compilation.Script.ExtractStrings() finds "," before "b", and when we mangle the line local L2 is list("A","B","C","D", "E", "F"). it replaces all of the "," with the token name and then doesn't match "B", leaving us with the output
local L1 is list([s1],[s2],[s3],[s4], [s5], [s6]).
local L2 is list("A[s6]B[s6]C[s6]D", [s11], [s12]).
print L1[1].
print L2[1].
everything is then lowered
local l1 is list([s1],[s2],[s3],[s4], [s5], [s6]).
local l2 is list("a[s6]b[s6]c[s6]d", [s11], [s12]).
print l1[1].
print l2[1].
and the string tokens restored
local l1 is list("a","b","c","d", "e", ",").
local l2 is list("a","b","c","d", "E", "F").
print l1[1].
print l2[1].
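The three steps above (tokenize the strings, lowercase everything, restore the strings) can be reproduced with a short Python sketch. This is not the real kOS code: the regex, the function names, and the replace-every-occurrence strategy are assumptions chosen to match the intermediate output shown above.

```python
import re

def extract_strings(text):
    # Hypothetical stand-in for ExtractStrings(): a crude regex that
    # grabs everything between one double quote and the next.
    return re.findall(r'"[^"]*"', text)

def naive_make_lowercase(text):
    # Hypothetical stand-in for MakeLowerCase(), assumed for illustration.
    literals = extract_strings(text)
    # Step 1: hide each literal behind a placeholder. Replacing EVERY
    # occurrence of the literal's text is what lets the literal ","
    # eat the quote-comma-quote runs between "A","B","C","D".
    for i, lit in enumerate(literals, start=1):
        text = text.replace(lit, f"[s{i}]")
    # Step 2: lowercase whatever is still visible.
    text = text.lower()
    # Step 3: put the original literals back.
    for i, lit in enumerate(literals, start=1):
        text = text.replace(f"[s{i}]", lit)
    return text

TIGHT = (
    'local L1 is list("a","b","c","d", "e", ",").\n'
    'local L2 is list("A","B","C","D", "E", "F").'
)
SPACED = (
    'local L1 is list("a", "b", "c", "d", "e", ",").\n'
    'local L2 is list("A", "B", "C", "D", "E", "F").'
)

print(naive_make_lowercase(TIGHT))   # L2's first four strings come back lowercased
print(naive_make_lowercase(SPACED))  # L2's strings survive intact
```

With the tight spacing, the placeholder for "," is substituted into the middle of the second line before "A" through "D" get their turn, so those four are never protected. With a space after each comma, the text "," never appears between two other literals and nothing is clobbered.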
going to sleep now :)
@Dunbaratu i very much wanted to just type
print l1[4]. //e
print l2[4]. //E
and imagine your brain melting from the
Okay I just had a look at this ugly ugly monstrosity. I wanted to sleep but this is bugging me.
So, essentially, instead of letting the compiler handle case insensitivity of keywords itself, it just goes through everything and lowercases it all BEFORE the compiler sees it, and attempts to protect the string literals from that mangling, and it's in that code that it failed. It doesn't protect the string literals very well because it's using a much cruder regex than the actual compiler uses to find them.
I say the fix is to nuke Script.MakeLowerCase entirely. Kill that monstrously bad design. Instead, replace it with regexes in kRISC.tpg that explicitly add the ignore-case directive (?i) to the regex strings where desired, as in doing:
PRINT -> @"(?i)\bprint\b";
AT -> @"(?i)\bat\b";
ON -> @"(?i)\bon\b";
and so on all through the case-insensitive tokens, and the identifiers.
The point being, this case insensitivity is part of the definition of kerboscript syntax. Not something to hardcode into the script compiler for all possible language bindings like it is now.
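To illustrate the idea of pushing case insensitivity into the token patterns themselves, here is a minimal Python sketch. It is not TinyPG and not the kRISC.tpg grammar: the token list, the tokenize() helper, and the NUMBER, WHITESPACE and SYMBOL rules are assumptions made up for the example. Only the (?i)\bprint\b style of pattern is taken from the proposal above.

```python
import re

# Each keyword pattern carries its own (?i) flag, so the source text
# never has to be lowercased before it is scanned.
TOKEN_PATTERNS = [
    ("STRING", r'"[^"]*"'),            # matched as-is, case preserved
    ("PRINT", r"(?i)\bprint\b"),
    ("AT", r"(?i)\bat\b"),
    ("ON", r"(?i)\bon\b"),
    ("IDENTIFIER", r"(?i)\b[a-z_][a-z0-9_]*\b"),
    ("NUMBER", r"\d+"),
    ("WHITESPACE", r"\s+"),
    ("SYMBOL", r'[^\sA-Za-z0-9_"]'),   # any other single character
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in TOKEN_PATTERNS]

def tokenize(text):
    """Return (token_name, matched_text) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        for name, pattern in COMPILED:
            match = pattern.match(text, pos)
            if match:
                if name != "WHITESPACE":
                    tokens.append((name, match.group()))
                pos = match.end()
                break
        else:
            raise ValueError(f"unexpected character at position {pos}")
    return tokens

print(tokenize('PRINT "Hello".'))
```

Because the keyword rules match PRINT, Print and print alike, the string literal is never touched and keeps its original case. Identifiers would still need to be compared case-insensitively later on, for example by normalizing the matched text when it is looked up.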
This came originally from a more complex example encountered by @tdw89. I've managed to trim it down to this as a minimal example that causes it:
The following program operates as you'd expect:
It prints:
Like it should.
But now, edit the first line so that the 'f' is a comma instead, like so:
Now it prints this instead:
Behind the scenes, it really is genuinely doing a push b instead of a push B when building the args to the LIST() constructor. Something about that single comma confuses everything so it starts treating the uppercase strings as lowercase after that. But weirdly, it is very touchy and specific. If you alter that example too much, removing terms from it, or making it not be a list anymore, the problem does not surface.