Closed ivan-mogilko closed 3 months ago
I think the place to put the check that would give the error message would be here:
Not sure how it would break from here. Does any character puts it in infinite loop or is there one specific that you verified?
Writing a new test for this would be a good way to debug this
Then it's a matter of adjusting the preprocessor to produce the desired behavior in the test.
Does any character puts it in infinite loop or is there one specific that you verified?
Multiple characters were reported by different users, and i tried with a random one, so I assume that any character which unicode index exceeds either 127, or 255 (to make it safe).
OK, I understand why the infinite loop, it happens because of GetNextWord
, instead of it going until the next break (edit: the hard part is a lot of things can be a break though, not just spaces apparently) or dot, to retrieve the word, it uses the valid ascii characters as check and breaks on everything else.
I think it's best that GetNextWord just use breaks and then we can have an additional function that does the required validation. Edit: but it's apparently super hard to do this in the code, or at least it breaks a ton of things when I try.
Edit: made a way to workaround with the following function, that essentially try to get what is the next non-scriptWordChar and then if it's not something that would be valid there it throws an error. There is a bit of dancing around to set the text
that is passed as ref because otherwise we get in inifinite loop.
private string GetNextWord(ref string text, bool trimText, bool includeDots)
{
int i = 0;
while ((i < text.Length) &&
(text[i].IsScriptWordChar() ||
(includeDots && (text[i] == '.')))
)
{
i++;
}
// added this
if(i < text.Length && !(char.IsWhiteSpace(text[i]) || text[i] == ';' || text[i] == ',' || text[i] == '(' || text[i] == ')' || (includeDots && (text[i] == '.'))))
{
RecordError(ErrorCode.InvalidCharacter, "Invalid character detected at position " + i.ToString() + " in '" + text + "'");
string res = text;
text = string.Empty; // need to end line
return res;
}
// end of what was added
string word = text.Substring(0, i);
text = text.Substring(i);
if (trimText)
{
text = text.Trim();
}
return word;
}
Btw, I am not sure if char.IsWhiteSpace
is a correct function because it says it assumer UTF-16 and I can't tell if the string is utf-8 or utf-16...
Problem If the game project is configured into Unicode mode, and there's a non-latin character met in plain script (except comments or character string literals in single or double quotes respectively), then the script preprocessor would stuck in an infinite loop while trying to process the scripts.
Steps to reproduce
Expected behavior I suppose it's expected that preprocessor should produce a warning or error about invalid character found in script.
Additional That would be a good idea to also test compilers with the same case, as both old and new one were written back when scripts were strictly ASCII.
On a side note, we might plan adding a "Abort/Cancel" button to a BusyDialog that runs a working process. It's weird how this was not featured from the start.