adventuregamestudio / ags

AGS editor and engine source code
Other
707 stars 159 forks source link

Compiler: port preprocessor updates from C# to Cpp #2511

Closed ericoporto closed 3 months ago

ericoporto commented 3 months ago

This is essentially https://github.com/adventuregamestudio/ags/commit/cc465235b5a20b2a5a377aa4140ee3cbbd6d66d8 (https://github.com/adventuregamestudio/ags/pull/2482) and https://github.com/adventuregamestudio/ags/commit/69704a20c7bb70e34609d6ec78d3549b2a2e8a9f (small part of https://github.com/adventuregamestudio/ags/pull/2180)

NOTE: I am modifying the IsScriptWordChar function which is used by the script compiler. It had an undefined behavior - the comment text is from cppreference website. It looks like it's still working but previous isalnum was returning true for non-latin characters, I don't think this breaks anything because in the script compiler this is used only once and we pass a char type variable instead of an int. I actually think this function should take a char as parameter since both existing uses are only passing a char.

ivan-mogilko commented 3 months ago

About IsScriptWordChar, somehow I did not realize this earlier, but this function is not correct for couple of reasons. First, as you say, it should accept char parameter, and second, using std::isalnum seems incorrect, because it may return positive for a number of extended characters, and also depends on current locale. But these are not suitable for a "script word".

The C# version of preprocessor looks more correct, where it does not use "is alphanumeric", but explicitly compares to the only valid range of chars that may be used in script code:

        public static bool IsScriptWordChar(this Char theChar)
        {
            return ((theChar >= 'A') && (theChar <= 'Z')) ||
                ((theChar >= 'a') && (theChar <= 'z')) ||
                ((theChar >= '0') && (theChar <= '9')) ||
                (theChar == '_');
        }

I added the use of std::isalnum in 30829df017404a95bd2cdb26986ab1225ea1bee7 , because the previous in-compiler function was called is_alphanum. Somehow it did not click in my head that it does not really mean to represent "alphanumeric" in a wide sense, but only in script code sense. So the previous implementation was a correct one, it was just named wrong.

ericoporto commented 3 months ago

I adjusted, but actually the previous implementation also supported two additional characters

...
if (chrac == '\"') return 1;
if (chrac == '\'') return 1;

I think those are incorrect, so I redid it exactly like the C# one.