Open ivan-mogilko opened 1 month ago
> there may be a "compressed" name generated as min number of characters enough to distinguish the type, maybe starting with 3 letters (unless the type is shorter). And then this type name "shortcut" is also saved somewhere, like in RTTI table, as a way to reference a type, in case we may need to quickly find that type's entry.
My first idea was using symbol table entry numbers, but just using symbol table entry numbers to identify the types won't be enough. The issue is, a `struct foo` could be symbol table entry 271 when a certain program is compiled that exports some function, and the same `struct foo` might be a completely different symbol table entry, e.g., 721, when a different program is compiled that imports that function.
Instead, we'd probably need to stringify the types in a unique way and use some sort of concatenation of this for the mangled function name. One could imagine mangled function names such as
fname^param_count^stringified_type_of_first_param^stringified_type_of_second_param^…
That might make for very long mangled function names, but as far as I know, other ecosystems such as GNU's `gcc` also have very long mangled function names.
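The concatenation scheme above can be sketched as follows. This is only an illustration in C++: `MangleFunctionName` is a hypothetical helper, and it assumes the parameter types have already been stringified by some other step.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of AGS): builds "fname^N^type1^type2..."
// from a function name and the already stringified parameter types.
std::string MangleFunctionName(const std::string &fname,
                               const std::vector<std::string> &param_types)
{
    assert(!fname.empty());
    std::string mangled = fname + "^" + std::to_string(param_types.size());
    for (const std::string &type : param_types)
    {
        mangled += "^";
        mangled += type;
    }
    return mangled;
}
```

With this sketch, `MangleFunctionName("Max", {"i", "i"})` and `MangleFunctionName("Max", {"f", "f"})` give two different names for the same `fname`, which is all that overload resolution needs at link time.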
To stringify a type in a unique way, as a first approach we might linearize its composition of data components (and ignore the attributes and function components):
- `i` for integer.
- A pointer to `t` could be stringified as `*T`, where `T` is the stringification of `t`.
- An array of `t` could be stringified as `T[]`.
- A struct with components `t`, `u`, `v` could be stringified as `(TUV)`.
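A minimal C++ sketch of these stringification rules, under stated assumptions: the `Type` tree below is a made-up stand-in for whatever the compiler really stores per type, and `f` for float is borrowed from the single-letter idea in the original post.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical type description; 'parts' holds the pointee, the array
// element type, or the struct's data components, depending on 'kind'.
struct Type
{
    enum class Kind { Int, Float, Pointer, Array, Struct };
    Kind kind;
    std::vector<Type> parts;
};

std::string Stringify(const Type &t)
{
    switch (t.kind)
    {
    case Type::Kind::Int:
        return "i";
    case Type::Kind::Float:
        return "f"; // assumed letter, by analogy with "i"
    case Type::Kind::Pointer:
        assert(t.parts.size() == 1);
        return "*" + Stringify(t.parts[0]);
    case Type::Kind::Array:
        assert(t.parts.size() == 1);
        return Stringify(t.parts[0]) + "[]";
    case Type::Kind::Struct:
    {
        std::string s = "(";
        for (const Type &part : t.parts)
            s += Stringify(part);
        return s + ")";
    }
    }
    return "?"; // unreachable for valid kinds
}
```

Note that two differently named structs with the same data layout would stringify identically under these rules, so this only distinguishes types structurally.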
If you've already GOT a stringification for types as part of your RTTI scheme, let's take that.
If the mangled function names become too long, we might hash them so that we can look them up first by hash, then by exact name. The hashing only needs to occur once, at the linking stage, so this will probably not slow down program execution.
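The "first by hash, then by exact name" lookup could look roughly like this in C++. `ExportIndex` and `ExportEntry` are hypothetical names, the integer `address` stands in for whatever the linker actually stores, and `-1` as the "not found" value is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical export record: mangled name plus some address-like value.
struct ExportEntry
{
    std::string mangled_name;
    int address;
};

class ExportIndex
{
public:
    void Add(const std::string &mangled_name, int address)
    {
        assert(!mangled_name.empty());
        buckets_[Hash(mangled_name)].push_back({mangled_name, address});
    }

    // Looks up by hash first, then compares the exact name within the bucket.
    // Returns -1 when the name is not exported.
    int Find(const std::string &mangled_name) const
    {
        auto it = buckets_.find(Hash(mangled_name));
        if (it == buckets_.end())
            return -1;
        for (const ExportEntry &entry : it->second)
            if (entry.mangled_name == mangled_name)
                return entry.address;
        return -1;
    }

private:
    static std::size_t Hash(const std::string &s) { return std::hash<std::string>{}(s); }
    std::unordered_map<std::size_t, std::vector<ExportEntry>> buckets_;
};
```

In practice a plain `std::unordered_map<std::string, int>` already behaves this way internally; the explicit buckets are spelled out here only to mirror the description.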
> The issue is, a `struct foo` could be symbol table entry 271 when a certain program is compiled that exports some function, and the same `struct foo` might be a completely different symbol table entry, e.g., 721, when a different program is compiled that imports that function. <...> If you've already GOT a stringification for types as part of your RTTI scheme, let's take that.
So, in RTTI there are local tables with local typeids, and a joint table created at runtime, which assigns global numeric ids to the types. These tables are bound using a fully qualified type name (string). This is similar to how function fixups are done across scripts.
I suppose that if you use numeric local type ids in this mangled name, then it will be possible to resolve them into global type ids (whether numeric or string) at the script linking stage in the engine.
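The binding of local typeids to global ones through fully qualified names could be sketched like this. It is not the engine's actual RTTI code: `JointTypeTable` and `BindLocalTypes` are hypothetical names, and assigning global ids in order of first appearance is an assumption.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical joint table: one global numeric id per fully qualified type name.
class JointTypeTable
{
public:
    // Returns the existing global id for this name, or assigns the next free one.
    int GetOrAssign(const std::string &full_name)
    {
        assert(!full_name.empty());
        auto it = ids_.find(full_name);
        if (it != ids_.end())
            return it->second;
        const int id = static_cast<int>(ids_.size());
        ids_.emplace(full_name, id);
        return id;
    }

private:
    std::unordered_map<std::string, int> ids_;
};

// local_names[i] is the fully qualified name of one script's local typeid i.
// The result maps each local typeid of that script to its global typeid.
std::vector<int> BindLocalTypes(JointTypeTable &joint,
                                const std::vector<std::string> &local_names)
{
    std::vector<int> local_to_global;
    local_to_global.reserve(local_names.size());
    for (const std::string &name : local_names)
        local_to_global.push_back(joint.GetOrAssign(name));
    return local_to_global;
}
```

Two scripts that number the same type differently locally end up with the same global id, because the name is the only key that is shared between them.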
As for string names, RTTI currently does not have any "shortcut" names, only full names. It's possible to add "shortcuts" there though, generated using your proposed "stringification" rules, if using numeric typeids does seem inconvenient.
EDIT: But I'd like to clarify: using RTTI here is presumably not required to make these functions link, is it? Function linking is still going to be done with the use of import/export tables and fixups. RTTI will only be required if some operation needs to know the exact types of function args at runtime.
Hm. We might keep things simple. We don't need RTTI at link time, but when RTTI already has a way to name types uniquely in a string (long names), then we could use just this naming mechanism and name the functions with these long names, too. That is,
fname^parameter_count^long_type_name1^long_type_name2^long_type_name3…
That makes for long function names, but AFAIK they are only used for linking and aren't usually shown to the programmer. The engine would look up:
1. the full mangled function name,
2. `fname^parameter_count` when there isn't a full mangled function name,
3. `fname` as a last resort.
It seems that the engine does 2. and 3. already, so this might be a comparatively simple modification of pre-existing code.
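This lookup order can be sketched in C++ as follows. `ResolveImport` and `ExportTable` are hypothetical, the export table is simplified to a name-to-address map, and `-1` as the failure value is an assumption; the real engine's import resolution may be organized differently.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Simplified stand-in for an export table: exported name -> address-like value.
using ExportTable = std::unordered_map<std::string, int>;

// Tries the full mangled name first, then "fname^N", then plain "fname".
// Returns -1 when none of the three names is exported.
int ResolveImport(const ExportTable &exports, const std::string &fname,
                  const std::vector<std::string> &param_types)
{
    assert(!fname.empty());
    const std::string counted = fname + "^" + std::to_string(param_types.size());
    std::string full = counted;
    for (const std::string &type : param_types)
        full += "^" + type;

    for (const std::string &candidate : {full, counted, fname})
    {
        auto it = exports.find(candidate);
        if (it != exports.end())
            return it->second;
    }
    return -1;
}
```

Because the fallbacks stay in place, functions exported under the older, shorter names would still link even when the importer asks for a fully mangled name.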
In communication with the programmer, IMO they need to be told what they need to be told; it can't be helped. So when there are several functions of the same name that have different parameter lists and an error message needs to refer to one specific function, it would call the function `fname(int, float)` or some such. In simple cases that can't be misunderstood, the error messages could still refer to `fname()` if that is simpler to understand.
Are bool and enums assumed to be int in this overload idea? Or would they be their own stuff?
I am curious whether this would put new overhead on script function calls at runtime, or whether these would be figured out at compile time.
Just to expand a bit: assuming we do have it, and let's say we use it in Maths in the engine API to also support int, I imagine that in the AGS manual we would have all the overloaded methods in the same entry.
> Are bool and enums assumed to be int in this overload idea?
Currently, the compiler doesn't know `bool`; it doesn't even see it. There are `#define`s in the autoheader that equate `bool` to `int`, `true` to `1` and `false` to `0`.
As concerns `enum`, I think the language follows the old C conventions that essentially treat `enum`s as `int`s that have some compile-time constants. I'm not sure of the ramifications. This might confuse a user when they have defined a function as, e.g., `fname(Enum i)` (where `Enum` is an enum) and the compiler tells them about `fname(int)` in an error message.
We don't have casts in the AGS language, neither the C++ casts nor the C casts. So the language kind of relies on being able to assign ints to enums and vice versa.
In other words, in order to distinguish overloads with bools and different enums, the compiler would have to register `bool` and enums as distinct types. This sounds like a separate issue of its own though.
> I am curious whether this would put new overhead on script function calls at runtime, or whether these would be figured out at compile time.
Function overloads are resolved at compile and linking time.
The proposal is to support script function overloading.
CC @fernewelten
Function overloading means that you may have multiple functions of identical name, but different prototype (return value and parameter list). For example:
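A minimal illustration of such a set of overloads, written in C++ with hypothetical function names (the equivalent AGS script declarations would look much the same):

```cpp
#include <string>

// Two overloads of Max: same name, different parameter types.
int Max(int a, int b) { return a > b ? a : b; }
float Max(float a, float b) { return a > b ? a : b; }

// Overloads may also differ in the number of parameters.
std::string Describe(int value) { return "int:" + std::to_string(value); }
std::string Describe(int value, const std::string &unit) { return Describe(value) + unit; }
```

A call such as `Max(1, 2)` selects the `int` variant and `Max(1.5f, 0.5f)` the `float` variant, purely from the argument types at the call site.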
NOTE: overloads must have different argument lists. Overloading cannot support function variants that differ only in return type, because there would be no way to tell which of those variants is being called.
In order to support this, function variants must be distinguished at both the compilation and linking stages. In other words, each function variant must be registered under a unique internal name. Right now AGS uses a "FUNC^N" notation for distinguishing imports with a different number of parameters (and afaik "FUNC$N" as a corresponding export name). This was done primarily to allow linking deprecated API functions in the engine (I think). But the number of parameters is not enough for overloading, as we would also need to differentiate variants with different return and argument types.
The first idea that comes to mind is to generate a second suffix which contains encoded parameter types. Note that they do not exactly have to be uniquely identified throughout the script or game: for the purpose of overloading itself, having different suffixes is enough. But it may still be beneficial to have a strict rule for these, i.e. not random garbage, at least because this may be useful for debugging. And there may also be additional uses found later, so it would be best not to block this opportunity.
Now, this is where this becomes a bit complicated. I can imagine that primitive types such as ints, floats, etc. could be identified by a single letter, like `i`, `f`, etc., but what about others? A single letter will not be suitable, and the full type name may make this internal name quite long. As a random idea, there may be a "compressed" name generated as the minimum number of characters enough to distinguish the type, maybe starting with 3 letters (unless the type name is shorter). This type name "shortcut" would then also be saved somewhere, like in the RTTI table, as a way to reference a type, in case we need to quickly find that type's entry.
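One possible reading of the "compressed name" idea, sketched in C++. `MakeShortcuts` is a hypothetical helper; it assumes all type names are known up front and are distinct, and it compares each candidate prefix against the same-length prefixes of all other names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// For each type name, finds the shortest prefix of at least min_len characters
// (or the whole name, if it is shorter) that no other name starts with.
// If no such prefix exists, the full name is used.
std::map<std::string, std::string> MakeShortcuts(const std::vector<std::string> &names,
                                                 std::size_t min_len = 3)
{
    assert(min_len > 0);
    std::map<std::string, std::string> shortcuts;
    for (const std::string &name : names)
    {
        std::size_t len = std::min(min_len, name.size());
        for (; len < name.size(); ++len)
        {
            const std::string prefix = name.substr(0, len);
            bool clash = false;
            for (const std::string &other : names)
                if (other != name && other.compare(0, len, prefix) == 0)
                    clash = true;
            if (!clash)
                break;
        }
        shortcuts[name] = name.substr(0, len);
    }
    return shortcuts;
}
```

One consequence of this approach is that a shortcut depends on the full set of type names, so adding a new type to a game could change the shortcuts of existing ones; a scheme meant to stay stable across separately compiled scripts would need to account for that.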
Are there any other visible options here?