Note: In the following I will mainly use C++ code as motivation, but the problem affects other languages as well. Of all the languages we support, C++ has the most extensive set of FQN, namespace, and scoping features.
The current way we handle types has a major drawback: In order to save on the creation of Type nodes, we have a rather intricate system in the objectType function that checks whether a type already exists with the given name. So if you consider the following small C++ code:
class MyClass {
    MyClass(MyClass* other) {
        this->field = other->field;
    }
    int field;
};
Instead of having 2 different MyClass type nodes, we end up with only one, which is good for saving space. This also (sort of) works if we are in a simple namespace:
namespace awesome {
class MyClass {
    MyClass(MyClass* other) {
        this->field = other->field;
    }
    int field;
};
}
In this case, we already run into problems differentiating the types: we need to use the scope manager information during the frontend translation (which we want to avoid and which is only possible to a limited extent) to retrieve the current namespace and then prepend it to the type name. In this case we end up with 2 awesome::MyClass nodes.
The problems continue, for example, if we have a non-qualified static call inside the namespace:
class OutsideClass {
public:
    static void doStatic() {}
};
namespace awesome {
class OtherClass {
public:
    static void doStatic() {}
};
class MyClass {
    MyClass() {}
    MyClass(MyClass* other) {
        this->field = other->field;
        // this works: the type OutsideClass is resolved
        OutsideClass::doStatic();
        // this won't work: the type OtherClass is not resolved
        OtherClass::doStatic();
    }
    int field;
};
}
Currently, in this case, type resolution for OtherClass fails because we did not prefix the static call with the current namespace.
Further problems arise if we want to properly support things like using in C++, which "imports" either a namespace or a symbol, so that we potentially need to look in several namespaces in order to find a match:
#include <string>
using namespace std;
int main() {
    // here, we only know that "string" is part of the `std` namespace once we know all the types
    string s;
}
This is comparable to Python, where we can import everything from a package into the current scope:
from sys import *
print(argv)
Problems to tackle
1. Resolve "local" type names into fully qualified names.
2. Make it possible to import (all) symbols from a namespace into a specific scope. This is probably a slightly different problem than resolving types, but it would be good to have one solution for all "symbols".
3. Provide aliasing for both types and symbols.
Problem 1: Type FQN
A possible solution would be to only do limited type parsing during frontend translation and perform a "type resolution" in a later pass (e.g. the TypeResolver or a separate one). In the examples above we would only parse MyClass as the type during the frontend and then later resolve its name to awesome::MyClass. This would also help with partially qualified names (as possible in C++, see https://github.com/Fraunhofer-AISEC/cpg/issues/1126 for details).
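The deferred resolution could be sketched roughly as follows (in Python for brevity). The function `resolve_type_name` and the `declared` set are hypothetical illustrations, not existing CPG API. The sketch assumes that the frontend only records the local name and the namespace in which it was written, and that the later pass tries the enclosing namespaces from the innermost to the global one. Real C++ lookup rules are more involved, so this is a simplification.

```python
def resolve_type_name(local_name, current_namespace, declared, delimiter="::"):
    """Return the FQN of local_name as seen from current_namespace, or None."""
    parts = current_namespace.split(delimiter) if current_namespace else []
    # Try the innermost namespace first, then walk outwards to the global one.
    for depth in range(len(parts), -1, -1):
        candidate = delimiter.join(parts[:depth] + [local_name])
        if candidate in declared:
            return candidate
    return None


# Type names known once all declarations have been collected (hypothetical data
# mirroring the C++ examples above).
declared = {"OutsideClass", "awesome::OtherClass", "awesome::MyClass"}

resolved_other = resolve_type_name("OtherClass", "awesome", declared)
resolved_outside = resolve_type_name("OutsideClass", "awesome", declared)
resolved_missing = resolve_type_name("Unknown", "awesome", declared)
```

With this data, the unqualified OtherClass inside awesome resolves to awesome::OtherClass, while OutsideClass falls through to the global namespace.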
In order to do so, I propose a new class TypeReference. This class is a sort of hybrid between a Reference and a Type. It holds a refersTo to a declaration (or rather to a specific subset of declarations that can declare types), but it derives from Type in order to support comparison to other types. We could use the same logic for resolving those types as the regular symbol resolver, because in the end even references to types are just symbols. This could make it necessary to turn the type property into an AST edge.
To differentiate which declarations can declare types, a DeclaresType interface might be a good idea, and refersTo could be limited to that interface.
The TypeReference would probably have two states:
unresolved: In this state, it behaves like its own type. As for comparison to other types, it would probably be safe to say that two type references with the same name within the same scope are equal. Two references with the same name but different scopes are not equal, since the name could potentially refer to a different symbol in a different scope.
resolved: In this state, refersTo is set and we can forward all type operations and attributes to the type that is declared by its DeclaresType node. This would also mean that, in addition to being equal to a type reference with the same name and scope, it is also equal to the declared type itself. A challenge here could be that equals needs to be symmetric, so all other types would also need to check whether they are equal to a reference.
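To make the two states more concrete, here is a minimal sketch in Python. `ObjectType` is a simplified stand-in for a regular type, and `refers_to` points directly at the declared type instead of at a DeclaresType declaration. Both are assumptions made for illustration and not the actual CPG classes.

```python
class ObjectType:
    """Simplified stand-in for a regular, declared type (hypothetical)."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        if isinstance(other, TypeReference):
            # Delegate to the reference so that equality stays symmetric.
            return other == self
        return isinstance(other, ObjectType) and self.name == other.name


class TypeReference:
    """A type that is only known by its name and the scope it was written in."""

    def __init__(self, name, scope):
        self.name = name
        self.scope = scope
        self.refers_to = None  # set by the resolver (here: the declared type)

    @property
    def resolved(self):
        return self.refers_to is not None

    def __eq__(self, other):
        if isinstance(other, TypeReference):
            if self.resolved and other.resolved:
                # Two resolved references are equal if they point to the same type.
                return self.refers_to == other.refers_to
            # Otherwise fall back to comparing name and scope.
            return self.name == other.name and self.scope == other.scope
        if isinstance(other, ObjectType):
            # Only a resolved reference can be equal to a declared type.
            return self.resolved and self.refers_to == other
        return NotImplemented
```

Hashing is deliberately left out here, because resolving a reference changes what it is equal to. The sketch also shows a remaining wrinkle: an unresolved and a resolved reference with the same name and scope compare equal, but only the resolved one is equal to the declared type, so equality is not transitive until all references are resolved.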
Problem 2: Symbol Importing
Currently, a Scope has a list of valueDeclarations and structureDeclarations. I propose to instead have a map of symbols, which has a (local) name as key and a list of the declarations available under that name in this scope as value.
/**
* A symbol is a simple, local name. It is valid within the scope that declares it and all of its
* child scopes. However, a child scope can usually "shadow" a symbol of a higher scope.
*/
typealias Symbol = String
var symbols = mutableMapOf<Symbol, List<Declaration>>()
This would make it easier to look up the declarations for a specific name within a scope. A simple lookup algorithm for name could look like this:
1. Retrieve the list of declarations from the symbols map using name as the key.
2. If the list is non-empty, we can directly return these declarations as candidates. Depending on the language (and whether we want to target a function or a variable), this result must be unique.
3. If the list is empty, we continue with the parent of the scope and repeat.
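The lookup steps above can be sketched as follows (in Python for brevity). The `Scope` class and its `add_symbol` and `lookup` methods are hypothetical and only model the proposed symbols map; declarations are represented as plain strings.

```python
class Scope:
    """A scope with a map from local names to the declarations visible under them."""

    def __init__(self, parent=None):
        self.parent = parent
        self.symbols = {}  # local name -> list of declarations

    def add_symbol(self, name, declaration):
        self.symbols.setdefault(name, []).append(declaration)

    def lookup(self, name):
        """Return the candidates for name, searching this scope and then its parents."""
        scope = self
        while scope is not None:
            candidates = scope.symbols.get(name, [])
            if candidates:
                # The innermost scope that declares the name shadows all outer ones.
                return list(candidates)
            scope = scope.parent
        return []


global_scope = Scope()
global_scope.add_symbol("field", "global field")
global_scope.add_symbol("doStatic", "global doStatic")

class_scope = Scope(parent=global_scope)
class_scope.add_symbol("field", "MyClass::field")

shadowed = class_scope.lookup("field")
inherited = class_scope.lookup("doStatic")
missing = class_scope.lookup("unknown")
```

Whether more than one candidate is acceptable (for example for overloaded functions) is left to the caller, as described in step 2.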
We can have a new node type called ImportDeclaration that records the import during frontend translation; the imports are then resolved in a pass. We also need to resolve the imports of a symbol that itself contains an ImportDeclaration. In order to do that, we have a dedicated ImportResolver task that resolves the imported symbols of all ImportDeclaration nodes. This pass needs to be executed very early.
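A rough sketch of what such an import resolution could do, in Python. The field names of `ImportDeclaration`, the `resolve_imports` function, and the `namespaces` mapping are assumptions for illustration only. The sketch copies the declarations of the imported names into the symbols map of the importing scope and does not handle imports nested inside imported symbols.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Scope:
    # local name -> list of declarations (plain strings in this sketch)
    symbols: dict = field(default_factory=dict)


@dataclass
class ImportDeclaration:
    scope: Scope                  # scope in which the import is written
    namespace: str                # e.g. "std" or "sys"
    symbol: Optional[str] = None  # None means "import everything"


def resolve_imports(imports, namespaces):
    """Make the imported declarations visible in the importing scopes."""
    for imp in imports:
        source = namespaces.get(imp.namespace)
        if source is None:
            continue  # unknown namespace, nothing to import
        names = list(source.symbols) if imp.symbol is None else [imp.symbol]
        for name in names:
            for declaration in source.symbols.get(name, []):
                target = imp.scope.symbols.setdefault(name, [])
                if declaration not in target:
                    target.append(declaration)


std_scope = Scope({"string": ["std::string"], "vector": ["std::vector"]})
main_scope = Scope()
single_scope = Scope()

resolve_imports(
    [
        ImportDeclaration(main_scope, "std"),              # using namespace std;
        ImportDeclaration(single_scope, "std", "vector"),  # using std::vector;
    ],
    {"std": std_scope},
)
```

After this step, a lookup of string in the scope of main finds std::string without the frontend having to know the namespace in advance.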
Problem 3: Aliases
Not sure yet. If we have the symbols map from the previous approach, we could maybe have an AliasDeclaration node that lives in the symbol map but points to the original declaration.
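One way this could look is sketched below in Python. This is only a guess at a possible design: the `target` attribute and the `unwrap` helper are hypothetical and not part of the proposal.

```python
class Declaration:
    """Minimal stand-in for a declaration node."""

    def __init__(self, name):
        self.name = name


class AliasDeclaration(Declaration):
    """A declaration that lives under its own name but points to another declaration."""

    def __init__(self, name, target):
        super().__init__(name)
        self.target = target


def unwrap(declaration):
    """Follow alias chains until the original (non-alias) declaration is reached."""
    seen = set()
    while isinstance(declaration, AliasDeclaration):
        if id(declaration) in seen:
            raise ValueError("alias cycle detected")
        seen.add(id(declaration))
        declaration = declaration.target
    return declaration


original = Declaration("awesome::MyClass")
alias = AliasDeclaration("MyAlias", original)
alias_of_alias = AliasDeclaration("Shorter", alias)

# The symbol map holds the aliases under their local names.
symbols = {"MyAlias": [alias], "Shorter": [alias_of_alias]}
resolved = [unwrap(d) for d in symbols["Shorter"]]
```

A lookup would then return the alias from the symbol map, and the resolver would unwrap it to reach the original declaration.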