Discussion on types and type name normalisation

Implementation Status

### Tasks
- [ ] https://github.com/Fraunhofer-AISEC/cpg/issues/1539
- [ ] https://github.com/Fraunhofer-AISEC/cpg/issues/1540
- [ ] Rework type resolution
- [ ] https://github.com/Fraunhofer-AISEC/cpg/issues/1535

Motivation

Note: In the following I will mainly use C++ code as a motivating example, but the problem affects other languages as well. Of all the languages we support, C++ has the most extensive FQN/namespace/scoping feature set.

The current way we handle types has a major drawback: In order to save on the creation of Type nodes, we have a rather intricate system in the objectType function that checks whether a type already exists with the given name. So if you consider the following small C++ code:

class MyClass {
  MyClass(MyClass* other) {
    this->field = other->field;
  }
  int field;
};

Instead of having two different MyClass type nodes, we end up with only one, which is good for saving space. This also (sort of) works if we are in a simple namespace:

namespace awesome {
class MyClass {
  MyClass(MyClass* other) {
    this->field = other->field;
  }
  int field;
};
}

In this case, we already run into problems differentiating the types. We need to use the scope manager information during the frontend translation (which we want to avoid and which only works in limited cases) to retrieve the current namespace and prepend it to the type name. In this case we end up with two awesome::MyClass nodes.

The problems continue, for example, if we have a non-qualified static call inside the namespace:

class OutsideClass {
public:
  static void doStatic() {}
};
namespace awesome {
class OtherClass {
public:
  static void doStatic() {}
};
class MyClass {
  MyClass() {}
  MyClass(MyClass* other) {
    this->field = other->field;

    // type resolution works here
    OutsideClass::doStatic();
    // type resolution currently fails here
    OtherClass::doStatic();
  }
  int field;
};
}

Currently, in this case, type resolution for OtherClass fails because we did not prefix the static call with the current namespace.

Further problems arise if we want to properly support things like using in C++, which "imports" either a namespace or a symbol. We then potentially need to look in several namespaces in order to find a match:

#include <string>

using namespace std;
int main() {
  // here, we only know that "string" is part of the `std` namespace once we know all the types
  string s;
}

This is comparable to Python, where we can import everything from a module into the current scope:

from sys import *
print(argv)

Problems to tackle

1. Resolve "local" type names into fully qualified names.
2. Make it possible to import (all) symbols from a namespace into a specific scope. This is probably a slightly different problem than the one for types, but it would be good to have a solution for all "symbols".
3. Provide aliasing for both types and symbols.

Problem 1: Type FQN

A possible solution would be to do only limited type parsing during frontend translation and to perform a "type resolution" in a later pass (e.g. the TypeResolver or a separate one). In the examples above we would only parse MyClass as the type during the frontend and then later resolve its name to awesome::MyClass. This would also help with partially qualified names (as possible in C++, see https://github.com/Fraunhofer-AISEC/cpg/issues/1126 for details).

In order to do so, I propose a new class TypeReference. This class is a sort of hybrid between a Reference and a Type. It holds a refersTo that points to a declaration (or rather to a specific subset of declarations that can declare types), but it derives from Type in order to support comparison with other types. We could resolve those types with the same logic as the regular symbol resolver, because in the end even references to types are just symbols. This could make it necessary to turn the type property into an AST edge.

To differentiate which declarations can declare types, a DeclaresType interface might be a good idea, and refersTo could be limited to that interface.

The TypeReference would probably have two states:

- unresolved. In this state, it behaves like its own type. As for comparison to other types, it would probably be safe to say that two type references with the same name within the same scope are equal. Two references with the same name but different scopes are not equal, since the name could potentially mean a different symbol in a different scope.
- resolved. In this state, refersTo is set and we can point all type operations and attributes to the type that is declared by its DeclaresType node. This would also mean that, in addition to being equal to a type reference with the same name and scope, it is equal to the declared type itself. A challenge here could be that equals needs to be symmetric, so all other types would need to check whether they are equal to a reference as well.
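
The two states and their comparison rules could be sketched roughly as follows. This is a minimal illustration and not the actual CPG API: Scope, Type, RecordDeclaration, effectiveType and sameType are simplified, hypothetical stand-ins, and the comparison is written as a free function instead of equals so that the symmetry requirement holds by construction.

```kotlin
// Hypothetical, simplified stand-ins; none of these are the actual CPG classes.
class Scope(val name: String)

open class Type(val name: String)

/** Marker for declarations that can declare a type (e.g. a record declaration). */
interface DeclaresType {
    val declaredType: Type
}

class RecordDeclaration(fqn: String) : DeclaresType {
    override val declaredType: Type = Type(fqn)
}

/** A type that is only known by its local name and the scope it was written in. */
class TypeReference(name: String, val scope: Scope) : Type(name) {
    /** Set by a later resolution pass; null while the reference is unresolved. */
    var refersTo: DeclaresType? = null

    val isResolved: Boolean
        get() = refersTo != null
}

/** Follows a resolved reference to the type it stands for; otherwise returns [type] itself. */
fun effectiveType(type: Type): Type =
    if (type is TypeReference) type.refersTo?.declaredType ?: type else type

/**
 * Symmetric comparison following the two states described above. Both argument
 * orders run exactly the same code, so sameType(a, b) == sameType(b, a).
 */
fun sameType(a: Type, b: Type): Boolean {
    // Two references with the same local name in the same scope are always equal.
    if (a is TypeReference && b is TypeReference && a.name == b.name && a.scope === b.scope) {
        return true
    }
    val left = effectiveType(a)
    val right = effectiveType(b)
    // An unresolved reference only equals same-name/same-scope references (handled above).
    if (left is TypeReference || right is TypeReference) return false
    return left.name == right.name
}
```

With this sketch, a reference to MyClass in the awesome scope only compares equal to the declared type awesome::MyClass after a pass has set refersTo. Overriding equals directly would additionally require a hashCode that stays consistent across both states, which this sketch deliberately leaves open.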

Problem 2: Symbol Importing

Currently, a Scope has a list of valueDeclarations and a list of structureDeclarations. I propose instead to have a map of symbols, which uses a (local) name as key and holds the list of declarations that are available under that name in this scope.

/**
 * A symbol is a simple, local name. It is valid within the scope that declares it and all of its
 * child scopes. However, a child scope can usually "shadow" a symbol of a higher scope.
 */
typealias Symbol = String

var symbols = mutableMapOf<Symbol, List<Declaration>>()

This would make it easier to look up the declarations for a specific name within a scope. A simple lookup algorithm for a name could look like this:

1. Retrieve the list of declarations from the symbols map, using the name as key.
2. If the list is non-empty, we can directly return these declarations as candidates. Depending on the language (and whether we want to target a function or a variable), this result must be unique.
3. If the list is empty, we go to the parent of the scope and repeat the lookup there.
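
The lookup steps above can be sketched as follows. This is only an illustration under assumptions: Declaration and Scope are simplified, hypothetical stand-ins for the real node classes, addSymbol and lookup are made-up helper names, and the map values are mutable lists so that declarations can be appended.

```kotlin
typealias Symbol = String

// Hypothetical stand-in for the CPG declaration node.
open class Declaration(val name: String)

class Scope(val parent: Scope? = null) {
    val symbols = mutableMapOf<Symbol, MutableList<Declaration>>()

    /** Registers [declaration] under the local name [symbol] in this scope. */
    fun addSymbol(symbol: Symbol, declaration: Declaration) {
        symbols.getOrPut(symbol) { mutableListOf() }.add(declaration)
    }

    /** Walks from this scope towards the root and returns the first non-empty candidate list. */
    fun lookup(symbol: Symbol): List<Declaration> {
        var current: Scope? = this
        while (current != null) {
            val candidates = current.symbols[symbol].orEmpty()
            // A hit in an inner scope shadows declarations in outer scopes.
            if (candidates.isNotEmpty()) return candidates
            current = current.parent
        }
        // No scope up to the root declares the symbol.
        return emptyList()
    }
}
```

Returning the whole candidate list leaves the uniqueness check to the caller, since whether several candidates are acceptable depends on the language and on whether a function or a variable is being resolved.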

We can have a new node type called ImportDeclaration that records the import during frontend translation; the imports are then resolved in a pass. We also need to resolve the imports of a symbol that contains an ImportDeclaration. In order to do that, we have a dedicated ImportResolver pass that resolves the imported symbols of all ImportDeclaration nodes. This pass needs to be executed very early.
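
A rough sketch of how such a resolver could populate the symbol map is shown below. Everything here is an assumption made for illustration: the constructor parameters of ImportDeclaration (from, symbol, into), the namespaces map handed to ImportResolver, and the simplified Scope and Declaration classes are hypothetical and do not reflect the actual pass API.

```kotlin
typealias Symbol = String

// Hypothetical stand-ins; the real node classes and pass API may look different.
open class Declaration(val name: String)

class Scope(val parent: Scope? = null) {
    val symbols = mutableMapOf<Symbol, MutableList<Declaration>>()

    fun addSymbol(symbol: Symbol, declaration: Declaration) {
        symbols.getOrPut(symbol) { mutableListOf() }.add(declaration)
    }
}

/**
 * Records an import seen by the frontend. [symbol] is null for a wildcard import
 * such as `using namespace std;` or `from sys import *`.
 */
class ImportDeclaration(val from: String, val symbol: Symbol?, val into: Scope) :
    Declaration(symbol ?: "$from::*")

/** Copies the imported declarations into the symbol map of the importing scope. */
class ImportResolver(private val namespaces: Map<String, Scope>) {
    fun resolve(imports: List<ImportDeclaration>) {
        for (decl in imports) {
            // Unknown namespaces are skipped here; a real pass would have to handle them.
            val source = namespaces[decl.from] ?: continue
            // Either the single named symbol or, for a wildcard, every symbol of the namespace.
            val wanted = decl.symbol?.let { listOf(it) } ?: source.symbols.keys.toList()
            for (symbol in wanted) {
                // Copy the list first so that importing a scope into itself stays safe.
                source.symbols[symbol].orEmpty().toList().forEach {
                    decl.into.addSymbol(symbol, it)
                }
            }
        }
    }
}
```

After this step, a plain symbol lookup in the importing scope finds the imported declarations without knowing anything about namespaces, which is why the resolver has to run before any symbol or type resolution.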

Problem 3: Aliases

Not sure yet. If we have the symbols from the previous approach, we could maybe have an AliasDeclaration node that lives in the symbol map but points to the original declaration.
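
One way this idea could look is sketched below, purely as a thought experiment: AliasDeclaration with a target property, the unwrap helper, and the simplified Scope and Declaration classes are all hypothetical names. The alias is stored under its own name and the lookup follows it to the original declaration.

```kotlin
typealias Symbol = String

// Hypothetical stand-ins for illustration only.
open class Declaration(val name: String)

/** Lives in the symbol map under the alias name but points to the original declaration. */
class AliasDeclaration(alias: Symbol, val target: Declaration) : Declaration(alias)

/** Follows alias chains until a non-alias declaration is reached. */
tailrec fun unwrap(declaration: Declaration): Declaration =
    if (declaration is AliasDeclaration) unwrap(declaration.target) else declaration

class Scope(val parent: Scope? = null) {
    val symbols = mutableMapOf<Symbol, MutableList<Declaration>>()

    fun addSymbol(symbol: Symbol, declaration: Declaration) {
        symbols.getOrPut(symbol) { mutableListOf() }.add(declaration)
    }

    /** Same parent-walking lookup as for ordinary symbols, with aliases resolved on the way out. */
    fun lookup(symbol: Symbol): List<Declaration> {
        var current: Scope? = this
        while (current != null) {
            val candidates = current.symbols[symbol].orEmpty()
            if (candidates.isNotEmpty()) return candidates.map { unwrap(it) }
            current = current.parent
        }
        return emptyList()
    }
}
```

For a C++ alias such as `using Text = std::string;`, the scope would hold an AliasDeclaration under the symbol Text, and a lookup of Text would return the declaration of std::string.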
