checkedc / checkedc-llvm-project

This was a fork of Checked C clang used from 2021-2024. The changes have been merged into the original Checked C clang repo, which is now at https://github.com/checkedc/checkedc-clang.
https://www.checkedc.org
13 stars 19 forks source link

Maintain lexical depth invariant for type variables properly. #583

Open secure-sw-dev-bot opened 2 years ago

secure-sw-dev-bot commented 2 years ago

This issue was copied from https://github.com/microsoft/checkedc-clang/issues/584


When comparing type variables, we use their lexical depths within _For_any quantifiers and their argument position for comparisons. For example, given

_For_any(T) int f(_Ptr<_For_any(S) int (_Array_ptr<S>, size_t size)> sort) {
 …
}

T has a lexical depth of 1 and S has a lexical depth of 2. This means T and S should compare differently.

In some cases, we aren't currently maintaining lexical depths properly. First, lexical depth counts from 0, not from 1. More importantly, if a generic function itself uses generic functions, we'll get the lexical depth wrong. This could lead to typechecking errors. Consider:

_Ptr<_For_any(S) int (_Array_ptr<S>, size_t size)> sort;
_For_any(T) int f(void) {
  _Ptr<_For_any(U) int (_Array_ptr<U>, size_t size)> gen_sort = sort;

(we would need a comparison function argument too, which I'm omitting too keep the example simple).

In this case, U has a lexical depth of 2. However, we're not adjusting lexical depth for the type of sort wen it is used as the right-hand of the assignment. The use of sort will have a type where S has a lexical depth of 1, causing a type mismatch.

In general, when we

  1. Use a variable defined outside the scope of a _For_any in the scope of a _For_any
  2. And the variable has a generic type (i.e. the type quantifies over a type variable),

we may need to adjust the generic type so that the quantified type variable has the proper lexical depth. The depth calculation needs to take into account depth within types and expressions.

@prabhuk, this issue covers the problem I mentioned earlier today.

Note that clang's function scope depth is different from the definition of depth that we are proposing for type variables. It's not true lexical depth and there's no adjustment to properly track lexical depth. It actually only tracks depth within function parameter lists.

By the way, for people interested in deBruijn indices, with deBruijn indices we wouldn't have to renumber the quantified type variables at variable uses. However, we'd have multiple representations of a type variable, which could make the IR harder to understand.

With deBruijn indices, we count from the use of a type variable up to the scope that binds it. In this modified version of the first example the use of S would have an index of 0 and the use of _Ptr would have an index of 1:


_For_any(T) int f(_Ptr<_For_any(S) int (_Array_ptr<S>, size_t size, _Ptr<T> arg)> sort) {
 …
}

Within a compiler, deBruin indices would save work at uses of polymorphic variables.  However, the implementation work at type applications seems to remain about the same. With deBruijn indices, we don't have to renumber quantified variables, but we do have to renumber free type variables.  it is clear that deBruijn indices are hard to read, which is why we aren't using them.
secure-sw-dev-bot commented 2 years ago

Comment from @abeln:

@dtarditi what would the definition of a type variable look like with de Bruijn indices?

e.g. right now

_For_any(T) T *id(T *x) { return x; }

becomes

typedef TypeVariablType(0, 0) T
T *id(T* x) { return x; }

and all uses of T can refer to it as TypeVariableType(0, 0).

However, if we switch to de Bruijn indices, then we can't say typedef TypeVariableType(0, 0) T, because the index of T can potentially change across multiple references:

e.g.

_For_any(T) T *id(T *x) {
  _For_any(U) U *id2(T *y) { return y; } // the type of `y` should be `TypeVariableType(1, 0)`
  id2<T>(x); // the type argument to id2 should be `TypeVariableType(0, 0)`
}

So how would we desugar _For_any(T) here with a typedef?

Any ideas?

secure-sw-dev-bot commented 2 years ago

Comment from @abeln:

It also seems like de Bruijn indices violate the unicity of pointers that clang asumes for types. This is because the same type variable 'T' potentially needs to be referred to with different de Bruijn indices (depending on the number of bindings between the definition and use of T). This means that we'll end up with two pointers to the same underlying type, which violates unicity.

secure-sw-dev-bot commented 2 years ago

Comment from @abeln:

For existential types, one solution is to continue using levels for type variables as we do right now, but also introduce a new kind of type variable ExistentialTypeVariableType (abbreviated below as ETVT) that is used only for existentials.

And ETVT has only one parameter: depth (an int like a normal type variable), because existential variables only appear inside existential types, which can only bind one variable at a time.

e.g.

_Exists(T, IntSet<T>) => Exists(IntSet<ETVT(0)>)

this solves our problem because we won't confuse an existential variable with a regular variable:

_For_any(U) void foo() {
  struct Pair<U, U> p = ...
  pack struct _Exists(T, struct Pair<T, U>) absPair = p;
}

The type of 'absPair' above is Exists(Pair<ETVT<0>, TV<0, 0>>), so both kinds of variables are kept separate.

Whenever we wrap an existential with a new existential, we have to rename references to the existential variables:

(within foo)
pack struct _Exists(T, _Exists(U, struct Pair<T, U>)) absPair2 = absPair;

we'd represent the type of absPair2 as Exists(Exists(struct Pair<ETVT<1>, ETVT(0)>)). Notice how the first argument went from ETVT<0> to ETVT<1>.

What about unpack?

unpack (T, IntSet) ePair = absPair;

The type of absPair was Exists(Pair<ETVT<0>, TV<0>>). The type of ePair is Pair<TV<0, 0>, TV<1, 0>>. So an unpack operation replaces all instances of ETVT<0> by the user-introduced regular TV T.

secure-sw-dev-bot commented 2 years ago

Comment from @abeln:

Spoke with @dtarditi about this today and we agreed that moving to de Brown indices might introduce more problems than it helps resolve, so we'll stay with the current approach of tracking depth.

Further, I'll focus on the existential types and we'll revisit this bug later if it becomes hi-pri.