CANVE / extractor

Extracts and normalizes the type relationships and call graph of scala sbt projects.
4 stars 1 forks source link

Add type parameters to the data model and extraction process #21

Open matanox opened 8 years ago

matanox commented 8 years ago

This feature is deferred to the next backend push, as it complicates the data model, and to an extent also the display, relative to its current simplicity.

Currently, types are extracted in/for relationships like is of type and extends, and naturally if a symbol is a class/object/trait it will be marked as one. This allows for a very simple data model ― a graph of symbols. That however, barely scratches the surface of type relationships in a program.

Scala programs are full of type parameters ― things like Seq[Int], SimpleGraph[SymbolCompilerId, ExtractedSymbol, ExtractedSymbolRelation] drawing from our own code, or def addObject[T <: Product: ClassTag] (ignore the class tag there, we don't aim to cover that). Which are the essence of scala's elaborate type system. A method can have type parameters, as just shown, and a type itself often has parameters, e.g. in Seq[Seq[Int]] the first Seq has a parameter Seq, which in turn has a parameters Int.

In the current implementation, which ignores this level of subtlety, we will only record that val foo: Seq[A[B]] is of type Seq. This is clearly insufficient as it will not capture the type parameters which A relies on!

The type information of each symbol can be actually seen as a tree/graph of types, where the relations are two:

E.g. see in this example:

Class C[T1[T2 <: T3] <: T4]

we have the relationships C has type parameter T1 T1 has type parameter T2 T1 has type bound T4 and so forth... each node in this type graph can have both a type parameter of its own and a type bound (or two, lower and higher). Also imagine T4 is T1, this is also allowed, for a type to have itself as a type parameter (e.g. mundaenly Seq[Seq[Foo]).

Where do these trees spring into the logical model? ordered hypergraphs may jump into mind, but are not semantically appropriate or worth the implementation. One option is to make every symbol a nested graph which contains all this type information. The other is to hinge the type information on the edge ― in edges that indicate being a type (is of type, extends, and a new needed one ― has type parameter). In this vein, the edge has a property, which in turn is a "type tree" which is composed of nodes of the general form, which is of course recursive:

class Type {
  val lowerBound: Option[Node]
  val upperBound: Option[Node]
  val typeParams: Seq[Type]
}

The tree says that "Foo extends Bar" with type parameters tree T, which preserves all the information that can be extracted. This option means we now have a data model which comprises:

A tree of Symbols connected by Relations ― with small trees of type parameters attached to some edges.

In this vein, a display may show the type parameters bouncing out a synthetic display node on the edge, connecting each type on it to the already visible instance of each type in the type tree, or showing those instances otherwise for the first time. So this option introduces a new element to the display interaction, not just the data model. Which is why this is being written here rather than being implemented, at this time ― complicating the model and display interaction may well just slow down development on all fronts at this time.

Second model:

Each node has a type tree, for the case of "being of a type", and "extension" still requires its own model element:

class Extension {
  val extends: Type
  extensionTypeParams: Seq[Type]
}

class Type {
  val extends: Option[Extension]
  val lowerBound: Option[Node]
  val upperBound: Option[Node]
  val typeParams: Seq[Type]
}

// `extends` and (`lowerBound`, `higherBound`) are mutually exclusive
// but not `typeParams` ― a class can have its own type params, and extend another one at the same time.

Perhaps a key distinction to make is that these type parameter relationships bear somewhat more local or context-dependent semantics than other relationships: in the type tree, if T1 is a type parameter of T2 somewhere down the type tree of some variable v, this only holds in the context of v, the variable to which the type tree is describing; it does not hold in general. This is unlike virtually all other aspects of the (current) model, e.g. if a calls b, this is a general statement that is always true for a and b. But T2 gets type parameter T1 only within the type tree of some variable v, not "in general". Does this have bearing on the data model? or is it only a concern of traversals on the data?

matanox commented 8 years ago

The simple-graph library allows more than a single node type (just need to have all node types extend a common root, and then always check the type before inspecting a node's values). And it does not preclude a small graph hinging to an edge.

We ultimately have to steer the choice of data model also by how will each option be traversed for anticipated display and search purposes. And it certainly adds a lot of information being extracted, so need to be ready to re-design when to show what, in the display, with this additional information being available, which is very good, but also touches the core of the visualization interaction scheme!

matanox commented 8 years ago

the compiler api:

The compiler api provides recursive access to the type parameters and/or type bounds of any type. The api handles are: .typeParams, and .tpe.bounds.

So for example, the root of the type parameters and bounds tree for a type def or method is obtained as follows:

ase Template(parents, self, body) =>

  val typeSymbol = tree.tpe.typeSymbol

  extractedModel.add(global)(typeSymbol)

  typeSymbol.typeParams.foreach { tparam =>
    extractedModel.add(global)(tparam)
    extractedModel.addIfUnique(typeSymbol.id, "has type parameter", tparam.id)

    val TypeBounds(lower, higher) = tparam.tpe.bounds
    if (higher.toString != "Nothing")
      extractedModel.add(global)(higher.typeSymbol)
      extractedModel.addIfUnique(tparam.id, "has higher type bound", higher.typeSymbol.id)
    if (lower.toString != "Nothing")
      println("foo")
      extractedModel.add(global)(lower.typeSymbol)
      extractedModel.addIfUnique(tparam.id, "has lower type bound", lower.typeSymbol.id)
  }

Sometimes you need to use .tpe to jump from a tree/symbol, to its associated Type, then to the Symbol that represents the type via .typeSymbol. A little confusing so better spike a validation of the compiler's model before considering whether we mimic that model, or take a different appraoch.