awslabs / ar-go-tools

ar-go-tools (Argot) is a collection of analysis tools for Go
Apache License 2.0

Add escape interfaces support #33

Closed amzn-jasonrk closed 8 months ago

amzn-jasonrk commented 9 months ago

This PR adds support for interfaces to the escape analysis. In particular, it makes it possible to track different implementations of an interface separately, so that when, for example, a method call or type assertion occurs, the changes are applied only to nodes of the correct concrete type. This prevents some spurious edges from being created, because, for example, field nodes are attached to structs of the correct type.

Example:

Let's say we are analyzing:

type DoAction interface { Action(*Node) }
type DoerA struct {a *Node}
type DoerB struct {a *Node}
func (a *DoerA) Action(n *Node) {
  a.a = n
}
func (b *DoerB) Action(n *Node) {
}
func doActionOnArgument(d DoAction, n *Node) {
  d.Action(n)
}

Here, when we look at doActionOnArgument, we don't know whether d is a DoerA or a DoerB. Previously, we would use a single node for whatever the actual object is, and so we would conflate the a fields of the two distinct struct types. After this change, the pointee of d is marked as having an "abstract type" (an implementation of an interface). When the method call d.Action(n) is analyzed, the pointer analysis produces two possible methods (DoerA.Action, DoerB.Action). Each of these methods creates, on demand, a subnode of the pointee of d with its respective concrete type, and applies field writes only to that subnode. The summary for doActionOnArgument will look like:

[image: summary graph for doActionOnArgument]

The important part is that the *d node has two subnodes (labeled "impl: ..."), one for each type. Only the A side actually has a pointer that aliases n, whereas B does not (note that fields are created lazily: DoerB still has an a field, but it is never accessed and so isn't shown). Now, when doActionOnArgument is called with a specific type, such as type A, the correct subnode is linked up, and the calling context gets the right edges (an edge to n if A, and no edges if B).
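To make the two calling contexts concrete, here is a minimal runnable sketch; it is not code from the PR. The field in Node and the helpers callWithA and callWithB are assumptions added for illustration, while the remaining declarations follow the example above. It only demonstrates the run-time behavior that the per-type subnodes are meant to model: with a DoerA the argument n becomes reachable from the receiver, and with a DoerB it does not.

```go
package main

import "fmt"

// Node is a placeholder payload type; the PR's example does not define it.
type Node struct{ id int }

type DoAction interface{ Action(*Node) }

type DoerA struct{ a *Node }
type DoerB struct{ a *Node }

// DoerA.Action stores n, so n becomes reachable from the receiver.
func (a *DoerA) Action(n *Node) { a.a = n }

// DoerB.Action ignores n, so its a field is never written.
func (b *DoerB) Action(n *Node) {}

func doActionOnArgument(d DoAction, n *Node) { d.Action(n) }

// callWithA is a hypothetical caller that passes a concrete *DoerA.
// In the analysis, this is the context where the "impl: DoerA" subnode
// of the callee summary would be linked to the caller's node.
func callWithA(n *Node) *DoerA {
	d := &DoerA{}
	doActionOnArgument(d, n)
	return d
}

// callWithB is a hypothetical caller that passes a concrete *DoerB.
// Here no edge to n should be introduced.
func callWithB(n *Node) *DoerB {
	d := &DoerB{}
	doActionOnArgument(d, n)
	return d
}

func main() {
	n := &Node{id: 1}
	fmt.Println(callWithA(n).a == n)   // true: the A receiver now points to n
	fmt.Println(callWithB(n).a == nil) // true: the B receiver never stored n
}
```

A summary that conflated the two struct types would report an edge to n for both callers; the per-type subnodes let the analysis report it only for callWithA.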

In terms of code, this PR mostly changes how function calls work, by being more judicious about how to link up nodes from the callee summary into the state of the caller. In particular, there are a few cases that need to be handled correctly:

  1. Callee node is concrete, caller is abstract. This occurs for receivers of methods, when they are called through an interface (not directly). We need to ensure the callee is only linked up with the subnode of the right concrete type.
  2. Callee node is abstract, caller is abstract. We link up corresponding subnodes of the same concrete types.
  3. Callee node is abstract, caller is concrete. This corresponds to calling a function that takes an interface on a specific type. We need to match up the right subnode in the callee with the concrete node in the caller.
  4. Callee and caller are both concrete. This occurs when e.g. calling a method on a value that aliases multiple nodes. If we're looking at the receiver and the types don't match, then we don't link the nodes. If they don't match and we aren't looking at a receiver, we link them anyway even though the result isn't necessarily typesafe. We most likely want to preserve this behavior because Go allows pointers to be cast in weird ways, and we don't want to lose track of a reference.
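The four cases above can be sketched as a single decision function. This is a simplified illustration under stated assumptions, not the implementation in the escape package: the node and link types, the impl helper, and linkNodes are all hypothetical names, and concrete types are represented as plain strings. The handling of case 3 when the callee has no subnode for the caller's type (nothing is linked) is also an assumption.

```go
package main

import "fmt"

// node is a hypothetical, simplified escape-graph node.
type node struct {
	name     string
	typ      string           // concrete type name; empty for abstract nodes
	abstract bool             // true for "some implementation of an interface"
	impls    map[string]*node // per-concrete-type subnodes, created on demand
}

// impl returns the subnode for concrete type t, creating it lazily.
func (n *node) impl(t string) *node {
	if n.impls == nil {
		n.impls = map[string]*node{}
	}
	sub, ok := n.impls[t]
	if !ok {
		sub = &node{name: n.name + "/impl:" + t, typ: t}
		n.impls[t] = sub
	}
	return sub
}

// link pairs a node from the callee summary with a node in the caller state.
type link struct{ callee, caller *node }

// linkNodes decides which node pairs to connect at a call site.
func linkNodes(callee, caller *node, isReceiver bool) []link {
	switch {
	case !callee.abstract && caller.abstract:
		// Case 1: link only to the caller subnode of the callee's type.
		return []link{{callee, caller.impl(callee.typ)}}
	case callee.abstract && caller.abstract:
		// Case 2: link subnodes of the same concrete type pairwise.
		var out []link
		for t, sub := range callee.impls {
			out = append(out, link{sub, caller.impl(t)})
		}
		return out
	case callee.abstract && !caller.abstract:
		// Case 3: pick the callee subnode matching the caller's type.
		// Assumption: if no such subnode exists, there is nothing to link.
		if sub, ok := callee.impls[caller.typ]; ok {
			return []link{{sub, caller}}
		}
		return nil
	default:
		// Case 4: mismatched receivers are dropped; other mismatches are
		// linked anyway so that no reference is lost.
		if isReceiver && callee.typ != caller.typ {
			return nil
		}
		return []link{{callee, caller}}
	}
}

func main() {
	calleeD := &node{name: "callee *d", abstract: true}
	calleeD.impl("DoerA")
	calleeD.impl("DoerB")
	callerA := &node{name: "caller d", typ: "DoerA"}
	for _, l := range linkNodes(calleeD, callerA, false) {
		fmt.Println(l.callee.name, "<->", l.caller.name)
	}
}
```

In this sketch only case 4 depends on isReceiver, which mirrors the description above: type mismatches are tolerated everywhere except on method receivers.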

In addition, this PR adds a few new "pseudo-types" to represent the pointees of types that Go only lets the programmer talk about as references (see NillableDerefType in escape.go). These new types are used to label interface nodes whose concrete type we don't know, but they can also give a more accurate type to nodes representing maps and channels. These types are dangerous, however, because they are not recognized by the types package and cause panics. Their use should be strictly limited to cases where we absolutely need to label things beyond what the Go type system supports (i.e. in the escape package).

This PR also adds some (currently skipped) standard-library tests for printf and reflection. These require special handling, mostly because reflection dives below the Go source level and so needs hand-written summaries.

This change does not have as large an effect on large-scale programs as desired, so additional techniques will be required to achieve scalability.