github / codeql

CodeQL: the libraries and queries that power security researchers around the world, as well as code scanning in GitHub Advanced Security
https://codeql.github.com
MIT License
7.67k stars 1.54k forks source link

Go: Tainted data does not propagate data flow after append function #14116

Open Jokuuy opened 1 year ago

Jokuuy commented 1 year ago

Go: v1.19.1 linux/amd64 CodeQL: v2.11.0

Question: Query the data flow results from the input parameter in the source function to the name parameter in the sink function.

package main

import (
    "fmt"
)

type info struct {
    name string
}
func main() {
    source("input")
}

func source(input string) {
    var data []info
    data = append(data, info{
        name: input,
    })
    dao(data)

}

func dao(data []info) {
    for _, item := range data {
        name := item.name
        sink(name)
    }
}

func sink(name string) {
    fmt.Println(name)
}

The following is my query statement. After execution, I did not get the expected results.

append_struct_result
/**
 * @kind path-problem
 */

import go
import DataFlow::PathGraph

class Configuration extends TaintTracking::Configuration {
  Configuration() { this = "Configuration" }

  override predicate isSource(DataFlow::Node source) {
    exists(Function func | func.getName() = "source" | func.getParameter(0) = source.asParameter())
  }

  override predicate isSink(DataFlow::Node sink) {
    exists(DataFlow::CallNode call | call.getTarget().getName() = "sink" |
      call.getArgument(0) = sink
    )
  }
}

from Configuration cfg, DataFlow::PathNode source, DataFlow::PathNode sink
where cfg.hasFlowPath(source, sink)
select sink.getNode(), source, sink, "Source: $@ , Sink: $@ .", source.getNode(), source.toString(),
  sink.getNode(), sink.toString()

When I debug using DataFlow::PartialPathGraph, it seems that the tainted data does not propagate the data flow after passing through the append function.

append_struct_partial
/**
 * @kind path-problem
 */

import go
import DataFlow::PartialPathGraph

class Configuration extends TaintTracking::Configuration {
  Configuration() { this = "Configuration" }

  override predicate isSource(DataFlow::Node source) {
    exists(Function func | func.getName() = "source" | func.getParameter(0) = source.asParameter())
  }

  override predicate isSink(DataFlow::Node sink) {
    exists(DataFlow::CallNode call | call.getTarget().getName() = "sink" |
      call.getArgument(0) = sink
    )
  }

  override int explorationLimit() { result = 5 }
}

from Configuration cfg, DataFlow::PartialPathNode source, DataFlow::PartialPathNode sink
where cfg.hasPartialFlow(source, sink, _)
select sink.getNode(), source, sink, "Source: $@ , Sink: $@ .", source.getNode(), source.toString(), sink.getNode(), sink.toString()

How to solve this problem?

MathiasVP commented 1 year ago

Hi @Jokuuy,

This is a tricky issue. I'll provide some background that explains what's going wrong, and end with two ways in which you can fix it yourself to get the flow you want. Finally, I'll add a couple of notes about what we're hopefully doing about this in the future for Go so that everything automagically works.

What you can do to get the flow you want

Let me give you the background first.

The dataflow library has two "kinds" of steps:

An example of a "value preserving step" is steps from the right-hand side of an assignment to the next use of the variable. e.g., the step from 42 to x (in return x) in something like:

var x = 42
return x

An example of a "non-value preserving step" could be a step from x to x + 1. More importantly for this issue: append is also a non-value preserving step.

When the dataflow is tracking flow is keeps track of which fields has been written to (so that we can propagate flow from a write to a field to a subsequent read of that field) using the concept of an "access path". You can think of an access path as a list of fields that have currently been written to along the dataflow path traversed so far. A dataflow path node is roughly the pair (n, ap), where n is a dataflow node and ap is an access path.

For example, let's say you want to track the flow from "hello" to the argument of print in something like:

type foo struct {
    a string
    b int
}

type bar struct {
    f foo
    c int
}

func main() {
  f := foo{}
  f.a = "hello"
  b := bar{}
  b.f = f
  f2 := b.f
  myString := f2.a
  print(myString)
}

After the line f.a = "hello" the access path is [a]. This represents that, in order to get to the data we're tracking, we need to read the field a. After the line b.f = f the access path is [f, a] which represents that in order to get to the data we're tracking, we first need to read the field f, and then we need to read the field a. After the line f2 := b.f we "pop" the first element of the access path (because we read a field that matches the head of the access path) and transform the access path to [a]. Finally, after the line myString := f2.a the access path is empty, which represents that the data we're tracking is exactly the value of myString.

There are many which in which a field can be written to. In your example, the field name has been written to when you constructed the info{name: input} value. As you can see from the partial path, CodeQL ends up tracking the struct literal along with the access path [name].

Now, here's the kicker: Non-value preserving steps are only taken into account when the access path is empty. Knowing this, it's hopefully clear why your flow is stopping at append because:

To fix this you either need to:

  1. ensure that the access path is empty at the point of the call to append, or
  2. hack the Go library to treat calls to append as value-preserving steps

To do 1., you can add an additional flow step to your configuration from the struct literal's name field to the struct literal.

To do 2., I believe you can change https://github.com/github/codeql/blob/main/go/ql/lib/semmle/go/frameworks/Stdlib.qll#L53 to extend DataFlow::FunctionModel instead of TaintTracking::FunctionModel.

I hope this helps! Let me know if you need more pointers 🙂.

What we'll hopefully do in the future:

The underlying issue is really the way we model append as a non-value preserving step in Go. In the future, we'd like to change the model so that it's a value-preserving step which reflects that, if tainted data is stored inside the list before the append, then tainted data is also stored inside the list after the append.