Open Jokuuy opened 1 year ago
Hi @Jokuuy,
This is a tricky issue. I'll provide some background that explains what's going wrong, and end with two ways in which you can fix it yourself to get the flow you want. Finally, I'll add a couple of notes about what we're hopefully doing about this in the future for Go so that everything automagically works.
Let me give you the background first.
The dataflow library has two "kinds" of steps:
DataFlow::Configuration
.TaintTracking::Configuration
.An example of a "value preserving step" is steps from the right-hand side of an assignment to the next use of the variable. e.g., the step from 42
to x
(in return x
) in something like:
var x = 42
return x
An example of a "non-value preserving step" could be a step from x
to x + 1
. More importantly for this issue: append
is also a non-value preserving step.
When the dataflow is tracking flow is keeps track of which fields has been written to (so that we can propagate flow from a write to a field to a subsequent read of that field) using the concept of an "access path". You can think of an access path as a list of fields that have currently been written to along the dataflow path traversed so far. A dataflow path node is roughly the pair (n, ap), where n
is a dataflow node and ap
is an access path.
For example, let's say you want to track the flow from "hello" to the argument of print
in something like:
type foo struct {
a string
b int
}
type bar struct {
f foo
c int
}
func main() {
f := foo{}
f.a = "hello"
b := bar{}
b.f = f
f2 := b.f
myString := f2.a
print(myString)
}
After the line f.a = "hello"
the access path is [a]
. This represents that, in order to get to the data we're tracking, we need to read the field a
. After the line b.f = f
the access path is [f, a]
which represents that in order to get to the data we're tracking, we first need to read the field f
, and then we need to read the field a
. After the line f2 := b.f
we "pop" the first element of the access path (because we read a field that matches the head of the access path) and transform the access path to [a]
. Finally, after the line myString := f2.a
the access path is empty, which represents that the data we're tracking is exactly the value of myString
.
There are many which in which a field can be written to. In your example, the field name
has been written to when you constructed the info{name: input}
value. As you can see from the partial path, CodeQL ends up tracking the struct literal along with the access path [name]
.
Now, here's the kicker: Non-value preserving steps are only taken into account when the access path is empty. Knowing this, it's hopefully clear why your flow is stopping at append
because:
append
is a non-value preserving step, andTo fix this you either need to:
append
, orappend
as value-preserving stepsTo do 1., you can add an additional flow step to your configuration from the struct literal's name
field to the struct literal.
To do 2., I believe you can change https://github.com/github/codeql/blob/main/go/ql/lib/semmle/go/frameworks/Stdlib.qll#L53 to extend DataFlow::FunctionModel
instead of TaintTracking::FunctionModel
.
I hope this helps! Let me know if you need more pointers 🙂.
The underlying issue is really the way we model append
as a non-value preserving step in Go. In the future, we'd like to change the model so that it's a value-preserving step which reflects that, if tainted data is stored inside the list before the append, then tainted data is also stored inside the list after the append.
Go: v1.19.1 linux/amd64 CodeQL: v2.11.0
Question: Query the data flow results from the
input
parameter in thesource
function to thename
parameter in thesink
function.The following is my query statement. After execution, I did not get the expected results.
When I debug using
DataFlow::PartialPathGraph
, it seems that the tainted data does not propagate the data flow after passing through theappend
function.How to solve this problem?