Open tylergu opened 5 days ago
Does using SsaWithFields help?
from SsaWithFields ssaf
select ssaf, ssaf.getBaseVariable(), ssaf.getAUse()
This for example tells us that the dataflow node foo.subtype.id
is a read of .subtype.id
.
The tricky thing about walking flow paths and trying to accumulate the field-accesses seen along them is twofold: first there can be a large number of possible paths for a simple-seeming program, and second that there can be field-accesses that don't match how struct types are laid out. For example, if the program went
foo := MyType{subtype: MySubType{name: "myName", id: 1}}
bar := OtherType(otherField: foo.name.id}
baz := bar.otherField
Then the sequence of field accesses along the path from foo
to baz
is name, id, otherField
, which of course doesn't match how the static struct types are laid out.
Could you perhaps give some other examples of what you'd want your query to return in this sort of case?
Thanks for the prompt reply! @smowton
Then the sequence of field accesses along the path from foo to baz is name, id, otherField, which of course doesn't match how the static struct types are laid out.
Yeah, we would expect .subtype.id
in the case you mentioned. Perhaps we can get away by setting field write as the barrier. I think in our use cases such scenarios rarely happen.
Thanks for pointing us the direction!
from SsaWithFields ssaf
select ssaf, ssaf.getBaseVariable(), ssaf.getAUse()
Though one struggle we have is to separate different paths.
For example, in the following query, we want two separate paths: .subtype.id
and .subtype.name
foo := MyType{subtype: MySubType{name: "myName", id: 1}}
bar := foo.subtype.id
baz := foo.subtype.name
The path-problem
seems to be able to automatically group the nodes on the same path, but we did not find a good way to do that in the query itself.
Any insight would be greatly appreciated!
How about doing something like:
import go
Expr getAStructLiteralValue(StructLit e) {
exists(Expr element | element = e.getAnElement() |
if element instanceof KeyValueExpr
then result = element.(KeyValueExpr).getValue()
else result = element
)
}
string describeStructLiteralValue(Expr e) {
exists(Write w, Field f, DataFlow::Node base, DataFlow::Node value |
w.writesField(base, f, value) and value.asExpr() = e |
result = describeStructLiteralValue(base.asExpr()) + "." + f.getName()
or
result = f.getName()
)
}
string longestStuctLiteralValueDescription(Expr e) {
result = max(string s | s = describeStructLiteralValue(e) | s order by s.length())
}
module Blah implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node n) {
n.asExpr() = getAStructLiteralValue(_) and not n.asExpr() instanceof StructLit
}
predicate isSink(DataFlow::Node n) { any() }
}
module Flow = DataFlow::Global<Blah>;
from Flow::PathNode source, Flow::PathNode sink
where Flow::flowPath(source, sink)
and source != sink
select source, sink, longestStuctLiteralValueDescription(source.getNode().asExpr())
Note that by making the sources struct-literal members that are not themselves struct literals (i.e., "myName"
and 1
), and by using DataFlow instead of TaintTracking, we require that to find a result, there must have been a corresponding set of field accesses to extract the value. That is, you would not get a result for baz2 := foo.subtype
, because we haven't fully unpacked the source value "myName"
or 1
yet.
That means there's no need to look at the flow path explicitly -- we know if dataflow has found a result, the source has been fully extracted from its surrounding struct, so the field accesses required are exactly the fields it is nested within when the struct literal is created.
On the other hand if you want to highlight the places where the fields are accessed, then we need to do something different -- let me know if so.
Description of the issue
Use Case
We are trying to use the dataflow analysis to get the explicitly accessed fields in a struct in Golang. To do that, we use the TaintTracking module, with source being the root struct, and sink being any(). The background is that we are trying to do some static analysis for Kubernetes controllers, which define complex nested struct types (e.g. RabbitMQ operator).
The taint analysis works like a charm. However, we are struggling to compute the path for the sink nodes. To get the struct field path, we need to get each DataFlow Node on the dataflow path from source to the sink, for all possible paths.
Example
Given the following Golang program, with a nested definition of MyType and MySubType:
our objective is given taint source
foo
, we want to reach to sinkbar
(this works well), and get the struct field pathsubtype.id
.Is there a clean way to get all field selection nodes on the path given the source and the sink?
Thanks a lot!