Open montyly opened 1 year ago
@montyly @0xalpharush I am planning to start implement this context sensitivity for data dependency. How should I proceed with that? Any specific process to follow here?
I would implement it as a separate analysis and API for now, and we can decide whether to altogether replace the existing analysis or allow the detector author to choose one or another.
The first step would be to create an API for interprocedural control flow graph (CFG) as we currently only have an intraprocedural CFG. Some of the call-graph code may inspire the ICFG, but it does not correctly hand free functions currently (see https://github.com/crytic/slither/pull/1763).
We should create an edge in for each call-site and return aka "call and return linkages" in the CFG. For return linkages, we'd need to create a virtual exit in the CFG similar to how we have a virtual entry point. All nodes should have a path to the exit node. For example, we'd have a the following ICFG for the example in the issue:
Right now, the taints for InternalCall
are created by adding a entry in the Phi node with the arguments passed at each call-site here:
https://github.com/crytic/slither/blob/13d7d9f66a6be4f798478fa3735fb63444b46c3d/slither/core/compilation_unit.py#L197-L207
This means f's value of x is either paramA
or paramB
and the taint propagates both i.e. the union.
To make this context sensitive, instead we would have the data dependency analysis lookup what are the assignments to the value of x in the calling context. That is, calls to f
in test1
's context should only consider paramA
, f
in test2
's context should only consider paramB
. This should also be reflected in the return value. To implement this, we can use the "call strings approach" to inter-procedural analysis. In practice, this would mean we keep in the data dependency context
of f
a second dictionary with test1
and test2
as keys. The key is a stack of yet-to-be-returned function calls i.e. f
is invoked before test1
and test2
have completed executing. TBD what depth we should store for call strings.
For params, we'd push the caller on to the stack and initialize the parameters in the entry block to the call value and expect the following dependency:
f = Function(f)
f.context["x"] ["test1"]= LocalVariable("paramA")
f.context["x"] ["test2"] = LocalVariable("paramB")
For returns, we'd create a fresh variable, result_uuid
, and change the return value into an assignment to this unique result variable. Returning would pop from the stack and for external functions the stack would be E
, for empty. We'd expect the following dependency:
test1 = Function(test1)
test1.context["a"] ["E"] = LocalVariable("paramA") # transitive through result_test1
test2 = Function(test2)
test2.context["a"] ["E"] = LocalVariable("paramB") # transitive through result_test2
We would need to compute the transitive closure for each calling context individually. The API is_dependent
would need to accept another argument to allow detector authors to specify which call-site(s) to consider i.e. "test1" or "test2".
These lectures notes may prove useful and less abstract than the aforementioned paper: https://cs.au.dk/~amoeller/spa/7-interprocedural-analysis.pdf
@0xalpharush Thanks a lot for the detailed explanation. I think I get the essence of the idea. I will still go over the "call strings approach" and the corresponding lecture notes to jot down the concrete steps.
Right now the data dependency is context insensitive, which creates a large over approximation.
For example in
Slither will merge all the deps related to the call to
f(x)
when looking at the contract context. As a result, a dependency betweena
andparamB
(orb
andparamA
) will be created, because the the analysis will merge all the callers off
:Having a context-sensitive analysis will lead to bette results. This is also a recurring issues with top-level functions - which tend to be called from a lot of different contexts.
Moving toward a context sensitive analysis will have an impact on the performance. We could propose the two options (sensitive/insensitive), and allow the user to enable one or the other.
Additionally we should take the opportunity to refactor the data dependency to better support the switch between the context and the different source type: https://github.com/crytic/slither/blob/26659c4e0555c20eca037945aaec76b7639b00a7/slither/analyses/data_dependency/data_dependency.py#L47-L50
https://github.com/crytic/slither/blob/26659c4e0555c20eca037945aaec76b7639b00a7/slither/analyses/data_dependency/data_dependency.py#L62-L63