Write your query in the file 10_taint_tracking.ql with the template below. Note the annotation path-problem and the pattern used in the select section. This pattern allows CodeQL to interpret these results as a "path" through the code, and display the path in your IDE.

Copy your NetworkByteSwap class from step 9.

Write the isSource predicate. This should recognize an expression in an invocation of ntohl, ntohs or ntohll.
Hint: These expressions are already described by the NetworkByteSwap class from step 9. Here we need to check that the source corresponds to a value that belongs to this class.
To check that a value belongs to a class, use the <value> instanceof <myclass> construct.
Note that the source variable is of type DataFlow::Node, while your NetworkByteSwap class is a subclass of Expr, so we cannot just write source instanceof NetworkByteSwap. (Try this and the compiler will give you an error.) Use auto-completion on source to discover the predicate that lets us view it as an Expr.

Write the isSink predicate: The sink should be the size argument of calls to memcpy.
Use auto-completion to find the predicate that returns the nth argument of a function call.
Use the same predicate as in isSource to view the sink as an Expr.

Submit your query when you're happy with the results.
Tip: For a complete example, read this article.
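The two sink hints (picking the nth argument of a call, and viewing a data flow node as an Expr) can be tried on their own before wiring them into the configuration. The following is a small exploratory query, separate from the course template, that lists every data flow node corresponding to the size argument of a memcpy call. It assumes the standard CodeQL C/C++ library predicates getTarget(), getName(), getArgument() and asExpr(); the alert message is illustrative only.

```ql
import cpp
import semmle.code.cpp.dataflow.TaintTracking

// Exploratory sketch: find the data flow nodes that wrap the size argument of memcpy.
from FunctionCall call, DataFlow::Node node
where
  // Restrict to calls whose target function is named memcpy.
  call.getTarget().getName() = "memcpy" and
  // Argument indices are zero-based: memcpy(dest, src, n) has n at index 2.
  // asExpr() relates the data flow node to the underlying Expr.
  node.asExpr() = call.getArgument(2)
select node, "Size argument of a memcpy call"
```

The same asExpr() bridge applies on the source side, where the Expr is then tested against a class with instanceof.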
/**
 * @kind path-problem
 */

import cpp
import semmle.code.cpp.dataflow.TaintTracking
import DataFlow::PathGraph

class NetworkByteSwap extends Expr {
  // TODO: copy from previous step
}

class Config extends TaintTracking::Configuration {
  Config() { this = "NetworkToMemFuncLength" }

  override predicate isSource(DataFlow::Node source) {
    // TODO
  }

  override predicate isSink(DataFlow::Node sink) {
    // TODO
  }
}

from Config cfg, DataFlow::PathNode source, DataFlow::PathNode sink
where cfg.hasFlowPath(source, sink)
select sink, source, sink, "Network byte swap flows to memcpy"
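For reference, here is one way the TODOs in the template might be filled in. This is a sketch under stated assumptions, not the course's official solution. In particular, the body of NetworkByteSwap is an assumption, since step 9 is not shown in this thread: it treats ntohs, ntohl and ntohll as macros and matches them by name. If your step 9 class differs, use yours instead.

```ql
/**
 * @kind path-problem
 */

import cpp
import semmle.code.cpp.dataflow.TaintTracking
import DataFlow::PathGraph

// Assumed step 9 definition: expressions produced by the ntoh* macros.
class NetworkByteSwap extends Expr {
  NetworkByteSwap() {
    exists(MacroInvocation mi |
      mi.getMacroName().regexpMatch("ntoh(s|l|ll)") and
      this = mi.getExpr()
    )
  }
}

class Config extends TaintTracking::Configuration {
  Config() { this = "NetworkToMemFuncLength" }

  // A source is any node whose underlying expression is a network byte swap.
  override predicate isSource(DataFlow::Node source) {
    source.asExpr() instanceof NetworkByteSwap
  }

  // A sink is the size argument (zero-based index 2) of a call to memcpy.
  override predicate isSink(DataFlow::Node sink) {
    exists(FunctionCall call |
      call.getTarget().getName() = "memcpy" and
      sink.asExpr() = call.getArgument(2)
    )
  }
}

from Config cfg, DataFlow::PathNode source, DataFlow::PathNode sink
where cfg.hasFlowPath(source, sink)
select sink, source, sink, "Network byte swap flows to memcpy"
```

The select clause keeps the shape required by path-problem queries: the element to report, the path's start node, the path's end node, and a message.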
Ooops! The query you submitted in 910fdcc684fa3856f278bf843ef4d940ce2ca31f didn't find the right results. Please take a look at the comment and try again.
To submit a new iteration of your query, you just have to push a new commit to the same branch (main or the PR branch).
Congratulations, you have finished the course! You can merge your last outstanding Pull Request if you have one. Don't hesitate to give us feedback; find us at https://securitylab.github.com/get-involved. And recommend this course to your friends if it was useful!
Step 10: Data flow and taint tracking analysis
Great! You made it to the final step!
In step 9 we found expressions in the source code that are likely to have integers supplied from remote input, because they are being processed with invocations of ntohl, ntohll, or ntohs. These can be considered sources of remote input.

In step 6 we found calls to memcpy. These calls can be unsafe when their length arguments are controlled by a remote user. Their length arguments can be considered sinks: they should not receive user-controlled values without further validation.

Combining these pieces of information, we know that code is vulnerable if tainted data flows from a network integer source to a sink in the length argument of a memcpy call.

However, how do we know whether data from a particular source might reach a particular sink? Answering this question is known as data flow or taint tracking analysis. Given the number of results (hundreds of memcpy calls and a large number of macro invocations), it would be quite a lot of work to triage all these cases manually.

To make our triaging job easier, we will have CodeQL do this analysis for us.
You will now write a query to track the flow of tainted data from network-controlled integers to the memcpy length argument. As a result you will find 9 real vulnerabilities!

To achieve this, we’ll use the CodeQL taint tracking library. This library allows you to describe sources and sinks, and its predicate hasFlowPath holds when tainted data from a given source flows to a sink.