Open niktesic opened 1 year ago
Some updates from the current Proof-of-concept investigation.
There are several different scenarios to take into consideration:
I've made some progress with the basic test case, which corresponds to a register-passed parameter, one level of nesting, and the blame line inside the call.
About developed mechanisms:
init function, in the test case)
With those changes the basic case is covered, and the TaintDFG looks like:
The second case to consider is when one parameter is used to set the value of another, as in the test below (function fun):
void crash(int val, int* adr){
    *adr = val; // crash - line 3
}

void fun(int** ptr, int* adr)
{
    *ptr = adr; // wrong blame - line 8
}

int main(){
    int *p = 0; // wrong blame - line 12
    int *adr = 3; // correct blame - line 13
    fun(&p, adr);
    crash(1, p);
    return 0;
}
In this case we can use the ForwardTaintAnalysis to track the parameter to the "set point" (*ptr = adr;), which corresponds to the following MIR instruction:
MOV64mr $rax, 1, $noreg, 0, $noreg, $rcx, debug-location !DILocation(line: 8
From this point we can run the existing backward Taint Analysis to track the value of the parameter adr (from $rcx).
In that way the resulting TaintDFG looks like:
However, the TaintDataFlowGraph analysis currently fails to find the correct blame node in the graph; this is yet to be investigated.
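The forward-then-backward idea described above can be sketched on a toy instruction list. This is only an illustration, not the tool's implementation: the `Inst` type, the two opcodes, and the functions `find_set_point` and `trace_stored_value` are hypothetical, and the body of `fun` is assumed to copy the incoming arguments ($rdi, $rsi under the System V x86-64 convention) directly into $rax and $rcx, whereas real unoptimized code would typically go through stack slots.

```c
#include <stddef.h>

/* Hypothetical toy IR, for illustration only. */
typedef enum { OP_MOV, OP_STORE } Opcode;
typedef enum { RAX, RCX, RDI, RSI, NUM_REGS } Reg;

typedef struct {
    Opcode op;
    Reg dst; /* OP_MOV: destination register; OP_STORE: base register of the address */
    Reg src; /* OP_MOV: source register; OP_STORE: register holding the stored value */
} Inst;

/* Forward pass: propagate taint from `param` through register copies and
 * return the index of the first store whose base register is tainted
 * (the "set point"), or -1 if there is none. */
int find_set_point(const Inst *code, size_t n, Reg param) {
    unsigned tainted = 1u << param;
    for (size_t i = 0; i < n; ++i) {
        if (code[i].op == OP_STORE) {
            if (tainted & (1u << code[i].dst))
                return (int)i;
        } else if (tainted & (1u << code[i].src)) {
            tainted |= 1u << code[i].dst;    /* copy of a tainted register */
        } else {
            tainted &= ~(1u << code[i].dst); /* overwritten by an untainted value */
        }
    }
    return -1;
}

/* Backward pass: starting at the set point, follow the stored value back
 * through register copies to the register it came from at function entry. */
Reg trace_stored_value(const Inst *code, int set_point) {
    Reg tracked = code[set_point].src;
    for (int i = set_point - 1; i >= 0; --i)
        if (code[i].op == OP_MOV && code[i].dst == tracked)
            tracked = code[i].src;
    return tracked;
}
```

For a body modelled as `MOV rax <- rdi; MOV rcx <- rsi; STORE [rax] <- rcx`, the forward pass started from RDI stops at the store, and the backward pass from that store leads back to RSI, the register carrying adr.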
Currently, Taint Analysis is not performed for functions out of the backtrace unless one of two conditions is met (from TaintAnalysis::shouldAnalyzeCall):
1) The return value of the function is in the Taint List (the return value register is in the T.L., or it is the base register of a memory location from the T.L.).
2) A global variable is in the Taint List (a Taint Info with no register operand, but with an offset, is in the T.L.).
These two conditions need to be revisited to cover real-case scenarios, and for condition 2) future support for global variable tracking would be beneficial.
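As a rough illustration of the two conditions, here is a minimal sketch. It is not the actual TaintAnalysis::shouldAnalyzeCall code: the `TaintEntry` struct, the register ids, and the function names are assumptions made up for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified taint entry; not the tool's real data structure. */
typedef struct {
    bool has_reg;    /* false: no register operand (global-like location) */
    int  reg;        /* the register itself, or the base register of a memory location */
    bool has_offset; /* true when the entry carries an offset */
    long offset;
} TaintEntry;

enum { REG_RAX = 0, REG_RCX = 1, REG_RDI = 2 }; /* assumed register ids */

/* Condition 1: the return value register is tainted, either directly or as
 * the base register of a tainted memory location. */
static bool return_value_tainted(const TaintEntry *list, size_t n, int ret_reg) {
    for (size_t i = 0; i < n; ++i)
        if (list[i].has_reg && list[i].reg == ret_reg)
            return true;
    return false;
}

/* Condition 2: an entry with no register operand but with an offset is
 * treated as a global variable. */
static bool global_tainted(const TaintEntry *list, size_t n) {
    for (size_t i = 0; i < n; ++i)
        if (!list[i].has_reg && list[i].has_offset)
            return true;
    return false;
}

/* A call out of the backtrace is analyzed only if one condition holds. */
bool should_analyze_call(const TaintEntry *list, size_t n, int ret_reg) {
    return return_value_tainted(list, n, ret_reg) || global_tainted(list, n);
}
```

Note that neither check looks at the call's arguments, so a call that writes through a pointer parameter is skipped under this scheme.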
On the other hand, in many real cases the parameter is passed by reference (as a pointer) and its value is set in functions out of the backtrace, but we don't have a mechanism to detect such cases and to perform Taint Analysis for such functions.
With the patch below, we are able to run Taint Analysis on each function out of the backtrace by passing the argument -analyze-each-call. This can be used during investigation to inspect how Taint Analysis could be performed on such functions, but in real cases it could cause an explosion of analysis.
Patch: analyze-each-call.patch
Please consider the following test case:
Although the argument -analyze-each-call is used and we are able to analyze the function init(), which is responsible for setting the incorrect value of the pointer, the tool is not able to find the correct blame line. The main reason is that we don't have register values available for frames out of the backtrace, so we cannot rely on concrete memory addresses. This means that functions out of the backtrace are analyzed at the symbolic level, so we need to match exact registers and offsets.
To sum up, there are two mechanisms which need to be developed:
1) A mechanism to determine when a parameter is a reference to a tainted location, in which case we should analyze the call.
2) A mechanism to efficiently analyze functions out of the backtrace without available register values (using memory to find the needed values and improving the symbolic-level analysis).
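The difference between concrete and symbolic matching can be illustrated with a small sketch. The `MemLoc` and `RegFile` types and the `same_location` function are hypothetical names introduced only for this example, and the symbolic branch assumes the base register is not redefined between the two uses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { REG_RAX, REG_RCX, REG_COUNT }; /* assumed register ids */

/* Hypothetical memory location: base register plus offset. */
typedef struct {
    int  base_reg;
    long offset;
} MemLoc;

/* Register values for a frame; entries are unknown when the frame is not
 * part of the backtrace. */
typedef struct {
    bool     known[REG_COUNT];
    uint64_t value[REG_COUNT];
} RegFile;

/* Decide whether two locations can be shown to be the same.
 * With concrete register values (regs non-NULL and both bases known) the
 * computed addresses are compared. Without them, only an exact match of
 * base register and offset counts as equal. */
bool same_location(const MemLoc *a, const MemLoc *b, const RegFile *regs) {
    if (regs && regs->known[a->base_reg] && regs->known[b->base_reg])
        return regs->value[a->base_reg] + (uint64_t)a->offset ==
               regs->value[b->base_reg] + (uint64_t)b->offset;
    return a->base_reg == b->base_reg && a->offset == b->offset;
}
```

With rax = 0x1000 and rcx = 0x1008 available, [rax + 8] and [rcx + 0] compare equal. Without register values the same pair is reported as different, which is the kind of missed match that symbolic-level analysis of out-of-backtrace functions has to cope with.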