cisco-open / llvm-crash-analyzer

llvm crash analysis
Apache License 2.0

[TA] Introduce Taint Info dereference level #40

Closed niktesic closed 1 year ago

niktesic commented 1 year ago

It is generally hard to determine whether the Taint should be propagated through a LEA instruction, which loads an address and is usually used for pointer dereferencing.

Consider the following test from this PR:

 #include <stdio.h>
 int main() {
   int x = 5;
   int *p = &x;
   int **pp = &p;
   p = 1;
   printf("2 + x is %d\n",**pp + 2);
   return 0;
 }

In this case, the TaintAnalysis follows two paths: (1) one leads to the line p = 1, which is the correct blame line; (2) the other leads to the line int x = 5, which is not a correct blame line.

This patch introduces the Dereference Level (DerefLevel) as a field of Taint Info. At the crash start, the TaintInfo has a DerefLevel of zero (0), and as the Taint is propagated, the DerefLevel changes based on the Machine Instruction (MI): (a) for store MIs, it is increased by 1; (b) for load MIs, it is decreased by 1; (c) for all other MIs, it is unchanged.

At the end of the analysis, we consider only blame nodes that have a zero Dereference Level (example below). With this, we can determine whether the constant is loaded into a pointer (1) or into a basic-type location (2).

Fn main
*** MFProgramPointInfo ***
bb.0: 0
Blame Nodes:
!9{1; MOV32mi $rbp, 1, $noreg, -8, $noreg, 5; CONSTANT: 5; DEREF-LVL: 1}
Blame line: 4
!4{1; $eax = MOV32ri 1; CONSTANT: 1; DEREF-LVL: 0}
Blame line: 7
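
The DerefLevel bookkeeping described above can be sketched in a few lines of C. This is only an illustration of the rule, not the code from this patch: the names mi_kind, next_deref_level, propagate_deref_level and is_reported_blame are hypothetical, and the real analyzer classifies LLVM Machine Instructions rather than a simple enum.

```c
#include <stddef.h>

/* Hypothetical instruction classification; the real analyzer
   inspects LLVM Machine Instructions instead of this enum. */
enum mi_kind { MI_STORE, MI_LOAD, MI_OTHER };

/* Rule from the description: a store raises the level by 1,
   a load lowers it by 1, anything else leaves it unchanged. */
int next_deref_level(int level, enum mi_kind kind) {
    switch (kind) {
    case MI_STORE:
        return level + 1;
    case MI_LOAD:
        return level - 1;
    default:
        return level;
    }
}

/* Apply the rule to a sequence of instructions, given in the
   order in which the taint propagation visits them. */
int propagate_deref_level(int start_level, const enum mi_kind *kinds, size_t count) {
    int level = start_level;
    for (size_t i = 0; i < count; ++i)
        level = next_deref_level(level, kinds[i]);
    return level;
}

/* Only blame nodes whose level is zero are considered at the end. */
int is_reported_blame(int level) {
    return level == 0;
}
```

Under this sketch, a blame node reached with a level of 1 (such as the CONSTANT: 5 node above) would be filtered out, while a node reached with a level of 0 (such as the CONSTANT: 1 node) would be kept.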

To specify the dereference level of the Taint Info from which Taint Analysis explicitly starts, use the -start-taint-deref-lvl argument.

niktesic commented 1 year ago

This PR ports https://github.com/cisco-open/llvm-crash-analyzer/pull/23 to the llvm-15 based code.