Broken Value Flow when Passing Pointer through Parameter

seviezhou commented 1 year ago

There is a problem in value-flow construction. I tried to analyze the following code with the leak checker:

#include <stdio.h>

struct data_st {
    void *tmp_store;
};

void foo(struct data_st* data)
{
    void* ptr = malloc(0x10);
    data->tmp_store = ptr;
}

int main(int argc,char** argv) 
{
    struct data_st* data = (struct data_st*)malloc(sizeof(struct data_st));
    foo(data);
    free(data->tmp_store);
    free(data);
    return 0;
}

The code does not have any leak problem, but when I compiled it and checked it with saber, it reported a memory leak:

clang -O2 -fno-inline -S -emit-llvm -g leak.c
saber -leak leak.ll
# NeverFree : memory allocation at : ({ ln: 9  cl: 17  fl: leak.c })

I tried to disable optimization, but this time it does not report any bugs:

clang -S -emit-llvm -g leak.c
saber -leak leak.ll
# no bug

I inspected the value flow graphs of the two versions. The unoptimized version seems to be good:

param-noopt

But for the optimized version, the ActualOUTSVFGNode 30 should be connected with LoadVFGNode 12 but it does not:

param-opt

Adding more fields to the structure causes the same problem, but if I make data to be a local variable instead of heap the problem disappears:

#include <stdio.h>

struct data_st {
    void *tmp_store;
};

void foo(struct data_st* data)
{
    void* ptr = malloc(0x10);
    data->tmp_store = ptr;
}

int main(int argc,char** argv) 
{
    struct data_st data;
    foo(&data);
    free(data.tmp_store);
    return 0;
}

param.tar.gz

yuleisui commented 1 year ago

What are the differences between optimized and unoptimized llvm ir

seviezhou commented 1 year ago

One big difference is that optimized IR does not use alloca inside code. In optimized one, all operations regards to data will use or cast the value returned by malloc, but in unoptimized one, all operations will first load the value from an alloca that stores the data pointer.

Optimized IR is:

%struct.data_st = type { i8* }

define dso_local void @foo(%struct.data_st* nocapture %0) local_unnamed_addr #0 !dbg !13 {
  %2 = tail call dereferenceable_or_null(16) i8* @malloc(i64 16), !dbg !20
  %3 = getelementptr inbounds %struct.data_st, %struct.data_st* %0, i64 0, i32 0, !dbg !21
  store i8* %2, i8** %3, align 8, !dbg !22, !tbaa !23
  ret void, !dbg !28
}

define dso_local i32 @main(i32 %0, i8** nocapture readnone %1) local_unnamed_addr #2 !dbg !29 {
  %3 = tail call dereferenceable_or_null(8) i8* @malloc(i64 8), !dbg !41
  %4 = bitcast i8* %3 to %struct.data_st*, !dbg !42
  tail call void @foo(%struct.data_st* %4), !dbg !43
  %5 = bitcast i8* %3 to i8**, !dbg !44
  %6 = load i8*, i8** %5, align 8, !dbg !44, !tbaa !23
  %7 = tail call i32 (i8*, ...) bitcast (i32 (...)* @free to i32 (i8*, ...)*)(i8* %6) #5, !dbg !45
  %8 = tail call i32 (%struct.data_st*, ...) bitcast (i32 (...)* @free to i32 (%struct.data_st*, ...)*)(%struct.data_st* %4) #5, !dbg !46
  ret i32 0, !dbg !47
}

Unoptimized one:

%struct.data_st = type { i8* }

define dso_local void @foo(%struct.data_st* %0) #0 !dbg !13 {
  %2 = alloca %struct.data_st*, align 8
  %3 = alloca i8*, align 8
  store %struct.data_st* %0, %struct.data_st** %2, align 8
  %4 = call i8* @malloc(i64 16), !dbg !20
  store i8* %4, i8** %3, align 8, !dbg !19
  %5 = load i8*, i8** %3, align 8, !dbg !21
  %6 = load %struct.data_st*, %struct.data_st** %2, align 8, !dbg !22
  %7 = getelementptr inbounds %struct.data_st, %struct.data_st* %6, i32 0, i32 0, !dbg !23
  store i8* %5, i8** %7, align 8, !dbg !24
  ret void, !dbg !25
}

define dso_local i32 @main(i32 %0, i8** %1) #0 !dbg !26 {
  %3 = alloca i32, align 4
  %4 = alloca i32, align 4
  %5 = alloca i8**, align 8
  %6 = alloca %struct.data_st*, align 8
  store i32 0, i32* %3, align 4
  store i32 %0, i32* %4, align 4
  store i8** %1, i8*** %5, align 8
  %7 = call i8* @malloc(i64 8), !dbg !39
  %8 = bitcast i8* %7 to %struct.data_st*, !dbg !40
  store %struct.data_st* %8, %struct.data_st** %6, align 8, !dbg !38
  %9 = load %struct.data_st*, %struct.data_st** %6, align 8, !dbg !41
  call void @foo(%struct.data_st* %9), !dbg !42
  %10 = load %struct.data_st*, %struct.data_st** %6, align 8, !dbg !43
  %11 = getelementptr inbounds %struct.data_st, %struct.data_st* %10, i32 0, i32 0, !dbg !44
  %12 = load i8*, i8** %11, align 8, !dbg !44
  %13 = call i32 (i8*, ...) bitcast (i32 (...)* @free to i32 (i8*, ...)*)(i8* %12), !dbg !45
  %14 = load %struct.data_st*, %struct.data_st** %6, align 8, !dbg !46
  %15 = call i32 (%struct.data_st*, ...) bitcast (i32 (...)* @free to i32 (%struct.data_st*, ...)*)(%struct.data_st* %14), !dbg !47
  ret i32 0, !dbg !48
}

yuleisui commented 1 year ago

I had a look. It is seems that ‘%3 = tail call dereferenceable_or_null(8) i8* @malloc(i64 8), !dbg !41’ is not recognised as a memory allocation. Can you change this to a normal malloc in the IR to see whether it works or not

seviezhou commented 1 year ago

I changed it but the result is the same:

%struct.data_st = type { i8* }

define dso_local void @foo(%struct.data_st* nocapture %0) local_unnamed_addr #0 !dbg !13 {
  %2 = call i8* @malloc(i64 16), !dbg !20
  %3 = getelementptr inbounds %struct.data_st, %struct.data_st* %0, i64 0, i32 0, !dbg !21
  store i8* %2, i8** %3, align 8, !dbg !22, !tbaa !23
  ret void, !dbg !28
}

define dso_local i32 @main(i32 %0, i8** nocapture readnone %1) local_unnamed_addr #2 !dbg !29 {
  %3 = call i8* @malloc(i64 8), !dbg !41
  %4 = bitcast i8* %3 to %struct.data_st*, !dbg !42
  tail call void @foo(%struct.data_st* %4), !dbg !43
  %5 = bitcast i8* %3 to i8**, !dbg !44
  %6 = load i8*, i8** %5, align 8, !dbg !44, !tbaa !23
  %7 = tail call i32 (i8*, ...) bitcast (i32 (...)* @free to i32 (i8*, ...)*)(i8* %6) #5, !dbg !45
  %8 = tail call i32 (%struct.data_st*, ...) bitcast (i32 (...)* @free to i32 (%struct.data_st*, ...)*)(%struct.data_st* %4) #5, !dbg !46
  ret i32 0, !dbg !47
}

yuleisui commented 1 year ago

Can you send me the two complete bc files? The code piece is not analyzable.

seviezhou commented 1 year ago

Re-upload the attachment:

param.tar.gz

yuleisui commented 1 year ago

The optimisation introduced a weird transformation. ‘%5 = bitcast i8* %3 to i8**, !dbg !44’. This is not a proper transformation.

Rather than free(data->tmp_store), it actually free(*data). That is why saber says the heap allocation attached on the field is not released.

seviezhou commented 1 year ago

I think the bitcast exists because the tmp_store is the first field of struct data_st, so the %3 can refer to both data by i8* or refer to tmp_store by first casting to i8** and load from it. Such optimization is common in clang from my observation, structure typed pointer will be used as i8* and the fields are access by first gep from i8* and casting the geped value to the field value type.

I change the C code by adding two more fields to struct data_st:

#include <stdio.h>

struct data_st {
    int f1;
    int f2;
    void *tmp_store;
};

void foo(struct data_st* data)
{
    void* ptr = malloc(0x10);
    data->tmp_store = ptr;
}

int main(int argc,char** argv) 
{
    struct data_st data;
    foo(&data);
    free(data.tmp_store);
    return 0;
}

The optimized IR looks like:

; ModuleID = 'leak.c'
source_filename = "leak.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

%struct.data_st = type { i32, i32, i8* }

; Function Attrs: nofree noinline nounwind uwtable
define dso_local void @foo(%struct.data_st* nocapture %0) local_unnamed_addr #0 !dbg !16 {
  call void @llvm.dbg.value(metadata %struct.data_st* %0, metadata !20, metadata !DIExpression()), !dbg !22
  %2 = tail call dereferenceable_or_null(16) i8* @malloc(i64 16), !dbg !23
  call void @llvm.dbg.value(metadata i8* %2, metadata !21, metadata !DIExpression()), !dbg !22
  %3 = getelementptr inbounds %struct.data_st, %struct.data_st* %0, i64 0, i32 2, !dbg !24
  store i8* %2, i8** %3, align 8, !dbg !25, !tbaa !26
  ret void, !dbg !32
}

; Function Attrs: nofree nounwind
declare dso_local noalias i8* @malloc(i64) local_unnamed_addr #1

; Function Attrs: noinline nounwind uwtable
define dso_local i32 @main(i32 %0, i8** nocapture readnone %1) local_unnamed_addr #2 !dbg !33 {
  call void @llvm.dbg.value(metadata i32 %0, metadata !40, metadata !DIExpression()), !dbg !43
  call void @llvm.dbg.value(metadata i8** %1, metadata !41, metadata !DIExpression()), !dbg !43
  %3 = tail call dereferenceable_or_null(16) i8* @malloc(i64 16), !dbg !44
  %4 = bitcast i8* %3 to %struct.data_st*, !dbg !45
  call void @llvm.dbg.value(metadata %struct.data_st* %4, metadata !42, metadata !DIExpression()), !dbg !43
  tail call void @foo(%struct.data_st* %4), !dbg !46
  %5 = getelementptr inbounds i8, i8* %3, i64 8, !dbg !47
  %6 = bitcast i8* %5 to i8**, !dbg !47
  %7 = load i8*, i8** %6, align 8, !dbg !47, !tbaa !26
  %8 = tail call i32 (i8*, ...) bitcast (i32 (...)* @free to i32 (i8*, ...)*)(i8* %7) #5, !dbg !48
  %9 = tail call i32 (%struct.data_st*, ...) bitcast (i32 (...)* @free to i32 (%struct.data_st*, ...)*)(%struct.data_st* %4) #5, !dbg !49
  ret i32 0, !dbg !50
}

declare dso_local i32 @free(...) local_unnamed_addr #3

; Function Attrs: nounwind readnone speculatable willreturn
declare void @llvm.dbg.value(metadata, metadata, metadata) #4

attributes #0 = { nofree noinline nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { nofree nounwind "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #2 = { noinline nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #3 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #4 = { nounwind readnone speculatable willreturn }
attributes #5 = { nounwind }

!llvm.dbg.cu = !{!0}
!llvm.module.flags = !{!12, !13, !14}
!llvm.ident = !{!15}

!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 10.0.0-4ubuntu1 ", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, retainedTypes: !3, splitDebugInlining: false, nameTableKind: None)
!1 = !DIFile(filename: "leak.c", directory: "")
!2 = !{}
!3 = !{!4}
!4 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !5, size: 64)
!5 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "data_st", file: !1, line: 3, size: 128, elements: !6)
!6 = !{!7, !9, !10}
!7 = !DIDerivedType(tag: DW_TAG_member, name: "f1", scope: !5, file: !1, line: 5, baseType: !8, size: 32)
!8 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
!9 = !DIDerivedType(tag: DW_TAG_member, name: "f2", scope: !5, file: !1, line: 6, baseType: !8, size: 32, offset: 32)
!10 = !DIDerivedType(tag: DW_TAG_member, name: "tmp_store", scope: !5, file: !1, line: 7, baseType: !11, size: 64, offset: 64)
!11 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: null, size: 64)
!12 = !{i32 7, !"Dwarf Version", i32 4}
!13 = !{i32 2, !"Debug Info Version", i32 3}
!14 = !{i32 1, !"wchar_size", i32 4}
!15 = !{!"clang version 10.0.0-4ubuntu1 "}
!16 = distinct !DISubprogram(name: "foo", scope: !1, file: !1, line: 10, type: !17, scopeLine: 11, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !19)
!17 = !DISubroutineType(types: !18)
!18 = !{null, !4}
!19 = !{!20, !21}
!20 = !DILocalVariable(name: "data", arg: 1, scope: !16, file: !1, line: 10, type: !4)
!21 = !DILocalVariable(name: "ptr", scope: !16, file: !1, line: 12, type: !11)
!22 = !DILocation(line: 0, scope: !16)
!23 = !DILocation(line: 12, column: 17, scope: !16)
!24 = !DILocation(line: 13, column: 11, scope: !16)
!25 = !DILocation(line: 13, column: 21, scope: !16)
!26 = !{!27, !31, i64 8}
!27 = !{!"data_st", !28, i64 0, !28, i64 4, !31, i64 8}
!28 = !{!"int", !29, i64 0}
!29 = !{!"omnipotent char", !30, i64 0}
!30 = !{!"Simple C/C++ TBAA"}
!31 = !{!"any pointer", !29, i64 0}
!32 = !DILocation(line: 14, column: 1, scope: !16)
!33 = distinct !DISubprogram(name: "main", scope: !1, file: !1, line: 16, type: !34, scopeLine: 17, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !39)
!34 = !DISubroutineType(types: !35)
!35 = !{!8, !8, !36}
!36 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !37, size: 64)
!37 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !38, size: 64)
!38 = !DIBasicType(name: "char", size: 8, encoding: DW_ATE_signed_char)
!39 = !{!40, !41, !42}
!40 = !DILocalVariable(name: "argc", arg: 1, scope: !33, file: !1, line: 16, type: !8)
!41 = !DILocalVariable(name: "argv", arg: 2, scope: !33, file: !1, line: 16, type: !36)
!42 = !DILocalVariable(name: "data", scope: !33, file: !1, line: 18, type: !4)
!43 = !DILocation(line: 0, scope: !33)
!44 = !DILocation(line: 18, column: 45, scope: !33)
!45 = !DILocation(line: 18, column: 28, scope: !33)
!46 = !DILocation(line: 19, column: 5, scope: !33)
!47 = !DILocation(line: 20, column: 16, scope: !33)
!48 = !DILocation(line: 20, column: 5, scope: !33)
!49 = !DILocation(line: 21, column: 5, scope: !33)
!50 = !DILocation(line: 22, column: 5, scope: !33)

The problem still exists:

svf1

yuleisui commented 1 year ago

The optimization though is valid for execution and code generation for C, but it is not handled by analysis rules (possibly unsafe if the struct layout is unknown or partially unknown). This makes static analysis hard to analyze them.

In your new case, the aggressive optimization changes free(data.tmp_store); from a safe field access to a pointer arithmetic.

Well-typed field access

%3 = getelementptr inbounds %struct.data_st, %struct.data_st* %0, i64 0, i32 2, !dbg !24

Pointer arithmetic via void*

%5 = getelementptr inbounds i8, i8* %3, i64 8, !dbg !47   
%6 = bitcast i8* %5 to i8**, !dbg !47
%7 = load i8*, i8** %6, align 8, !dbg !47, !tbaa !26

%5 = getelementptr inbounds i8, i8* %3, i64 8 is a pointer arithmetic (i.e., %8+8) that can not be resolved precisely by SVF. Hence, the load is not able to load the corresponding object.

seviezhou commented 1 year ago

Ok, so SVF only handles pointer arithmetic with well typed field access. Such optimizations are not present in old LLVM versions such as 3.6, but is common in newer versions, for example, LLVM 8 has such optimizations.

In the future, I think this can be handled in two ways. First, we can model the pointer memory as a byte array and only care about byte offsets. Second, we can add a pre-transformation pass to SVF to transform byte array pointer arithmetic back to structure-based pointer arithmetic.

yuleisui commented 1 year ago

The second option looks good to me. The first option is too costly as it will significantly increase the number of objects.

yuleisui commented 1 year ago

In principle, static analysis (on source code) should be performed without compiler optimization (in order to preserve as much information as possible).

seviezhou commented 1 year ago

I think in real-world cases, certain degree of optimizations should be considered. First, optimized IR can speed up static analysis, for example, mem2reg optimizations can reduce the number of indirect dataflow transfer. Second, many projects by default enable optimizations during compilation and the release (optimized) version of the software could potentially contain bugs that are not present in unoptimized version. But it seems that wllvm will output unoptimized code, I am not sure about this.

yuleisui commented 1 year ago

I think in real-world cases, certain degree of optimizations should be considered. First, optimized IR can speed up static analysis, for example, mem2reg optimizations can reduce the number of indirect dataflow transfer. Second, many projects by default enable optimizations during compilation and the release (optimized) version of the software could potentially contain bugs that are not present in unoptimized version. But it seems that wllvm will output unoptimized code, I am not sure about this.

mem2reg will not perform your weird optimization, which is used for layout optimization and fast instruction scheduling. In real-world source code static analysis, no optimization is the ideal setting. Why perform compiler optimization to discover bugs in the source code? This will further the question that does the source contain a bug or it is introduced by compiler optimization. I don't think analyzing optimized code makes sense for static analysis if the source code is available.

yuleisui commented 1 year ago

On the other hand, I think this is an open and interesting question (you are doing a good job!). I do believe compiler transformations (not only optimizations) could help discover problems either in the target code or analyzers.

seviezhou commented 1 year ago

mem2reg will not perform your weird optimization, which is used for layout optimization and fast instruction scheduling.

Yes, I mean running an optimization sequence like O2 could reduce the complexity of the code and makes infirect operations become direct.

In real-world source code static analysis, no optimization is the ideal setting. Why perform compiler optimization to discover bugs in the source code? This will further the question that does the source contain a bug or it is introduced by compiler optimization. I don't think analyzing optimized code makes sense for static analysis if the source code is available.

Ok, I understand your meaning. Static analysis needs sufficient information to work. Actually, from my experience, ffmpeg cannot be built without optimization, referring to its developer (ffmpeg ticket):

Requiring a certain level of optimizations (particularly Dead Code Elimination) is a known limitation of our code base. I'm sure there is various tickets already and a lot of discussions to be found.

I found many cases of byte array pointer arithmetic in IR built from ffmpeg. But anyway, assumption of unoptimized IR works in most cases, that's fine. Thanks for your comments.

SVF-tools / SVF

Broken Value Flow when Passing Pointer through Parameter #988