Open seviezhou opened 1 year ago
What are the differences between optimized and unoptimized LLVM IR?
One big difference is that optimized IR does not use alloca inside the code. In the optimized version, all operations on data directly use or cast the value returned by malloc, but in the unoptimized version, every operation first loads the value from an alloca that stores the data pointer.
Optimized IR is:
%struct.data_st = type { i8* }
define dso_local void @foo(%struct.data_st* nocapture %0) local_unnamed_addr #0 !dbg !13 {
%2 = tail call dereferenceable_or_null(16) i8* @malloc(i64 16), !dbg !20
%3 = getelementptr inbounds %struct.data_st, %struct.data_st* %0, i64 0, i32 0, !dbg !21
store i8* %2, i8** %3, align 8, !dbg !22, !tbaa !23
ret void, !dbg !28
}
define dso_local i32 @main(i32 %0, i8** nocapture readnone %1) local_unnamed_addr #2 !dbg !29 {
%3 = tail call dereferenceable_or_null(8) i8* @malloc(i64 8), !dbg !41
%4 = bitcast i8* %3 to %struct.data_st*, !dbg !42
tail call void @foo(%struct.data_st* %4), !dbg !43
%5 = bitcast i8* %3 to i8**, !dbg !44
%6 = load i8*, i8** %5, align 8, !dbg !44, !tbaa !23
%7 = tail call i32 (i8*, ...) bitcast (i32 (...)* @free to i32 (i8*, ...)*)(i8* %6) #5, !dbg !45
%8 = tail call i32 (%struct.data_st*, ...) bitcast (i32 (...)* @free to i32 (%struct.data_st*, ...)*)(%struct.data_st* %4) #5, !dbg !46
ret i32 0, !dbg !47
}
Unoptimized one:
%struct.data_st = type { i8* }
define dso_local void @foo(%struct.data_st* %0) #0 !dbg !13 {
%2 = alloca %struct.data_st*, align 8
%3 = alloca i8*, align 8
store %struct.data_st* %0, %struct.data_st** %2, align 8
%4 = call i8* @malloc(i64 16), !dbg !20
store i8* %4, i8** %3, align 8, !dbg !19
%5 = load i8*, i8** %3, align 8, !dbg !21
%6 = load %struct.data_st*, %struct.data_st** %2, align 8, !dbg !22
%7 = getelementptr inbounds %struct.data_st, %struct.data_st* %6, i32 0, i32 0, !dbg !23
store i8* %5, i8** %7, align 8, !dbg !24
ret void, !dbg !25
}
define dso_local i32 @main(i32 %0, i8** %1) #0 !dbg !26 {
%3 = alloca i32, align 4
%4 = alloca i32, align 4
%5 = alloca i8**, align 8
%6 = alloca %struct.data_st*, align 8
store i32 0, i32* %3, align 4
store i32 %0, i32* %4, align 4
store i8** %1, i8*** %5, align 8
%7 = call i8* @malloc(i64 8), !dbg !39
%8 = bitcast i8* %7 to %struct.data_st*, !dbg !40
store %struct.data_st* %8, %struct.data_st** %6, align 8, !dbg !38
%9 = load %struct.data_st*, %struct.data_st** %6, align 8, !dbg !41
call void @foo(%struct.data_st* %9), !dbg !42
%10 = load %struct.data_st*, %struct.data_st** %6, align 8, !dbg !43
%11 = getelementptr inbounds %struct.data_st, %struct.data_st* %10, i32 0, i32 0, !dbg !44
%12 = load i8*, i8** %11, align 8, !dbg !44
%13 = call i32 (i8*, ...) bitcast (i32 (...)* @free to i32 (i8*, ...)*)(i8* %12), !dbg !45
%14 = load %struct.data_st*, %struct.data_st** %6, align 8, !dbg !46
%15 = call i32 (%struct.data_st*, ...) bitcast (i32 (...)* @free to i32 (%struct.data_st*, ...)*)(%struct.data_st* %14), !dbg !47
ret i32 0, !dbg !48
}
I had a look. It seems that ‘%3 = tail call dereferenceable_or_null(8) i8* @malloc(i64 8), !dbg !41’ is not recognised as a memory allocation. Can you change this to a normal malloc call in the IR to see whether it works?
I changed it but the result is the same:
%struct.data_st = type { i8* }
define dso_local void @foo(%struct.data_st* nocapture %0) local_unnamed_addr #0 !dbg !13 {
%2 = call i8* @malloc(i64 16), !dbg !20
%3 = getelementptr inbounds %struct.data_st, %struct.data_st* %0, i64 0, i32 0, !dbg !21
store i8* %2, i8** %3, align 8, !dbg !22, !tbaa !23
ret void, !dbg !28
}
define dso_local i32 @main(i32 %0, i8** nocapture readnone %1) local_unnamed_addr #2 !dbg !29 {
%3 = call i8* @malloc(i64 8), !dbg !41
%4 = bitcast i8* %3 to %struct.data_st*, !dbg !42
tail call void @foo(%struct.data_st* %4), !dbg !43
%5 = bitcast i8* %3 to i8**, !dbg !44
%6 = load i8*, i8** %5, align 8, !dbg !44, !tbaa !23
%7 = tail call i32 (i8*, ...) bitcast (i32 (...)* @free to i32 (i8*, ...)*)(i8* %6) #5, !dbg !45
%8 = tail call i32 (%struct.data_st*, ...) bitcast (i32 (...)* @free to i32 (%struct.data_st*, ...)*)(%struct.data_st* %4) #5, !dbg !46
ret i32 0, !dbg !47
}
Can you send me the two complete bc files? The code piece is not analyzable.
Re-upload the attachment:
The optimisation introduced a weird transformation: ‘%5 = bitcast i8* %3 to i8**, !dbg !44’. This is not a proper transformation.
Rather than free(data->tmp_store), it actually performs free(*data). That is why saber says the heap allocation attached to the field is not released.
I think the bitcast exists because tmp_store is the first field of struct data_st, so %3 can refer either to data as an i8*, or to tmp_store by first casting it to i8** and then loading from it. From my observation, such optimization is common in clang: a structure-typed pointer is used as an i8*, and a field is accessed by first applying a gep to the i8* and then casting the resulting value to a pointer to the field's type.
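A minimal C sketch of the aliasing described above, assuming the single-field layout from the first test case. The function names load_typed, load_via_cast, and first_field_aliases_base are made up for illustration and do not come from the attachment. It only shows that, in C, a pointer to a struct and a pointer to its first member refer to the same address, which is what lets the compiler replace the typed field access with a plain cast and load.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Same layout as the first test case: tmp_store is the only (first) field. */
struct data_st {
    void *tmp_store;
};

/* Typed access: corresponds to a GEP on %struct.data_st followed by a load. */
static void *load_typed(struct data_st *data) {
    return data->tmp_store;
}

/* Untyped access: reinterpret the object's address as void** and load,
   mirroring "bitcast i8* %3 to i8**" followed by a load. */
static void *load_via_cast(void *raw) {
    assert(raw != NULL);
    return *(void **)raw;
}

/* Returns 1 if both access paths read the same stored pointer, 0 otherwise
   (including when the allocation fails). */
int first_field_aliases_base(void) {
    int target = 0;
    struct data_st *data = malloc(sizeof *data);
    if (data == NULL) {
        return 0;
    }
    data->tmp_store = &target;
    int same = (load_typed(data) == load_via_cast(data));
    free(data);
    return same;
}
```

Calling first_field_aliases_base() from a main function should return 1, since the first member of a struct sits at offset 0.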
I changed the C code by adding two more fields to struct data_st:
#include <stdio.h>
struct data_st {
int f1;
int f2;
void *tmp_store;
};
void foo(struct data_st* data)
{
void* ptr = malloc(0x10);
data->tmp_store = ptr;
}
int main(int argc,char** argv)
{
struct data_st data;
foo(&data);
free(data.tmp_store);
return 0;
}
The optimized IR looks like:
; ModuleID = 'leak.c'
source_filename = "leak.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"
%struct.data_st = type { i32, i32, i8* }
; Function Attrs: nofree noinline nounwind uwtable
define dso_local void @foo(%struct.data_st* nocapture %0) local_unnamed_addr #0 !dbg !16 {
call void @llvm.dbg.value(metadata %struct.data_st* %0, metadata !20, metadata !DIExpression()), !dbg !22
%2 = tail call dereferenceable_or_null(16) i8* @malloc(i64 16), !dbg !23
call void @llvm.dbg.value(metadata i8* %2, metadata !21, metadata !DIExpression()), !dbg !22
%3 = getelementptr inbounds %struct.data_st, %struct.data_st* %0, i64 0, i32 2, !dbg !24
store i8* %2, i8** %3, align 8, !dbg !25, !tbaa !26
ret void, !dbg !32
}
; Function Attrs: nofree nounwind
declare dso_local noalias i8* @malloc(i64) local_unnamed_addr #1
; Function Attrs: noinline nounwind uwtable
define dso_local i32 @main(i32 %0, i8** nocapture readnone %1) local_unnamed_addr #2 !dbg !33 {
call void @llvm.dbg.value(metadata i32 %0, metadata !40, metadata !DIExpression()), !dbg !43
call void @llvm.dbg.value(metadata i8** %1, metadata !41, metadata !DIExpression()), !dbg !43
%3 = tail call dereferenceable_or_null(16) i8* @malloc(i64 16), !dbg !44
%4 = bitcast i8* %3 to %struct.data_st*, !dbg !45
call void @llvm.dbg.value(metadata %struct.data_st* %4, metadata !42, metadata !DIExpression()), !dbg !43
tail call void @foo(%struct.data_st* %4), !dbg !46
%5 = getelementptr inbounds i8, i8* %3, i64 8, !dbg !47
%6 = bitcast i8* %5 to i8**, !dbg !47
%7 = load i8*, i8** %6, align 8, !dbg !47, !tbaa !26
%8 = tail call i32 (i8*, ...) bitcast (i32 (...)* @free to i32 (i8*, ...)*)(i8* %7) #5, !dbg !48
%9 = tail call i32 (%struct.data_st*, ...) bitcast (i32 (...)* @free to i32 (%struct.data_st*, ...)*)(%struct.data_st* %4) #5, !dbg !49
ret i32 0, !dbg !50
}
declare dso_local i32 @free(...) local_unnamed_addr #3
; Function Attrs: nounwind readnone speculatable willreturn
declare void @llvm.dbg.value(metadata, metadata, metadata) #4
attributes #0 = { nofree noinline nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { nofree nounwind "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #2 = { noinline nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #3 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #4 = { nounwind readnone speculatable willreturn }
attributes #5 = { nounwind }
!llvm.dbg.cu = !{!0}
!llvm.module.flags = !{!12, !13, !14}
!llvm.ident = !{!15}
!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 10.0.0-4ubuntu1 ", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, retainedTypes: !3, splitDebugInlining: false, nameTableKind: None)
!1 = !DIFile(filename: "leak.c", directory: "")
!2 = !{}
!3 = !{!4}
!4 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !5, size: 64)
!5 = distinct !DICompositeType(tag: DW_TAG_structure_type, name: "data_st", file: !1, line: 3, size: 128, elements: !6)
!6 = !{!7, !9, !10}
!7 = !DIDerivedType(tag: DW_TAG_member, name: "f1", scope: !5, file: !1, line: 5, baseType: !8, size: 32)
!8 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
!9 = !DIDerivedType(tag: DW_TAG_member, name: "f2", scope: !5, file: !1, line: 6, baseType: !8, size: 32, offset: 32)
!10 = !DIDerivedType(tag: DW_TAG_member, name: "tmp_store", scope: !5, file: !1, line: 7, baseType: !11, size: 64, offset: 64)
!11 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: null, size: 64)
!12 = !{i32 7, !"Dwarf Version", i32 4}
!13 = !{i32 2, !"Debug Info Version", i32 3}
!14 = !{i32 1, !"wchar_size", i32 4}
!15 = !{!"clang version 10.0.0-4ubuntu1 "}
!16 = distinct !DISubprogram(name: "foo", scope: !1, file: !1, line: 10, type: !17, scopeLine: 11, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !19)
!17 = !DISubroutineType(types: !18)
!18 = !{null, !4}
!19 = !{!20, !21}
!20 = !DILocalVariable(name: "data", arg: 1, scope: !16, file: !1, line: 10, type: !4)
!21 = !DILocalVariable(name: "ptr", scope: !16, file: !1, line: 12, type: !11)
!22 = !DILocation(line: 0, scope: !16)
!23 = !DILocation(line: 12, column: 17, scope: !16)
!24 = !DILocation(line: 13, column: 11, scope: !16)
!25 = !DILocation(line: 13, column: 21, scope: !16)
!26 = !{!27, !31, i64 8}
!27 = !{!"data_st", !28, i64 0, !28, i64 4, !31, i64 8}
!28 = !{!"int", !29, i64 0}
!29 = !{!"omnipotent char", !30, i64 0}
!30 = !{!"Simple C/C++ TBAA"}
!31 = !{!"any pointer", !29, i64 0}
!32 = !DILocation(line: 14, column: 1, scope: !16)
!33 = distinct !DISubprogram(name: "main", scope: !1, file: !1, line: 16, type: !34, scopeLine: 17, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !39)
!34 = !DISubroutineType(types: !35)
!35 = !{!8, !8, !36}
!36 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !37, size: 64)
!37 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !38, size: 64)
!38 = !DIBasicType(name: "char", size: 8, encoding: DW_ATE_signed_char)
!39 = !{!40, !41, !42}
!40 = !DILocalVariable(name: "argc", arg: 1, scope: !33, file: !1, line: 16, type: !8)
!41 = !DILocalVariable(name: "argv", arg: 2, scope: !33, file: !1, line: 16, type: !36)
!42 = !DILocalVariable(name: "data", scope: !33, file: !1, line: 18, type: !4)
!43 = !DILocation(line: 0, scope: !33)
!44 = !DILocation(line: 18, column: 45, scope: !33)
!45 = !DILocation(line: 18, column: 28, scope: !33)
!46 = !DILocation(line: 19, column: 5, scope: !33)
!47 = !DILocation(line: 20, column: 16, scope: !33)
!48 = !DILocation(line: 20, column: 5, scope: !33)
!49 = !DILocation(line: 21, column: 5, scope: !33)
!50 = !DILocation(line: 22, column: 5, scope: !33)
The problem still exists:
Although the optimization is valid for execution and code generation for C, it is not handled by the analysis rules (it is possibly unsafe if the struct layout is unknown or partially unknown). This makes such code hard for static analysis to handle.
In your new case, the aggressive optimization changes free(data.tmp_store); from a safe field access into pointer arithmetic.
Well-typed field access
%3 = getelementptr inbounds %struct.data_st, %struct.data_st* %0, i64 0, i32 2, !dbg !24
Pointer arithmetic via void*
%5 = getelementptr inbounds i8, i8* %3, i64 8, !dbg !47
%6 = bitcast i8* %5 to i8**, !dbg !47
%7 = load i8*, i8** %6, align 8, !dbg !47, !tbaa !26
%5 = getelementptr inbounds i8, i8* %3, i64 8 is pointer arithmetic (i.e., %3 + 8) that cannot be resolved precisely by SVF. Hence, the load cannot be resolved to the corresponding object.
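For comparison at the C level, here is a rough sketch of the two access styles, assuming the three-field struct from the modified test case. The helper names load_field, load_at_byte_offset, and byte_offset_matches_field are hypothetical. The second helper mirrors the i8 GEP, bitcast, and load sequence: it discards the struct type and addresses the field purely by byte offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Layout from the modified test case in this thread. */
struct data_st {
    int f1;
    int f2;
    void *tmp_store;
};

/* Well-typed field access: corresponds to a GEP on %struct.data_st
   with field index 2. */
static void *load_field(struct data_st *data) {
    return data->tmp_store;
}

/* Byte-level access: corresponds to a GEP on i8 with a byte offset,
   a bitcast to i8**, and a load. */
static void *load_at_byte_offset(void *base, size_t offset) {
    assert(base != NULL);
    return *(void **)((char *)base + offset);
}

/* Returns 1 if both access styles read the same stored pointer, 0 otherwise
   (including when the allocation fails). */
int byte_offset_matches_field(void) {
    int target = 0;
    struct data_st *data = malloc(sizeof *data);
    if (data == NULL) {
        return 0;
    }
    data->tmp_store = &target;
    /* offsetof yields the byte offset of tmp_store; the IR above uses 8
       for the x86-64 target. */
    size_t off = offsetof(struct data_st, tmp_store);
    int same = (load_field(data) == load_at_byte_offset(data, off));
    free(data);
    return same;
}
```

Both helpers read the same memory; the difference is only in how much type information survives, which is what the analysis depends on.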
OK, so SVF only handles pointer arithmetic expressed as well-typed field accesses. Such optimizations are not present in old LLVM versions such as 3.6, but are common in newer versions; LLVM 8, for example, has them.
In the future, I think this can be handled in two ways. First, we can model the pointer memory as a byte array and only care about byte offsets. Second, we can add a pre-transformation pass to SVF to transform byte array pointer arithmetic back to structure-based pointer arithmetic.
The second option looks good to me. The first option is too costly as it will significantly increase the number of objects.
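A very rough C sketch of the idea behind the second option, under the assumption that the struct layout is fully known: given a byte offset, look up which field starts there, so that a byte-based access can be rewritten as a field-based one. This is not SVF code. The table data_st_offsets and the functions field_index_for_offset and data_st_field_index are hypothetical, and a real pass would work on LLVM types and the module's data layout rather than on C structs.

```c
#include <assert.h>
#include <stddef.h>

/* Layout from the modified test case in this thread. */
struct data_st {
    int f1;
    int f2;
    void *tmp_store;
};

/* Hypothetical layout table: byte offset of each field, in declaration order. */
static const size_t data_st_offsets[] = {
    offsetof(struct data_st, f1),
    offsetof(struct data_st, f2),
    offsetof(struct data_st, tmp_store),
};

/* Returns the index of the field that starts exactly at byte_offset,
   or -1 if no field starts there (padding, middle of a field, out of range). */
int field_index_for_offset(const size_t *offsets, size_t n_fields,
                           size_t byte_offset) {
    assert(offsets != NULL);
    for (size_t i = 0; i < n_fields; i++) {
        if (offsets[i] == byte_offset) {
            return (int)i;
        }
    }
    return -1;
}

/* Convenience wrapper for struct data_st. */
int data_st_field_index(size_t byte_offset) {
    size_t n = sizeof data_st_offsets / sizeof data_st_offsets[0];
    return field_index_for_offset(data_st_offsets, n, byte_offset);
}
```

With this table, the offset of tmp_store maps back to field index 2, which is the index used by the well-typed GEP. An offset that does not match the start of any field returns -1, and a pre-transformation pass would have to leave such accesses unchanged.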
In principle, static analysis (on source code) should be performed without compiler optimization (in order to preserve as much information as possible).
I think that in real-world cases, a certain degree of optimization should be considered. First, optimized IR can speed up static analysis; for example, the mem2reg optimization can reduce the number of indirect dataflow transfers. Second, many projects enable optimizations by default during compilation, and the release (optimized) version of the software could potentially contain bugs that are not present in the unoptimized version. But it seems that wllvm outputs unoptimized code; I am not sure about this.
mem2reg will not perform your weird optimization, which is used for layout optimization and fast instruction scheduling.
In real-world source code static analysis, no optimization is the ideal setting. Why perform compiler optimization to discover bugs in the source code? This further raises the question of whether the source contains the bug or whether it was introduced by compiler optimization. I don't think analyzing optimized code makes sense for static analysis if the source code is available.
On the other hand, I think this is an open and interesting question (you are doing a good job!). I do believe compiler transformations (not only optimizations) could help discover problems either in the target code or analyzers.
Yes, I mean that running an optimization sequence like O2 could reduce the complexity of the code and turn indirect operations into direct ones.
OK, I understand what you mean. Static analysis needs sufficient information to work. Actually, in my experience, ffmpeg cannot be built without optimization, according to its developers (ffmpeg ticket):
Requiring a certain level of optimizations (particularly Dead Code Elimination) is a known limitation of our code base. I'm sure there is various tickets already and a lot of discussions to be found.
I found many cases of byte array pointer arithmetic in IR built from ffmpeg. But anyway, the assumption of unoptimized IR works in most cases, so that's fine. Thanks for your comments.
There is a problem in value-flow construction. I tried to analyze the following code with the leak checker:
The code does not have any leak problem, but when I compiled it and checked it with saber, it reported a memory leak:
I tried to disable optimization, but this time it does not report any bugs:
I inspected the value flow graphs of the two versions. The unoptimized version seems to be good:
But for the optimized version, the ActualOUTSVFGNode 30 should be connected with LoadVFGNode 12, but it is not:
Adding more fields to the structure causes the same problem, but if I make data a local variable instead of a heap allocation, the problem disappears:
param.tar.gz