Improved call resolution, testing and convert analyses to use IR iterator

yousifpatti commented 6 months ago

Currently passing all cases mentioned by Liam in #196 and #43 as well

[x] Fix up some to dos
[x] Code cleanup

yousifpatti commented 6 months ago

Hi Liam, can you have an initial run and let me know if the calls are resolved alright? I have tested the most common examples (IndirectCallsTests in the test folder) and they all pass @l-kent

yousifpatti commented 4 months ago

I have put the loop issue in #213 so it can be investigated after the merge

l-kent commented 4 months ago

Adding the memory regions into the IR doesn't appear to work correctly at present.

MemoryAssigns are not handled properly, with only the rhs being updated to use a new region - this should not happen (and #188 will prevent a MemoryAssign having a different Memory on the lhs and rhs).

Identifying regions seems like oversimplifies things at present. In the test basicassign_gamma0/clang for instance, there is the following section in main:

000002c5: R8 := 0x11000
000002cc: R8 := pad:64[mem[R8 + 0x34, el]:u32]
000002d1: R9 := 0x11000
000002d9: mem := mem with [R9 + 0x38, el]:u32 <- 31:0[R8]

0x11034 is the location of 32-bit global variable 'secret'. 0x11038 is the location of 32-bit global variable 'x'.

The region analysis collapses these accesses into a single region named 'abort', seemingly named after the pointer located at at 0x11018 that points to the external abort C library function. In such a simple case, where there is no ambiguity in the accesses, the analysis surely should be able to distinguish between these as regions?

l-kent commented 4 months ago

It's also necessary to update the initialised memory sections when splitting mem into separate regions, and this doesn't happen.

l-kent commented 4 months ago

What I would like to do is look at merging the commits up to 4632b95 now, and move the commits since then into a new branch since it's clear they're going to require much more work. Does this make sense as an approach?

l-kent commented 4 months ago

evaluateExpressionWithSSA really needs to be implemented for all Exprs, it's quite limited at the moment

l-kent commented 4 months ago

Whenever the analysis resolves an indirect call, it also seems to incorrectly 'resolve' the procedure's return statement as a call to whatever the indirect call that it has resolved is.

l-kent commented 4 months ago

Test cases that still fail to resolve indirect calls:

functionpointer/clang_O2
functionpointer/gcc_O2
jumptable/clang
jumptable/clang_no_plt_no_pic
jumptable/clang_pic
jumptable2/clang
jumptable2/clang_no_plt_no_pic
jumptable2/clang_pic
jumptable2/gcc_O2

As always, jumptable3 and switch2 are not possible to resolve correctly due to BAP's limitations so I haven't listed them. I have not yet checked that all others are resolved correctly.

l-kent commented 4 months ago

I want to fix the bug where returns get incorrectly resolved as indirect calls, then I'm happy to merge up to 4632b95

l-kent commented 4 months ago

Since all the commits here are also in the yousif-regions-into-ir branch, I want to drop the last few commits from this branch that aren't ready to merge yet. Is that ok @yousifpatti?

l-kent commented 4 months ago

The bug with returns being resolved incorrectly as indirect calls can be easily fixed by just not trying to resolve calls to R30, but it indicates a broader problem with the points-to analysis over-approximating stack accesses that it should be able to differentiate between.

The following is from the indirect_call/clang test. I've annotated it with the values for relevant variables given by the points-to analysis result at relevant points.

00000352:
00000356: R31 := R31 - 0x30
R31 == (stack_24(R31 + 40), stack_13(R31 + 16))
0000035c: #5 := R31 + 0x20
#5 == (stack_24(R31 + 40), stack_13(R31 + 16))
00000362: mem := mem with [#5, el]:u64 <- R29
00000368: mem := mem with [#5 + 8, el]:u64 <- R30
0000036e: R29 := R31 + 0x20
R29 == (stack_24(R31 + 40), stack_13(R31 + 16))
00000373: R8 := 0
0000037b: mem := mem with [R31 + 0xC, el]:u32 <- 31:0[R8]
00000382: mem := mem with [R29 - 4, el]:u32 <- 0
00000387: R8 := 0
0000038d: R8 := R8 + 0x754
00000395: mem := mem with [R31 + 0x10, el]:u64 <- R8
0000039a: R0 := 0
000003a0: R0 := R0 + 0x7DF
000003a5: R30 := 0x7A0
000003a7: call @printf with return %000003a9

000003a9:
000003ae: R8 := mem[R31 + 0x10, el]:u64
R8 == (stack_24(R31 + 40), stack_13(R31 + 16))
000003b3: R30 := 0x7A8
000003b6: call R8 with return %000003b8

000003b8:
000003bd: R0 := pad:64[mem[R31 + 0xC, el]:u32]
R0 == stack_6(R31 + 12)
000003c3: #6 := R31 + 0x20
#6 == (stack_24(R31 + 40), stack_13(R31 + 16))
000003c8: R29 := mem[#6, el]:u64
R29 == (stack_24(R31 + 40), stack_13(R31 + 16))
000003cd: R30 := mem[#6 + 8, el]:u64
R30 == (stack_24(R31 + 40), stack_13(R31 + 16))
000003d3: R31 := R31 + 0x30
000003d8: call R30 with noreturn

According to the analysis, stack_13 contains a pointer to 'greet', stack_24's contents is empty/unknown, and stack_6 contains the value 0bv32.

The analysis can't distinguish between accesses to stack_13 and stack_24, even though there shouldn't be any ambiguity. The analysis only resolves the indirect call to greet correctly at 000003b6 because it doesn't know what stack_24 points to and ignores it as a result.

I also expect that the region contents not taking the assignment point into account could pose problems in the future. Memory values can be reassigned, so a program that writes different values to the same location at multiple points would likely significantly break the analysis.

l-kent commented 4 months ago

My work is in https://github.com/UQ-PAC/bil-to-boogie-translator/tree/yousif-ssa-steensgaard2-merge

UQ-PAC / BASIL

Improved call resolution, testing and convert analyses to use IR iterator #204