Closed yousifpatti closed 4 months ago
Hi Liam, can you have an initial run and let me know if the calls are resolved alright? I have tested the most common examples (IndirectCallsTests in the test folder) and they all pass @l-kent
I have put the loop issue in #213 so it can be investigated after the merge
Adding the memory regions into the IR doesn't appear to work correctly at present.
MemoryAssigns are not handled properly, with only the rhs being updated to use a new region - this should not happen (and #188 will prevent a MemoryAssign having a different Memory on the lhs and rhs).
Identifying regions seems like oversimplifies things at present. In the test basicassign_gamma0/clang
for instance, there is the following section in main:
000002c5: R8 := 0x11000
000002cc: R8 := pad:64[mem[R8 + 0x34, el]:u32]
000002d1: R9 := 0x11000
000002d9: mem := mem with [R9 + 0x38, el]:u32 <- 31:0[R8]
0x11034 is the location of 32-bit global variable 'secret'. 0x11038 is the location of 32-bit global variable 'x'.
The region analysis collapses these accesses into a single region named 'abort', seemingly named after the pointer located at at 0x11018 that points to the external abort
C library function. In such a simple case, where there is no ambiguity in the accesses, the analysis surely should be able to distinguish between these as regions?
It's also necessary to update the initialised memory sections when splitting mem into separate regions, and this doesn't happen.
What I would like to do is look at merging the commits up to 4632b95 now, and move the commits since then into a new branch since it's clear they're going to require much more work. Does this make sense as an approach?
evaluateExpressionWithSSA really needs to be implemented for all Exprs, it's quite limited at the moment
Whenever the analysis resolves an indirect call, it also seems to incorrectly 'resolve' the procedure's return statement as a call to whatever the indirect call that it has resolved is.
Test cases that still fail to resolve indirect calls:
As always, jumptable3 and switch2 are not possible to resolve correctly due to BAP's limitations so I haven't listed them. I have not yet checked that all others are resolved correctly.
I want to fix the bug where returns get incorrectly resolved as indirect calls, then I'm happy to merge up to 4632b95
Since all the commits here are also in the yousif-regions-into-ir
branch, I want to drop the last few commits from this branch that aren't ready to merge yet. Is that ok @yousifpatti?
The bug with returns being resolved incorrectly as indirect calls can be easily fixed by just not trying to resolve calls to R30, but it indicates a broader problem with the points-to analysis over-approximating stack accesses that it should be able to differentiate between.
The following is from the indirect_call/clang test. I've annotated it with the values for relevant variables given by the points-to analysis result at relevant points.
00000352:
00000356: R31 := R31 - 0x30
R31 == (stack_24(R31 + 40), stack_13(R31 + 16))
0000035c: #5 := R31 + 0x20
#5 == (stack_24(R31 + 40), stack_13(R31 + 16))
00000362: mem := mem with [#5, el]:u64 <- R29
00000368: mem := mem with [#5 + 8, el]:u64 <- R30
0000036e: R29 := R31 + 0x20
R29 == (stack_24(R31 + 40), stack_13(R31 + 16))
00000373: R8 := 0
0000037b: mem := mem with [R31 + 0xC, el]:u32 <- 31:0[R8]
00000382: mem := mem with [R29 - 4, el]:u32 <- 0
00000387: R8 := 0
0000038d: R8 := R8 + 0x754
00000395: mem := mem with [R31 + 0x10, el]:u64 <- R8
0000039a: R0 := 0
000003a0: R0 := R0 + 0x7DF
000003a5: R30 := 0x7A0
000003a7: call @printf with return %000003a9
000003a9:
000003ae: R8 := mem[R31 + 0x10, el]:u64
R8 == (stack_24(R31 + 40), stack_13(R31 + 16))
000003b3: R30 := 0x7A8
000003b6: call R8 with return %000003b8
000003b8:
000003bd: R0 := pad:64[mem[R31 + 0xC, el]:u32]
R0 == stack_6(R31 + 12)
000003c3: #6 := R31 + 0x20
#6 == (stack_24(R31 + 40), stack_13(R31 + 16))
000003c8: R29 := mem[#6, el]:u64
R29 == (stack_24(R31 + 40), stack_13(R31 + 16))
000003cd: R30 := mem[#6 + 8, el]:u64
R30 == (stack_24(R31 + 40), stack_13(R31 + 16))
000003d3: R31 := R31 + 0x30
000003d8: call R30 with noreturn
According to the analysis, stack_13 contains a pointer to 'greet', stack_24's contents is empty/unknown, and stack_6 contains the value 0bv32.
The analysis can't distinguish between accesses to stack_13 and stack_24, even though there shouldn't be any ambiguity. The analysis only resolves the indirect call to greet correctly at 000003b6 because it doesn't know what stack_24 points to and ignores it as a result.
I also expect that the region contents not taking the assignment point into account could pose problems in the future. Memory values can be reassigned, so a program that writes different values to the same location at multiple points would likely significantly break the analysis.
Currently passing all cases mentioned by Liam in #196 and #43 as well