Adding Memory Regions to IR

UQ-PAC / BASIL

Apache License 2.0

8 stars 0 forks source link

Adding Memory Regions to IR #231

Open yousifpatti opened 1 month ago

yousifpatti commented 1 month ago

Before merging there are two main issues

[x] Some initial regions of a function are ignored (they are not collected by MRA) and thus are not renamed. This may have been fixed in the MRA file of the other branch but has other dependencies that needs merging.
[x] Some Data section regions that are in initial memory and read only memory are not found. (Those are not loaded by the ReadELFLoader and cause this issue as MMM relies on results from the loader. Notable mentions for "arrays" example: [WARN] No region found for memory section 0 [transformMemorySections@RegionInjector.scala:297] [WARN] No region found for memory section 0 [transformMemorySections@RegionInjector.scala:297] [WARN] No region found for memory section 0 [transformMemorySections@RegionInjector.scala:297] [WARN] No region found for memory section 0 [transformMemorySections@RegionInjector.scala:297] [WARN] No region found for memory section 568 [transformMemorySections@RegionInjector.scala:297] [WARN] No region found for memory section 596 [transformMemorySections@RegionInjector.scala:297] [WARN] No region found for memory section 632 [transformMemorySections@RegionInjector.scala:297] [WARN] No region found for memory section 664 [transformMemorySections@RegionInjector.scala:297] [WARN] No region found for memory section 696 [transformMemorySections@RegionInjector.scala:297] [WARN] No region found for memory section 960 [transformMemorySections@RegionInjector.scala:297] [WARN] No region found for memory section 1148 [transformMemorySections@RegionInjector.scala:297] [WARN] No region found for memory section 1176 [transformMemorySections@RegionInjector.scala:297] [WARN] No region found for memory section 1240 [transformMemorySections@RegionInjector.scala:297] [WARN] No region found for memory section 1528 [transformMemorySections@RegionInjector.scala:297]

Anything bigger than these offsets is resolved. These offsets dont seem to be loaded with externalFunctions or globalOffsets

l-kent commented 3 weeks ago

Memory sections in the initial memory/read only memory that are not actually accessed in the program don't need to be given their own region, it's fine if they just use the default mem region for now. Eventually we want to go through the initial memory definitions per-procedure so that any memory definitions that relate to regions that aren't in the procedure are not included.

l-kent commented 3 weeks ago

This may have been fixed in the MRA file of the other branch but has other dependencies that needs merging.

Which other branch are you talking about?

yousifpatti commented 3 weeks ago

This may have been fixed in the MRA file of the other branch but has other dependencies that needs merging.

Which other branch are you talking about?

I am talking about yousif-vsa branch where I made changes to MRA. This issue has been resolved though in my latest commit here so no need to worry about that anymore.

yousifpatti commented 3 weeks ago

@l-kent In that case can we look into merging the changes please?

l-kent commented 3 weeks ago

Yes, I'm going to review this next.

yousifpatti commented 3 weeks ago

The last commit should fix the issue with over approximating to "abort" The issue was that abort was the last region loaded and thus any offsets after were approximated to "abort". The other global offsets were missing "globalAddresses" variable not passed to MMM, so I have just added those offsets and it is working properly.

l-kent commented 3 weeks ago

Currently, the initial memory pre/post-conditions are not generated with the identified regions. An example of this is the test correct/basic_function_call_reader/clang_pic. The following snippets are from the output currently produced for this test:

procedure main();
  ...
  free requires (memory_load64_le(mem, 69568bv64) == 69688bv64);

R8, Gamma_R8 := 65536bv64, true;
call rely();
R8, Gamma_R8 := memory_load64_le(x, bvadd64(R8, 4032bv64)), (gamma_load64(Gamma_x, bvadd64(R8, 4032bv64)) || L(x, bvadd64(R8, 4032bv64)));

In the second snippet, the address 69568 (65536 + 4032) is loaded from the region x. However, in the precondition created from initialised memory in the first snippet, the default region mem is used for the address 69568. We want the regions to be aligned, so that the same region is used for both.

l-kent commented 3 weeks ago

The rely() procedures created from the user's specification do not include the new regions. An example of this is the test correct/basicassign_gamma0/clang. The following snippets are all from the output produced for it:

axiom ($secret_addr == 69684bv64);

procedure {:extern} rely();
  modifies Gamma_mem, mem;
  ensures (forall i: bv64 :: (((mem[i] == old(mem[i])) ==> (Gamma_mem[i] == old(Gamma_mem[i])))));
  ensures (memory_load32_le(mem, $secret_addr) == old(memory_load32_le(mem, $secret_addr)));

R8, Gamma_R8 := 69632bv64, true;
call rely();
R8, Gamma_R8 := zero_extend32_32(memory_load32_le(secret, bvadd64(R8, 52bv64))), (gamma_load32(Gamma_secret, bvadd64(R8, 52bv64)) || L(secret, bvadd64(R8, 52bv64)));

In the third snippet, the address 69684 (69632 + 52) is loaded from the region secret, but in the rely() procedure, the address 69684 is loaded from the default region mem. We want this to be replaced in the rely procedure so it matches the identified region.

l-kent commented 3 weeks ago

The test correct/initialisation/gcc demonstrates three issues - not using identified regions in the generated require/ensures clauses from the user specifications, incorrectly removing relevant sections of initialised memory, and not using the identified regions in initialised memory requires/ensures clauses. I've already discussed the latter so I'll detail the first two for this test.

On main, the following snippet is produced for this test:

procedure main();
  modifies ...
  free requires (memory_load64_le(mem, 69632bv64) == 0bv64);
  free requires (memory_load64_le(mem, 69640bv64) == 69640bv64);
  free requires (memory_load64_le(mem, 69648bv64) == 416611827717bv64);
  free requires (memory_load64_le(mem, 69656bv64) == 68719476735bv64);
  free requires (memory_load64_le(mem, 69664bv64) == 8589934593bv64);

These requires clauses are all removed from the output when using this branch's memory region analysis. However, the addresses 69648, 69656, 69664, 69668 and 69652 are all accessed, so we want to keep the initialisations for 69648, 69656 and 69664 around. We want the regions in the requires clause to match as well, so that would require splitting up the initialisations further - the region y would need the address 69652 initialised with a 32-bit access, and the region x would need the address 69648 initialised with a 32-bit access, instead of the 64-bit initialisation covering both as-is.

R0, Gamma_R0 := 69632bv64, true;
R0, Gamma_R0 := bvadd64(R0, 16bv64), Gamma_R0;
call rely();
R0, Gamma_R0 := zero_extend32_32(memory_load32_le(x, R0)), (gamma_load32(Gamma_x, R0) || L(x, R0));

R0, Gamma_R0 := 69632bv64, true;
R0, Gamma_R0 := bvadd64(R0, 24bv64), Gamma_R0;
call rely();
R0, Gamma_R0 := memory_load64_le(z, R0), (gamma_load64(Gamma_z, R0) || L(z, R0));

R0, Gamma_R0 := 69632bv64, true;
R0, Gamma_R0 := bvadd64(R0, 32bv64), Gamma_R0;
call rely();
R0, Gamma_R0 := zero_extend32_32(memory_load32_le(a, R0)), (gamma_load32(Gamma_a, R0) || L(a, R0));

R0, Gamma_R0 := 69632bv64, true;
R0, Gamma_R0 := bvadd64(R0, 32bv64), Gamma_R0;
call rely();
assert (L(a, bvadd64(R0, 4bv64)) ==> Gamma_R1);
a, Gamma_a := memory_store32_le(a, bvadd64(R0, 4bv64), R1[32:0]), gamma_store32(Gamma_a, bvadd64(R0, 4bv64), Gamma_R1);

R0, Gamma_R0 := 69632bv64, true;
R0, Gamma_R0 := bvadd64(R0, 20bv64), Gamma_R0;
call rely();
R0, Gamma_R0 := zero_extend56_8(memory_load8_le(y, R0)), (gamma_load8(Gamma_y, R0) || L(y, R0));

The output also includes the following ensures clauses in the procedure main():

ensures (memory_load32_le(mem, $x_addr) == 6bv32);
ensures (memory_load32_le(mem, bvadd64($a_addr, 4bv64)) == 4bv32);
ensures (memory_load32_le(mem, bvadd64($a_addr, 0bv64)) == 1bv32);

The issue is that mem is not replaced with identified regions for these, but we want it to.

l-kent commented 3 weeks ago

Currently, the memory regions for pointers and pointers-to-pointers are unnecessarily combined. This doesn't cause any issues at present but it is unnecessary.

An example of this is the output for the test case correct/basic_arrays_read/clang_pic:

R8, Gamma_R8 := 65536bv64, true;
call rely();
R8, Gamma_R8 := memory_load64_le(arr, bvadd64(R8, 4056bv64)), (gamma_load64(Gamma_arr, bvadd64(R8, 4056bv64)) || L(arr, bvadd64(R8, 4056bv64)));
call rely();
assert (L(arr, R8) ==> true);
arr, Gamma_arr := memory_store32_le(arr, R8, 0bv32), gamma_store32(Gamma_arr, R8, true);

The important context here is that the global variable arr is located at address 69638, and this address is stored in memory at address 69592. This means that the first memory access loads 69638 (the address of global variable arr) from address 69592 (65536 + 4056) to R8, and the second memory access writes 0 to the global variable arr.

Both accesses are given the region arr even though they are completely distinct memory locations, and it would be ideal for the analysis to be able to give these separate regions.

l-kent commented 3 weeks ago

This PR causes some of the IndirectCallsTests to fail - for some of those tests, the indirect calls get incorrectly resolved to the global variable x or _ITM_registerTMCloneTable.

Indirect call resolution also fails for the correct/jumptable/clang (including variants) tests and for the correct/functionpointer/gcc_O2 and /clang_O2 tests.

The correct/jumptable/clang test is a particularly interesting example, and stack region identification is currently incorrect for it too. These are all the stack accesses produced in main() (in order):

#5, Gamma_#5 := bvadd64(R31, 64bv64), Gamma_R31;
// R31 + 64 (64-bit access)
stack_25, Gamma_stack_25 := memory_store64_le(stack_25, #5, R29), gamma_store64(Gamma_stack_25, #5, Gamma_R29); 

// R31 + 72 (64-bit access)
stack_25, Gamma_stack_25 := memory_store64_le(stack_25, bvadd64(#5, 8bv64), R30), gamma_store64(Gamma_stack_25, bvadd64(#5, 8bv64), Gamma_R30);

R29, Gamma_R29 := bvadd64(R31, 64bv64), Gamma_R31;

// R31 + 12 (32-bit access)
stack_15, Gamma_stack_15 := memory_store32_le(stack_15, bvadd64(R31, 12bv64), R8[32:0]), gamma_store32(Gamma_stack_15, bvadd64(R31, 12bv64), Gamma_R8);

// R31 + 60 (32-bit access)
stack_28, Gamma_stack_28 := memory_store32_le(stack_28, bvadd64(R29, 18446744073709551612bv64), 0bv32), gamma_store32(Gamma_stack_28, bvadd64(R29, 18446744073709551612bv64), true); 

 // R31 + 56 (32-bit access)
stack_28, Gamma_stack_28 := memory_store32_le(stack_28, bvadd64(R29, 18446744073709551608bv64), R0[32:0]), gamma_store32(Gamma_stack_28, bvadd64(R29, 18446744073709551608bv64), Gamma_R0);

// R31 + 48 (64-bit access)
stack_21, Gamma_stack_21 := memory_store64_le(stack_21, bvadd64(R29, 18446744073709551600bv64), R1), gamma_store64(Gamma_stack_21, bvadd64(R29, 18446744073709551600bv64), Gamma_R1); 

// R31 + 16 (128-bit access)
stack_21, Gamma_stack_21 := memory_store128_le(stack_21, bvadd64(R31, 16bv64), V0), gamma_store128(Gamma_stack_21, bvadd64(R31, 16bv64), Gamma_V0); 

// R31 + 32 (64-bit access)
stack_25, Gamma_stack_25 := memory_store64_le(stack_25, bvadd64(R31, 32bv64), R8), gamma_store64(Gamma_stack_25, bvadd64(R31, 32bv64), Gamma_R8); 

// R31 + 16 (64-bit access)
R8, Gamma_R8 := memory_load64_le(stack_21, bvadd64(R31, 16bv64)), gamma_load64(Gamma_stack_21, bvadd64(R31, 16bv64)); 

// R31 + 24 (64-bit access)
R8, Gamma_R8 := memory_load64_le(stack_28, bvadd64(R31, 24bv64)), gamma_load64(Gamma_stack_28, bvadd64(R31, 24bv64)); 

// R31 + 32 (64-bit access)
R8, Gamma_R8 := memory_load64_le(stack_25, bvadd64(R31, 32bv64)), gamma_load64(Gamma_stack_25, bvadd64(R31, 32bv64)); 

// R31 + 12 (32-bit access)
R0, Gamma_R0 := zero_extend32_32(memory_load32_le(stack_15, bvadd64(R31, 12bv64))), gamma_load32(Gamma_stack_15, bvadd64(R31, 12bv64)); 
#6, Gamma_#6 := bvadd64(R31, 64bv64), Gamma_R31;
// R31 + 64 (64-bit access)
R29, Gamma_R29 := memory_load64_le(stack_25, #6), gamma_load64(Gamma_stack_25, #6); 
// R31 + 72 (64-bit access)
R30, Gamma_R30 := memory_load64_le(stack_25, bvadd64(#6, 8bv64)), gamma_load64(Gamma_stack_25, bvadd64(#6, 8bv64));

Each stack region is associated with the following accesses:

stack_15: R31 + 12 (32-bit access) stack_21: R31 + 48 (64-bit access), R31 + 16 (128-bit access), R31 + 16 (64-bit access) stack_25: R31 + 64 (64-bit access), R31 + 72 (64-bit access), R31 + 32 (64-bit access) stack_28: R31 + 60 (32-bit access), R31 + 56 (32-bit access), R31 + 24 (64-bit access)

It is incorrect that R31 + 16 (128-bit access) and R31 + 24 (64-bit access) are in different regions, despite being overlapping accesses - the analysis needs to consider the size of accesses as well as their address.

While the way the other regions are combined does not cause problems, it is imprecise and it would be ideal to fix it so that distinct accesses have distinct regions.

l-kent commented 3 weeks ago

Another issue is the regions in assertions, specifically the assertions that check if writes to memory are secure.

These are updates to the global variables z and x, respectively from the test correct/secret_write/clang, with annotations:

assert (L(z, bvadd64(R9, 52bv64)) ==> true); // checking that the memory store doesn't violate security properties
z_old := memory_load32_le(z, $z_addr); // getting current value of global variable z for comparison after the store
Gamma_x_old := (gamma_load32(Gamma_z, $x_addr) || L(z, $x_addr)); // getting current value of security level of global variable x, but using the wrong region
z, Gamma_z := memory_store32_le(z, bvadd64(R9, 52bv64), 0bv32), gamma_store32(Gamma_z, bvadd64(R9, 52bv64), true); // writing to global variable z
assert ((bvadd64(R9, 52bv64) == $z_addr) ==> (L(z, $x_addr) ==> Gamma_x_old)); // checking that if this memory store was to global variable z then it hasn't violated the security policy for global variable x, which depends on global variable z, but using the wrong region as input to the L function
assert bvsge32(memory_load32_le(mem, $z_addr), z_old); // checking the guarantee, that the global variable z is only increasing, but using the wrong region

assert (L(x, bvadd64(R8, 60bv64)) ==> Gamma_R10);
z_old := memory_load32_le(x, $z_addr); // storing current value of global variable z for comparison after the store, but using incorrect region 
Gamma_x_old := (gamma_load32(Gamma_x, $x_addr) || L(x, $x_addr));  // getting current value of security level of global variable x
x, Gamma_x := memory_store32_le(x, bvadd64(R8, 60bv64), R10[32:0]), gamma_store32(Gamma_x, bvadd64(R8, 60bv64), Gamma_R10); // writing to global variable x
assert ((bvadd64(R8, 60bv64) == $z_addr) ==> (L(x, $x_addr) ==> Gamma_x_old)); // checking that if this memory store was to global variable z then it hasn't violated the security policy for global variable x, which depends on global variable z. This assertion is now redundant since we have determined that this memory store was to the region x, which does not contain the global variable z which this assertion is relevant for. 
assert bvsge32(memory_load32_le(mem, $z_addr), z_old); // checking the guarantee, that the global variable z is only increasing, but using the wrong region. This assertion is now completely redundant since we have determined that this is a write to the region x which does not contain the global variable z which this assertion is relevant for.

There are number of issues we need to address here. Most straight-forwardly - there is still a reference to mem in the last assertion generated, which checks the user-specified guarantee. We need to make sure all accesses to global variables use the correct region for each global variable.

The more difficult issue is that we need to redesign this secure update check to take into account that not all global variables share the same region now, which means many checks can be removed.

We want to change the output so that it now looks something like this:

assert (L(z, bvadd64(R9, 52bv64)) ==> true);
z_old := memory_load32_le(z, $z_addr);
Gamma_x_old := (gamma_load32(Gamma_x, $x_addr) || L(x, $x_addr));
z, Gamma_z := memory_store32_le(z, bvadd64(R9, 52bv64), 0bv32), gamma_store32(Gamma_z, bvadd64(R9, 52bv64), true);
assert ((bvadd64(R9, 52bv64) == $z_addr) ==> (L(x, $x_addr) ==> Gamma_x_old));
assert bvsge32(memory_load32_le(z, $z_addr), z_old);

assert (L(x, bvadd64(R8, 60bv64)) ==> Gamma_R10);
x, Gamma_x := memory_store32_le(x, bvadd64(R8, 60bv64), R10[32:0]), gamma_store32(Gamma_x, bvadd64(R8, 60bv64), Gamma_R10); // writing to global variable x

The differences are:

the correct regions are used in relation to global variables
assertions for guarantees are only generated if a global variable referred to in a guarantee is contained in a region accessed by a memory store
secure update assertions are only generated for a control variable (in this case, z is the control variable) if a region containing that control variable is accessed by a memory store

It may also make sense to split the L function so that each region has its own L function? I'm not sure if this matters at all either way though.

l-kent commented 3 weeks ago

I think that should be pretty much everything. I realise this is a lot. I think the most important issues to focus on at first are the ones in this comment: https://github.com/UQ-PAC/BASIL/pull/231#issuecomment-2299908156 since those are issues with the analysis itself, rather than just more work that needs to be done with the analysis results.

l-kent commented 3 weeks ago

Another issue: the stack identification doesn't seem to work correctly when heap pointers are involved.

This is from test/correct/malloc_with_local/gcc/malloc_with_local.bir, with annotations:

0000036d: call @malloc with return %0000036f // call malloc, putting fresh heap pointer in R0

0000036f:
00000375: mem := mem with [R31 + 0x20, el]:u64 <- R0 // store fresh heap pointer on stack at R31 + 0x20
0000037a: R0 := 4
0000037f: R30 := 0x830
00000381: call @malloc with return %00000383 // call malloc, putting a second fresh heap pointer in R0

00000383:
00000389: mem := mem with [R31 + 0x28, el]:u64 <- R0 // store second heap pointer on stack at R31 + 0x28
0000038e: R0 := 0xA
00000396: mem := mem with [R31 + 0x1C, el]:u32 <- 31:0[R0] 
0000039d: R0 := mem[R31 + 0x20, el]:u64 // load first heap pointer from stack to R0
000003a2: R1 := 0x41
000003aa: mem := mem with [R0] <- 7:0[R1] // write 0x41 to first heap location, which R0 points to
000003b1: R0 := mem[R31 + 0x28, el]:u64 // load second heap pointer from stack to R0
000003b6: R1 := 0x2A
000003be: mem := mem with [R0, el]:u32 <- 31:0[R1] // write 0x2A to second heap location, which R0 points to
000003c5: R0 := mem[R31 + 0x20, el]:u64 // load first heap pointer from stack to R0
000003cc: R0 := pad:64[mem[R0]] // load value contained in first heap location to R0

The corresponding Boogie produced is the following (with annotations and cleaned up a little):

  // call malloc, putting fresh heap pointer in R0
  call malloc();  
  goto l0000036f;
l0000036f:
  // store fresh heap pointer on stack at R31 + 32 (== 0x20)
  stack_15, Gamma_stack_15 := memory_store64_le(stack_15, bvadd64(R31, 32bv64), R0), gamma_store64(Gamma_stack_15, bvadd64(R31, 32bv64), Gamma_R0);
  R0, Gamma_R0 := 4bv64, true;
  R30, Gamma_R30 := 2096bv64, true;
  // call malloc, putting a second fresh heap pointer in R0
  call malloc();
  goto l00000383;
l00000383:
  // store second heap pointer on stack at R31 + 40 (== 0x28) 
  stack_18, Gamma_stack_18 := memory_store64_le(stack_18, bvadd64(R31, 40bv64), R0), gamma_store64(Gamma_stack_18, bvadd64(R31, 40bv64), Gamma_R0); 
  R0, Gamma_R0 := 10bv64, true;
  stack_20, Gamma_stack_20 := memory_store32_le(stack_20, bvadd64(R31, 28bv64), R0[32:0]), gamma_store32(Gamma_stack_20, bvadd64(R31, 28bv64), Gamma_R0);
  // load first heap pointer from stack to R0
  R0, Gamma_R0 := memory_load64_le(stack_15, bvadd64(R31, 32bv64)), gamma_load64(Gamma_stack_15, bvadd64(R31, 32bv64)); 
  R1, Gamma_R1 := 65bv64, true;
  // write 65 (== 0x41) to first heap location, which R0 points to - incorrectly uses region 'stack_15' instead of the first heap region
  stack_15, Gamma_stack_15 := memory_store8_le(stack_15, R0, R1[8:0]), gamma_store8(Gamma_stack_15, R0, Gamma_R1);
  // load second heap pointer from stack to R0
  R0, Gamma_R0 := memory_load64_le(stack_18, bvadd64(R31, 40bv64)), gamma_load64(Gamma_stack_18, bvadd64(R31, 40bv64));
  R1, Gamma_R1 := 42bv64, true;
  // write 42 (== 0x2A) to second heap location, which R0 points to  - incorrectly uses region 'stack_18' instead of the second heap region
  stack_18, Gamma_stack_18 := memory_store32_le(stack_18, R0, R1[32:0]), gamma_store32(Gamma_stack_18, R0, Gamma_R1);
  // load first heap pointer from stack to R0
  R0, Gamma_R0 := memory_load64_le(stack_15, bvadd64(R31, 32bv64)), gamma_load64(Gamma_stack_15, bvadd64(R31, 32bv64));
  // load value contained in first heap location to R0 - incorrectly uses region 'stack_15' instead of the first heap region
  R0, Gamma_R0 := zero_extend56_8(memory_load8_le(stack_15, R0)), gamma_load8(Gamma_stack_15, R0);

The big issue is that accesses to the heap are incorrectly treated as accesses to the stack regions in which the heap pointers are stored. Every heap allocation should have its own region and should not be incorrectly identified as the stack.

l-kent commented 3 weeks ago

If it would help, I am happy to work on all the bits that relate to getting the new regions to be properly used in the Boogie output - everything except the bugs in the analyses. I have a good understanding of those parts of the code so that would speed things up.

yousifpatti commented 3 weeks ago

If it would help, I am happy to work on all the bits that relate to getting the new regions to be properly used in the Boogie output - everything except the bugs in the analyses. I have a good understanding of those parts of the code so that would speed things up.

That would be good if you could please. I will have a look at the bugs in the analyses. Thanks Liam