GaloisInc / saw-script

The SAW scripting language.
BSD 3-Clause "New" or "Revised" License
437 stars 63 forks source link

Verifying C written with C11 features using SAW #2099

Open pennyannn opened 3 weeks ago

pennyannn commented 3 weeks ago

When migrating AWS-LC to C11, all of our existing proofs failed with the following error at LLVM module loading step:

Mismatched value types:
cmp value: ValIdent (Ident "4")
new value: ValIdent (Ident "7")
cmp type:  PrimType (Integer 32)
new type:  PrimType (Integer 32)
from:
    FUNC_CODE_INST_CMPXCHG
    @CRYPTO_refcount_inc
    FUNCTION_BLOCK
    FUNCTION_BLOCK_ID
    value symbol table
    MODULE_BLOCK
    Bitstream

The function CRYPTO_refcount_inc is defined at https://github.com/aws/aws-lc/blob/main/crypto/refcount_lock.c#L29 for C99 and defined at https://github.com/aws/aws-lc/blob/main/crypto/refcount_c11.c#L37 for C11. Note that the C11 version is using the new atomic feature. I'm wondering if this means that SAW does not support new features in C11, therefore the LLVM loader couldn't understand the syntax. Any recommendations for how to get around this problem?

RyanGlScott commented 3 weeks ago

What versions of SAW and LLVM are you using? I ask since this looks a lot like the bug reported in https://github.com/GaloisInc/llvm-pretty-bc-parser/issues/250, where the LLVM bitcode parser's treatment of the the cmpxchg instruction was incorrect in the presence of opaque pointers (which are enabled by default in LLVM 15 or later). This bug has since been fixed in https://github.com/GaloisInc/llvm-pretty-bc-parser/pull/251 and is included as part of the SAW 1.1 release.

Generally speaking, SAW should support most C11 features, provided that they compile down to LLVM instructions that SAW knows how to reason about. That's usually the case.

RyanGlScott commented 3 weeks ago

By the way, here is a test case (distilled from https://github.com/GaloisInc/llvm-pretty-bc-parser/pull/251) that you can use to easily tell whether your version of SAW is affected by https://github.com/GaloisInc/llvm-pretty-bc-parser/issues/250 or not:

// test.c
#include <stdint.h>
#include <stdatomic.h>

atomic_int val = 0x928;

int do_atomic_update(atomic_int newval) {
    int old_val = 0x928;
    return atomic_compare_exchange_weak(&val, &old_val, newval);
}
// test.saw
let do_atomic_update_spec = do {
  llvm_alloc_global "val";
  llvm_points_to (llvm_global "val") (llvm_term {{ zext 0x928 : [32] }});
  newval <- llvm_fresh_var "newval" (llvm_int 32);

  llvm_execute_func [llvm_term newval];

  llvm_return (llvm_term {{ 1 : [32] }});
};

m <- llvm_load_module "test.bc";
llvm_verify m "do_atomic_update" [] false do_atomic_update_spec z3;

With SAW 1.1, this test case passes regardless of whether you use a version of LLVM does or does not require opaque pointers:

# LLVM 12 does not use opaque pointers
$ ~/Software/clang+llvm-12.0.0/bin/clang test.c -emit-llvm -g -c
$ ~/Software/saw-1.1/bin/saw test.saw 
[09:21:46.597] Loading file "/home/ryanscott/Documents/Hacking/SAW/test.saw"
[09:21:46.657] Verifying do_atomic_update ...
[09:21:46.658] Simulating do_atomic_update ...
[09:21:46.660] Checking proof obligations do_atomic_update ...
[09:21:46.660] Proof succeeded! do_atomic_update

# LLVM 16 does use opaque pointers
$ ~/Software/clang+llvm-16.0.4/bin/clang test.c -emit-llvm -g -c
$ ~/Software/saw-1.1/bin/saw test.saw 
[09:21:52.497] Loading file "/home/ryanscott/Documents/Hacking/SAW/test.saw"
[09:21:52.558] Verifying do_atomic_update ...
[09:21:52.559] Simulating do_atomic_update ...
[09:21:52.560] Checking proof obligations do_atomic_update ...
[09:21:52.560] Proof succeeded! do_atomic_update
pennyannn commented 2 weeks ago

Hi @RyanGlScott , thanks for mentioning this fix PR. It actually fixed the module loading issue, which is great. However, this means that I have to upgrade SAW in our CI and that hasn't been a pleasant experience. Above is the PR I worked on for fixing failures after upgrading SAW. There are a total of three lemmas failing. Two of them I was able to fix without too much trouble. There is one that revealed a change in SAW that caused the term to be much bigger than before, causing Z3 to fail to prove it.

The lemma is called verify_HMAC_Init_ex_array_spec. I looked into the generated term for that theorem using the old SAW and the upgraded SAW. I found that specifically, the subterm for slicing has changed. For example in old SAW, the following is the subterm:

 x@149 = arrayCopy 64 x@2 x@137 x@6 key#1595 x@6 key_len#1594
  x@184 = append 56 8 Bool
                (append 48 8 Bool
                   (append 40 8 Bool
                      (append 32 8 Bool
                         (append 24 8 Bool
                            (append 16 8 Bool
                               (append 8 8 Bool (arrayLookup x@1 x@2 x@149 x@10)
                                  (arrayLookup x@1 x@2 x@149 x@12))
                               (arrayLookup x@1 x@2 x@149 x@17))
                            (arrayLookup x@1 x@2 x@149 x@53))
                         (arrayLookup x@1 x@2 x@149 x@13))
                      (arrayLookup x@1 x@2 x@149 x@16))
                   (arrayLookup x@1 x@2 x@149 x@9))
                (arrayLookup x@1 x@2 x@149 x@6)

I suspect the arrayCopy is coming from OPENSSL_memcpy(key_block, key, key_len); in the C code https://github.com/aws/aws-lc/blob/main/crypto/fipsmodule/hmac/hmac.c#L399, and the slicing is coming from key_block[i] in https://github.com/aws/aws-lc/blob/main/crypto/fipsmodule/hmac/hmac.c#L402. Seems to make sense. However, using the upgraded SAW, I get this:

  x@283 = ite x@1 x@273 x@14
                (ite x@1 x@274 (append 56 8 Bool x@143 x@153)
                   (ite x@1 x@275
                      (append 48 16 Bool x@276 (append 8 8 Bool x@154 x@153))
                      (ite x@1 x@277
                         (append 40 24 Bool x@278
                            (append 16 8 Bool (append 8 8 Bool x@155 x@154) x@153))
                         (ite x@1 x@279
                            (append 32 32 Bool x@145
                               (append 24 8 Bool
                                  (append 16 8 Bool (append 8 8 Bool x@156 x@155) x@154)
                                  x@153))
                            (ite x@1 x@280
                               (append 24 40 Bool x@144
                                  (append 32 8 Bool
                                     (append 24 8 Bool
                                        (append 16 8 Bool (append 8 8 Bool x@157 x@156) x@155)
                                        x@154)
                                     x@153))
                               (ite x@1 x@281
                                  (append 16 48 Bool x@152
                                     (append 40 8 Bool
                                        (append 32 8 Bool
                                           (append 24 8 Bool
                                              (append 16 8 Bool (append 8 8 Bool x@158 x@157) x@156)
                                              x@155)
                                           x@154)
                                        x@153))
                                  (ite x@1 x@282
                                     (append 8 56 Bool x@147
                                        (append 48 8 Bool
                                           (append 40 8 Bool
                                              (append 32 8 Bool
                                                 (append 24 8 Bool
                                                    (append 16 8 Bool (append 8 8 Bool x@159 x@158) x@157)
                                                    x@156)
                                                 x@155)
                                              x@154)
                                           x@153))
                                     (append 56 8 Bool
                                        (append 48 8 Bool
                                           (append 40 8 Bool
                                              (append 32 8 Bool
                                                 (append 24 8 Bool
                                                    (append 16 8 Bool
                                                       (append 8 8 Bool (arrayLookup x@1 x@3 key#1484 x@18) x@159)
                                                       x@158)
                                                    x@157)
                                                 x@156)
                                              x@155)
                                           x@154)
                                        x@153))))))))

The if conditions are key_len = 0, key_len = 1, ..., which breaks key_len into cases. for each case it tries to assemble the bytes from key back. I'm sorry if these terms are too dense and doesn't make sense to you.

I'm simply wondering if this kind of change is something you could directly recognize. Do you have any suggestions on what I could do? Is there some option that I can use to achieve the old version which is more pleasant to the eyes and easier to prove?