Verifying C written with C11 features using SAW

pennyannn commented 3 months ago

When migrating AWS-LC to C11, all of our existing proofs failed with the following error at LLVM module loading step:

Mismatched value types:
cmp value: ValIdent (Ident "4")
new value: ValIdent (Ident "7")
cmp type:  PrimType (Integer 32)
new type:  PrimType (Integer 32)
from:
    FUNC_CODE_INST_CMPXCHG
    @CRYPTO_refcount_inc
    FUNCTION_BLOCK
    FUNCTION_BLOCK_ID
    value symbol table
    MODULE_BLOCK
    Bitstream

The function CRYPTO_refcount_inc is defined at https://github.com/aws/aws-lc/blob/main/crypto/refcount_lock.c#L29 for C99 and defined at https://github.com/aws/aws-lc/blob/main/crypto/refcount_c11.c#L37 for C11. Note that the C11 version is using the new atomic feature. I'm wondering if this means that SAW does not support new features in C11, therefore the LLVM loader couldn't understand the syntax. Any recommendations for how to get around this problem?

RyanGlScott commented 3 months ago

What versions of SAW and LLVM are you using? I ask since this looks a lot like the bug reported in https://github.com/GaloisInc/llvm-pretty-bc-parser/issues/250, where the LLVM bitcode parser's treatment of the the cmpxchg instruction was incorrect in the presence of opaque pointers (which are enabled by default in LLVM 15 or later). This bug has since been fixed in https://github.com/GaloisInc/llvm-pretty-bc-parser/pull/251 and is included as part of the SAW 1.1 release.

Generally speaking, SAW should support most C11 features, provided that they compile down to LLVM instructions that SAW knows how to reason about. That's usually the case.

RyanGlScott commented 3 months ago

By the way, here is a test case (distilled from https://github.com/GaloisInc/llvm-pretty-bc-parser/pull/251) that you can use to easily tell whether your version of SAW is affected by https://github.com/GaloisInc/llvm-pretty-bc-parser/issues/250 or not:

// test.c
#include <stdint.h>
#include <stdatomic.h>

atomic_int val = 0x928;

int do_atomic_update(atomic_int newval) {
    int old_val = 0x928;
    return atomic_compare_exchange_weak(&val, &old_val, newval);
}

// test.saw
let do_atomic_update_spec = do {
  llvm_alloc_global "val";
  llvm_points_to (llvm_global "val") (llvm_term {{ zext 0x928 : [32] }});
  newval <- llvm_fresh_var "newval" (llvm_int 32);

  llvm_execute_func [llvm_term newval];

  llvm_return (llvm_term {{ 1 : [32] }});
};

m <- llvm_load_module "test.bc";
llvm_verify m "do_atomic_update" [] false do_atomic_update_spec z3;

With SAW 1.1, this test case passes regardless of whether you use a version of LLVM does or does not require opaque pointers:

# LLVM 12 does not use opaque pointers
$ ~/Software/clang+llvm-12.0.0/bin/clang test.c -emit-llvm -g -c
$ ~/Software/saw-1.1/bin/saw test.saw 
[09:21:46.597] Loading file "/home/ryanscott/Documents/Hacking/SAW/test.saw"
[09:21:46.657] Verifying do_atomic_update ...
[09:21:46.658] Simulating do_atomic_update ...
[09:21:46.660] Checking proof obligations do_atomic_update ...
[09:21:46.660] Proof succeeded! do_atomic_update

# LLVM 16 does use opaque pointers
$ ~/Software/clang+llvm-16.0.4/bin/clang test.c -emit-llvm -g -c
$ ~/Software/saw-1.1/bin/saw test.saw 
[09:21:52.497] Loading file "/home/ryanscott/Documents/Hacking/SAW/test.saw"
[09:21:52.558] Verifying do_atomic_update ...
[09:21:52.559] Simulating do_atomic_update ...
[09:21:52.560] Checking proof obligations do_atomic_update ...
[09:21:52.560] Proof succeeded! do_atomic_update

pennyannn commented 2 months ago

Hi @RyanGlScott , thanks for mentioning this fix PR. It actually fixed the module loading issue, which is great. However, this means that I have to upgrade SAW in our CI and that hasn't been a pleasant experience. Above is the PR I worked on for fixing failures after upgrading SAW. There are a total of three lemmas failing. Two of them I was able to fix without too much trouble. There is one that revealed a change in SAW that caused the term to be much bigger than before, causing Z3 to fail to prove it.

The lemma is called verify_HMAC_Init_ex_array_spec. I looked into the generated term for that theorem using the old SAW and the upgraded SAW. I found that specifically, the subterm for slicing has changed. For example in old SAW, the following is the subterm:

 x@149 = arrayCopy 64 x@2 x@137 x@6 key#1595 x@6 key_len#1594
  x@184 = append 56 8 Bool
                (append 48 8 Bool
                   (append 40 8 Bool
                      (append 32 8 Bool
                         (append 24 8 Bool
                            (append 16 8 Bool
                               (append 8 8 Bool (arrayLookup x@1 x@2 x@149 x@10)
                                  (arrayLookup x@1 x@2 x@149 x@12))
                               (arrayLookup x@1 x@2 x@149 x@17))
                            (arrayLookup x@1 x@2 x@149 x@53))
                         (arrayLookup x@1 x@2 x@149 x@13))
                      (arrayLookup x@1 x@2 x@149 x@16))
                   (arrayLookup x@1 x@2 x@149 x@9))
                (arrayLookup x@1 x@2 x@149 x@6)

I suspect the arrayCopy is coming from OPENSSL_memcpy(key_block, key, key_len); in the C code https://github.com/aws/aws-lc/blob/main/crypto/fipsmodule/hmac/hmac.c#L399, and the slicing is coming from key_block[i] in https://github.com/aws/aws-lc/blob/main/crypto/fipsmodule/hmac/hmac.c#L402. Seems to make sense. However, using the upgraded SAW, I get this:

  x@283 = ite x@1 x@273 x@14
                (ite x@1 x@274 (append 56 8 Bool x@143 x@153)
                   (ite x@1 x@275
                      (append 48 16 Bool x@276 (append 8 8 Bool x@154 x@153))
                      (ite x@1 x@277
                         (append 40 24 Bool x@278
                            (append 16 8 Bool (append 8 8 Bool x@155 x@154) x@153))
                         (ite x@1 x@279
                            (append 32 32 Bool x@145
                               (append 24 8 Bool
                                  (append 16 8 Bool (append 8 8 Bool x@156 x@155) x@154)
                                  x@153))
                            (ite x@1 x@280
                               (append 24 40 Bool x@144
                                  (append 32 8 Bool
                                     (append 24 8 Bool
                                        (append 16 8 Bool (append 8 8 Bool x@157 x@156) x@155)
                                        x@154)
                                     x@153))
                               (ite x@1 x@281
                                  (append 16 48 Bool x@152
                                     (append 40 8 Bool
                                        (append 32 8 Bool
                                           (append 24 8 Bool
                                              (append 16 8 Bool (append 8 8 Bool x@158 x@157) x@156)
                                              x@155)
                                           x@154)
                                        x@153))
                                  (ite x@1 x@282
                                     (append 8 56 Bool x@147
                                        (append 48 8 Bool
                                           (append 40 8 Bool
                                              (append 32 8 Bool
                                                 (append 24 8 Bool
                                                    (append 16 8 Bool (append 8 8 Bool x@159 x@158) x@157)
                                                    x@156)
                                                 x@155)
                                              x@154)
                                           x@153))
                                     (append 56 8 Bool
                                        (append 48 8 Bool
                                           (append 40 8 Bool
                                              (append 32 8 Bool
                                                 (append 24 8 Bool
                                                    (append 16 8 Bool
                                                       (append 8 8 Bool (arrayLookup x@1 x@3 key#1484 x@18) x@159)
                                                       x@158)
                                                    x@157)
                                                 x@156)
                                              x@155)
                                           x@154)
                                        x@153))))))))

The if conditions are key_len = 0, key_len = 1, ..., which breaks key_len into cases. for each case it tries to assemble the bytes from key back. I'm sorry if these terms are too dense and doesn't make sense to you.

I'm simply wondering if this kind of change is something you could directly recognize. Do you have any suggestions on what I could do? Is there some option that I can use to achieve the old version which is more pleasant to the eyes and easier to prove?

pennyannn commented 2 months ago

Hi @RyanGlScott, any ideas for how to resolve this issue?

RyanGlScott commented 2 months ago

Sorry for taking so long to reply.

I don't immediately recognize what is going on, but I suspect that changing the Clang compiler flags in https://github.com/awslabs/aws-lc-verification/pull/162 may cause the LLVM bitcode to be different enough to matter. I'd have to take a closer look to see if there are finer-grained Clang flags that would allow restoring the old behavior.

pennyannn commented 2 months ago

Thanks! I would be really surprised if it's the compiler flag causing this issue but I could double check the LLVM generated. Btw, it sounds like there is no change in SAW that would lead to this behaviour?

RyanGlScott commented 2 months ago

No, there hasn't been any deliberate changes on SAW's end (to my knowledge) that would affect the shape of proof goals in such a noticeable way. My understanding is that CMake passes different flags to Clang depending on whether you are compiling for C99 versus C11, so any information you can provide on how your setup might affect the resulting calls to Clang would be helpful.

pennyannn commented 2 months ago

Thanks for the confirmation, I will look into it. It's unfortunate that it is very easy to get caught off-guard by compilation instability.

GaloisInc / saw-script

Verifying C written with C11 features using SAW #2099