Unsoundness of OctApron (& other results from a run)

michael-schwarz commented 3 years ago

I ran SV-Comp with OctApron, for the following tests we reported true contrary to how the task is labelled:

[x] bitvector-regression/implicitunsignedconversion-1.yml
[x] bitvector-regression/signextension-1.yml
[x] bitvector-regression/signextension2-2.yml
[x] ldv-linux-3.7.3/main15_drivers-usb-core-usbcore-ko--32_7a--linux-3.7.3.yml

The base analysis does not have that problem.

Apart from that the changes are:

14 tasks where OctApron+Base succeed where Base alone doesn't :tada:
1 additional case where we fail during verify
29 APRON Manager Exception
404 tasks with Exception(Failure)
3 FAILED
32 new OUT_OF_MEMORY (where runtime for base was minimal)
(59 tests that work with base only timeout now)

I'll link to the result in a comment on Slack, GitHub doesn't let me add HTML files.

sim642 commented 3 years ago

From the results table, this is with octApron that's currently on master (https://github.com/goblint/analyzer/commit/a06a611c), not traces-relational.

Apparently we have an octApron regression test that doesn't actually work with octApron, but is based on signextension-1: https://github.com/goblint/analyzer/blob/847352e74a1f388f5a23074dd9bd16c47f85c547/tests/regression/36-octapron/08-problem-signed-unsigned-overflow.c.

michael-schwarz commented 3 years ago

From the results table, this is with octApron that's currently on master (a06a611), not traces-relational.

Yes, probably easier to first fix these bugs as I guess they are easier to debug than those that depend on privatization.

Those that fail with Exception(Failure) are also related to octApron directly it seems; Fatal error: exception Failure("Environment.dim_of_var: unknown variable in the environment")

sim642 commented 3 years ago

I looked at the first Apron Manager exception (missing variable in environment) one and it boils down to this:

double copysign_double(double x, double y) {
  return x;
}

int main() {
  // oct is top here, so entry state of copysign_double is also top (doesn't contain x')
  copysign_double(0, 0); // works when combine has no lval
  // double z = copysign_double(0, 0); // doesn't work when combine has lval
}

Many of the Failure exceptions might be for related reasons, just on different operations, which raise a different exception.

The analysis (already from poly) contains some logic for function calls, which introduces primed variables for formal arguments and tries to relate the return value to those I guess. I'm not fully sure what the logic here is or if it works, so I wanted @ivanazuzic to look into it. It seems like some kind of transformer for function summarization?

What's also a bit unusual about function calls is that the contexts are all or nothing: https://github.com/goblint/analyzer/blob/cdbfd67d59c4112f8f09bd22b1f4d31e37432aac/src/analyses/apron/octApron.apron.ml#L19 Not sure how this fits into the rest, because my intuition for them would be to just remove the caller's locals and pass on the rest. This likely needs to be looked over, because for traces-relational, we would want to pass the Priv module state through functions.

sim642 commented 3 years ago

Also most of these exceptions seem to come from floating point tasks, which we really don't care about right now (although these function call things might also pop up for integers I suppose?). Currently octApron does some stuff with floating point (because Apron supports it), but I think we also disabled it via option (maybe that's the cause for the exceptions).

In traces-relational I actually totally ignore (and probably break) floating point variables right now, because they're annoying to deal with (Apron environments keep separate int and float vars). I'm tempted to say we should completely rip the float vars support out of it to simplify things. Especially since the description of octApron explicitly mentions just integers: https://github.com/goblint/analyzer/blob/cdbfd67d59c4112f8f09bd22b1f4d31e37432aac/src/analyses/apron/octApron.apron.ml#L1

michael-schwarz commented 3 years ago

I'm tempted to say we should completely rip the float vars support out of it to simplify things.

At least for now, I 100% agree. Looking towards SVCOMP'22, we might want to add some support back later.

sim642 commented 3 years ago

At least for now, I 100% agree. Looking towards SVCOMP'22, we might want to add some support back later.

Sure, but such support would probably have to start with the non-relational support in base anyway. Especially since Apron doesn't seem to be aware of NaN possibility, which is why we disabled floats anyway.

sim642 commented 3 years ago

It's possible some issues are already fixed on traces-relational as well (e.g., https://github.com/goblint/analyzer/commit/749491625a474638297904b9886f7744c2f7ac48), so when trying to fix these, it makes sense to try that branch as well if there's already a partial fix to cherry-pick.

ivanazuzic commented 3 years ago

What's also a bit unusual about function calls is that the contexts are all or nothing: https://github.com/goblint/analyzer/blob/cdbfd67d59c4112f8f09bd22b1f4d31e37432aac/src/analyses/apron/octApron.apron.ml#L19

That line about context was changed by Ralf here Context used to be bottom and he made it depend on exp.full-context.

sim642 commented 3 years ago

I guess it might be fine. The consequence is that if a function is called with different arguments x < y and x > y, then they won't get separate contexts (both use bottom octagon), so the cases get joined together. It might matter in some case?

sim642 commented 3 years ago

Anyway, I think most of the Apron manager exceptions (and maybe Failure exceptions) might be gone now after #253.

sim642 commented 3 years ago

I spent the better part of Friday looking at why this super simple benchmark https://github.com/sosy-lab/sv-benchmarks/blob/99d37c5b4072891803b9e5c154127c912477f705/c/loop-acceleration/simple_2-1.c#L13-L21 went from 0.0785s to timing out after 60s. It turned out to be a pretty wild ride.

I quickly realized that this timeout with octApron only happened in SV-COMP mode, which was a bad sign. Tracing the non-terminating analysis revealed that the wonderful dependency sets in PathSensitive3 were just growing and growing, by having seemingly duplicate values in a (Hoare) set. So I fully expected this polymorphic compare of OctApronDomain to be the issue: https://github.com/goblint/analyzer/blob/b5e988d75476dc8316bdcec7d7a1cdebb402f7b5/src/cdomains/apron/octApronDomain.apron.ml#L101 And there's no A.compare from Apron...

So I tried some weird implementations for it like:

let compare x y =
  if equal x y then
    0
  else
    Stdlib.compare x y

And it didn't change a thing, even though equal is properly implemented there (including for our special bot and top, which the problematic duplicate value contained). This made no sense (the duplicate elements should have collapsed by this), so in a moment of desperation I changed the implementation to failwith "asd"... and it never raised the exception despite the fact that PathSensitive3 clearly is relying on compare!

Then I tried disabling the HashconsLifter andHashconsContextLifter and noticed that it only reached the failure when I disabled at least one of the two lifters. This made sense because hashconsing redefines compare using its own unique tags, so if both are hashconsed, then the inner compare doesn't really matter. As long as hash and equal behave correctly... and they did (both implemented using Apron's respective functions).

Hashconsing strikes again! So I used its equal_debug and saw this output about equal elements having different tags: https://github.com/goblint/analyzer/blob/b5e988d75476dc8316bdcec7d7a1cdebb402f7b5/src/domains/printable.ml#L155-L156 AFAIK this is something that's supposed to be related to loading the marshaled results for incremental analysis (and relift) but I was doing neither, so how could hashconsing be wrong then? I started to suspect the worst: the Weak memory management stuff inside hashconsing doesn't work properly with the memory management of Apron in C or something crazy like that.

I kept adding more and more debug prints to see what's going on around hashconsing and realized the following: the abstract value during compare was different from the abstract value that was initially lifted into hashconsing to get the tag. So the failing debug equal comparison saw two equal abstract values, but they weren't the abstract values that were lifted. The abstract value had been mutated in-place!

Eventually I found the place where this happened: https://github.com/goblint/analyzer/blob/b5e988d75476dc8316bdcec7d7a1cdebb402f7b5/src/cdomains/apron/octApronDomain.apron.ml#L567-L568 The _with version of the Apron function mutates the abstract value in place, which is here then also returned. But this also mutates the abstract value given as argument to this transfer function and that happened to be physically equal to the abstract value stored for hashconsing. So a follow-up assign caused the hashconsed value to change and break everything.

The issue is now fixed on master (https://github.com/goblint/analyzer/commit/15b7b8d741fdc742879f7b70aa9f7dd7f2207517) and that particular SV-COMP benchmark with octApron terminates properly now. So the moral of the story is: imperative programming is the root of all evil.

michael-schwarz commented 3 years ago

What is strange is that after the fixes introduced to master for octApron, the performance on sv-comp test decreased by quite a lot: 173 test where we had the correct verdict before now yield EXCEPTION (Apron.Manager.Error). :thinking:

On the bright side, the number of timeouts decreased by quite a bit.

A diff of a run without octagons, with the octagons in ga06a611c, and with the octagons on current master. comparison.zip

sim642 commented 3 years ago

What is strange is that after the fixes introduced to master for octApron, the performance on sv-comp test decreased by quite a lot: 173 test where we had the correct verdict before now yield EXCEPTION (Apron.Manager.Error). thinking

Thanks for doing an updated SV-COMP run! I'll look into those. I think it's possible that the mutation also caused unsoundness by changing the stored octagon at some other node and then didn't cover some possible abstract executions that reveal other problems with Apron environment mismatches.

EDIT: It seems to be another hashconsing-mutability issue somehow, because when __VERIFIER_nondet_uint is called, octApron gives it a top entry state in an empty environment, but when the side effect reaches the solver then the state is top over cond', which corresponds to a formal argument of __VERIFIER_assert.

EDIT 2: Ahh! The OctApronDomain has an identity crisis (pun intended!): equal is defined through is_top https://github.com/goblint/analyzer/blob/db84904d27d11c5a5d05183eea77c40785813c23/src/cdomains/apron/octApronDomain.apron.ml#L88-L93 which considers the top value with any environment to be top https://github.com/goblint/analyzer/blob/db84904d27d11c5a5d05183eea77c40785813c23/src/cdomains/apron/octApronDomain.apron.ml#L37-L40 but the actual top is defined to be top with the empty environment. These things don't match up.

So top (env: [||]) is considered equal to top (env: [|0> cond':int|]) by both equal (which I didn't change) and hash (which I changed to match equal, otherwise the hashing correctness condition is violated). Therefore hashconsing equates them and gives the value with the wrong environment in some places.

sim642 commented 3 years ago

@michael-schwarz Can you do an SV-COMP run from this branch: https://github.com/goblint/analyzer/tree/octapron-manager-error? I attempted to fix the weird implicitly lifted bot and top by giving them a special unique environment and (hopefully) making all comparisons sensitive to the environment.

michael-schwarz commented 3 years ago

Sure!

michael-schwarz commented 3 years ago

@sim642 This seems to resolve all these issues! comparison-fix-manager.zip

sim642 commented 3 years ago

Great! I merged the branch into master.

I noticed that besides the unsoundness issues mentioned in the issue, there's also a bunch of suspicious cases of TIMEOUTs turning into unknown, especially in float-newlib. My hope is that they're also doing some integer casting, which we analyze unsoundly right now, but we'll see if they remain after the unsoundness fixes.

sim642 commented 3 years ago

I just made a quick fix (on traces-relational) to the first three unsound cases by checking the ranges of cast types in octApron: https://github.com/goblint/analyzer/commit/4af0c8e9853681879a6e569f147ab882fc095cae.

The fourth one was already fixed on traces-relational. I've done so much refactoring and fixing of octApron there, that I probably fixed it indirectly at some point.

I'll keep these unsoundness fixes on traces-relational, but would soon also like to merge the current part of that branch back to master to prevent octApron from deviating so much.

sim642 commented 3 years ago

The surprising time improvement on float-newlib didn't change due to the casting fix, but still seemed suspicious. Indeed there was an unsound reason for it: at one point octApron got the ikind without unrolling the typedefs in the type. Having not done that, it fell into a no-op case, where an assignment (left-hand side invalidation) was ignored: https://github.com/goblint/analyzer/commit/d2b7948fce1e7bbad259d779151b72ae6f470c53?w=1.

sim642 commented 3 years ago

These should be fixed after merging #291.

goblint / analyzer

Unsoundness of OctApron (& other results from a run) #248