goblint / analyzer

Static analysis framework for C
https://goblint.in.tum.de
MIT License
184 stars 75 forks source link

Hoare powerset domain widening doesn't guarantee ascending chain condition and more #101

Open sim642 opened 4 years ago

sim642 commented 4 years ago

Links

In my attempt to understand the Hoare powerset domain (SetDomain.Hoare) used for path-sensitivity I happened upon the following:

Issue

Our current widening is implemented as follows: https://github.com/goblint/analyzer/blob/08316c5560b1c1298d8b0b29af4b076f12de0eb5/src/domains/setDomain.ml#L623-L628 This corresponds exactly to Definition 7 from the latter paper. However, it is only called an "extrapolation heuristics" and not "widening" because it doesn't guarantee the ascending chain condition. So we don't do true widening there.

Possible fix

The same paper also presents three different generic proper widening operators for such a domain using the inner domain's widening. Implementing any of them might not be straightforward though because they all require some kind of additional operator.

Merging in Hoare domains

The paper also discusses merging of elements in a Hoare set. Therefore such notion seems closely related but Goblint has multiple different Hoare powerset domains with quite different characteristics:

  1. Path-sensitivity lifter (PathSensitive2) domain is defined through SetDomain.Hoare with additional joining of elements wrapped around in join_reduce through should_joins from inner Specs.
  2. AddressSet domain is defined through SetDomain.HoarePO but the partitioning of elements into joined sets is deeply encoded into HoarePO through questionable means. It assumes the inner domain's operators (e.g. join) only act on elements which should be joined and raise Lattice.Incomparable on anything else. From what I understand, this just encodes a should_join-like query through exceptions while making no sense as a lattice (two elements can always be joined).
  3. EDIT: PartitionDomain.Set and PartitionDomain.Make also use Hoare ordering and seem to use a special collapse function to do joining. This ends up in region analysis domain.
  4. EDIT: PartitionDomain.SetSet seems to largely duplicate the behavior of the above partition domains although doesn't explicitly contain a merging function (or it is implicit). This ends up in vareq analysis domain.
  5. EDIT: SetDomain.SensitiveConf is some old and unused path-sensitivity domain that also uses Hoare ordering.
  6. EDIT: Historically PathSensitive2 was directly implemented as a Hoare-like domain before 5f5d8f88b1c8a08c6fb82ed79032d5f409edf415.

If the should_join-like operator is passed to a hoare powerset domain functor and the merging of elements by equivalence is moved to a single place, a single implementation should do. It would remove the need for exception-based control flow hacks while allowing an optimized bucket-based Hoare domain implementation to still be used.

vogler commented 4 years ago

From what I understand, this just encodes a should_join-like query through exceptions while making no sense as a lattice (two elements can always be joined).

Yes, that's why it's called PO, but I agree the naming is very confusing since it's more than a partial order. Exceptions seemed the easiest way to implement it. The better alternative was some abstract compare (instead of a should_join) like hinted at here (but seemed like too much work at the time): https://github.com/goblint/analyzer/blob/08316c5560b1c1298d8b0b29af4b076f12de0eb5/src/domains/lattice.ml#L8

It would remove the need for exception-based control flow hacks while allowing an optimized bucket-based Hoare domain implementation to still be used.

SetDomain.HoarePO is only efficient because it relies on the hashes of comparable elements to be the same. I kept SetDomain.Hoare for PathSenstive2 since this does not hold in general. I haven't compared it with HoarePO, but would assume it's more expensive than Hoare if there's no sharing of buckets.

sim642 commented 3 years ago

Over time I've edited the issue and collected more domains that duplicate the Hoare ordering and operations to varying extents. I've been thinking hard about what a good (and not any less efficient) generalization of them would be. Below is just a long dump of things that I've realized. It might not be legible to anyone but me though...

AddressSet

Instead of distributing elements into buckets by hash, we can think of it as some function p: elt -> P, which assigns an abstract partition to each element. Using hash and int for it is just a concrete instance (at least if we ignore hash collisions for the time being, but that's just due to the lack of a less expressive choice of partition identifiers). One can also think of a corresponding should_join: elt -> elt -> bool operator, which is just induced by the equality of partitions given by p. Of course that's a less efficient view, but at least provides some frame of reference when comparing to the others.

Elements in the same partition are comparable and the maximal element from the comparable ones remains, as is usual for simplification of Hoare powersets. It's just a bit less obvious because the join operator on addresses does both the comparability check and the joining, without having to leq them first to find maximal elements (like the set-based Hoare domain does).

The interesting consequence of this is that there's actually no forceful joining of non-leq elements, but all the element joins are naturally to keep maximals. Moreover, the partitioning actually serves to keep the elements apart.

PathSensitive2

Here the Hoare set keeps maximal elements, but also joins elements which should_join. Notably, the latter happens even if they're not in leq relation. So the should_join partitioning serves to merge elements together, not keep them apart. Another unique behavior here is that the maximality may join (or really just remove the lesser) elements which aren't related by should_join.

As mentioned, should_join doesn't make it possible to do such bucketing for paths. When actually looking at implementations of should_join, they're all based on D.equal (or true in trivial cases). This means that the path sensitivity could instead be defined by a projection function, which projects out the component(s) whose equality is equivalent to the existing should_join. So looks like it should be possible to define it as the suitable p to be used for buckets.

But that's actually not true due to the aforementioned property of maximality joins possibly going across these partitions. This is also very clearly possible in the implementation: when the Hoare set does reduce it doesn't even know about should_join. So the reduction may cause one partition (bucket) to disappear because it's subsumed by another. Not sure whether this behavior is necessary for something or just accidental. And then the join_reduce logic which is based on should_join, doesn't care about maximality again.

I was then tempted to think that the maximality-keeping should also happen on the partition components. In the AddressSet situation there's no one bucket by nature, which is leq of another, so nothing would change there. But that's not true either because the non-path-sensitive part of the domain might be in opposite leq order. So actually I now suspect that such projection-based path-sensitivity could be possible as a Map, but it needs to be some generalized version of HoareMap. Then the notion of maximality on entire paths (not just the path-sensitive components) gets less obvious though, since things get split up.

PartitionDomain

These are domains whose elements themselves are partitionings (sets of sets) which are ordered by Hoare as well. But this standard notion of partition isn't related to the above discussion on partitioning elements into buckets for better efficiency. So the behavior of keeping maximal elements isn't what's intuitively happening here.

Yet again, these domains contain a forceful joining feature as collapse. This one defined truly in a binary operator manner by looking for one common element. Hence here no direct projection analogue is possible. It just shows that the binary notion of merging elements in these domains still exists, regardless of the clever bucket-partitioning scheme.

What confuses me now though is that at one point I did some experimental coding and managed to redefine both AddressSet and PathSensitive2 via a slightly generalized PartitionDomain, which supported both collapsing and had a functor for optimization by giving a projection function as well (for address sets). Having looked so deeply at the other above, I'm not sure anymore whether it really worked correctly though or changed any of the subtle behaviors.

SensitiveConf

This domain is more like what I proposed for PathSensitive2 above: it's a set of pairs, where each pair has a path-sensitive component and a non-path-sensitive component. And the values are forcefully joined whenever path-sensitive components equal. I haven't dug into the git history, but I assume this is how path-sensitivity was done before "2".

But this one doesn't consider leq (or maximality) between path-sensitive components either, just equality. This probably simplifies it's logic compare to some HoareMap-like domain, but as mentioned above, doesn't exactly match the current PathSensitive2. This makes me think that the maximality of path-sensitive components is an unintended behavior introduced by switching from an explicit pair to should_join and maximality. It's not obvious though which of the two behaviors is "right".

Funnily enough, its code refers to my HoareMap idea already:

The meet operation is slightly different from the above, I think this is the right thing, the intuition is from thinking of this as a MapBot

The implementation just doesn't represent it as a map, but keeps the less efficient set-of-pairs construction.

sim642 commented 2 years ago

I got thinking about these things again to try to figure out the right generalization that captures both AddressDomain and PathSensitive2. As always happens when I touch this topic, I managed to find weird behavior of both on these examples: https://github.com/goblint/analyzer/commit/768b4fbac03c1488d5879565ab58af5f93fc0c61:

  1. As also mentioned above, PathSensitive2's Hoare normalization (reduction) is too aggressive and also acts over paths (joining values where should_join x y = false).
  2. AddressDomain (with HoarePO underneath) isn't even a Hoare domain: the leq on its elements happens on offsets, so &arr[1] and &arr[2] are incomparable (neither subsumes the other), so Hoare normalization should keep both as maximal elements, but they are aggressively joined instead.
michael-schwarz commented 2 years ago

the leq on its elements happens on offsets, so &arr[1] and &arr[2] are incomparable (neither subsumes the other), so Hoare normalization should keep both as maximal elements, but they are aggressively joined instead.

Yes, but this on purpose here!

sim642 commented 2 years ago

Yes, but this on purpose here!

Well, that depends on what the purpose exactly is. It would be equally reasonable to make it truly Hoare and keep them apart. This wouldn't be a problem for indexing loops because widening of addresses delegates to the offsets and does, e.g. interval, widening in there. Just like the index variable itself.

sim642 commented 2 years ago

Having implemented general "Hoare" domains with pairwise and projected (e.g. by hash) joining strategies, it's now apparent that the joinability of addresses isn't a well-defined equivalence/partitioning at all. For example:

  1. a[def_exc:Not {0}([0,7])] and a[def_exc:Not {1}([0,8])] are joinable, because they both have indices which can be joined in the int domain.
  2. a and a[def_exc:Not {1}([0,8])] are joinable, because the former is considered to be analogous to a[0] and Not {1} may equal 0.
  3. a and a[def_exc:Not {0}([0,7])] are not joinable, because Not {0} must not equal 0 (of a[0]).

This logic is implemented here: https://github.com/goblint/analyzer/blob/8d539ec162eb4e20b724d9beac05b70a165aa9bd/src/cdomains/lval.ml#L121-L132

This lack of proper partitioning structure is probably another reason for #803, which reveals that some operations of HoarePO are unnecessarily quadratic over all individual elements rather than linear over partitions. Without guarantees on the equivalence relation, the only somewhat consistent way to do things is to check all pairs.

This actually means that other HoarePO operations which assume proper partitioning, e.g. join that stops iteration after the first match, assuming it's the only possible one, could be broken too. Or at least dependent on representation and order of operations.

michael-schwarz commented 2 years ago

How about we consider values derived from the same varinfo (potentially) comparable, and all others incomparable?

sim642 commented 2 years ago

The case I described can be fixed by joining anything (including a must not zero index) with NoOffset.

In order to make all offsets of a varinfo comparable, the second raise Lattice.Uncomparable would have to be filled in with something else as well. That includes joins/etc. of different fields and with indices to be something meaningful. Whatever those results are, they have to also be compatible with the rest of Goblint, i.e. we must be potentially able to handle indices of structs and whatnot.

The current separation tries to avoid those problems by keeping such offsets separate. It's also a matter of precision since currently the "partitions" of joined elements are smaller than just per varinfo.

sim642 commented 6 months ago

"Generation of Violation Witnesses by Under-Approximating Abstract Interpretation" also contains some improved powerset widening.