Open sim642 opened 4 years ago
From what I understand, this just encodes a should_join-like query through exceptions while making no sense as a lattice (two elements can always be joined).
Yes, that's why it's called PO, but I agree the naming is very confusing since it's more than a partial order. Exceptions seemed the easiest way to implement it. The better alternative was some abstract compare (instead of a should_join
) like hinted at here (but seemed like too much work at the time): https://github.com/goblint/analyzer/blob/08316c5560b1c1298d8b0b29af4b076f12de0eb5/src/domains/lattice.ml#L8
It would remove the need for exception-based control flow hacks while allowing an optimized bucket-based Hoare domain implementation to still be used.
SetDomain.HoarePO
is only efficient because it relies on the hashes of comparable elements to be the same.
I kept SetDomain.Hoare
for PathSenstive2
since this does not hold in general. I haven't compared it with HoarePO
, but would assume it's more expensive than Hoare
if there's no sharing of buckets.
Over time I've edited the issue and collected more domains that duplicate the Hoare ordering and operations to varying extents. I've been thinking hard about what a good (and not any less efficient) generalization of them would be. Below is just a long dump of things that I've realized. It might not be legible to anyone but me though...
AddressSet
Instead of distributing elements into buckets by hash
, we can think of it as some function p: elt -> P
, which assigns an abstract partition to each element. Using hash
and int
for it is just a concrete instance (at least if we ignore hash collisions for the time being, but that's just due to the lack of a less expressive choice of partition identifiers). One can also think of a corresponding should_join: elt -> elt -> bool
operator, which is just induced by the equality of partitions given by p
. Of course that's a less efficient view, but at least provides some frame of reference when comparing to the others.
Elements in the same partition are comparable and the maximal element from the comparable ones remains, as is usual for simplification of Hoare powersets. It's just a bit less obvious because the join
operator on addresses does both the comparability check and the joining, without having to leq
them first to find maximal elements (like the set-based Hoare domain does).
The interesting consequence of this is that there's actually no forceful joining of non-leq
elements, but all the element joins are naturally to keep maximals. Moreover, the partitioning actually serves to keep the elements apart.
PathSensitive2
Here the Hoare set keeps maximal elements, but also joins elements which should_join
. Notably, the latter happens even if they're not in leq
relation. So the should_join
partitioning serves to merge elements together, not keep them apart. Another unique behavior here is that the maximality may join (or really just remove the lesser) elements which aren't related by should_join
.
As mentioned, should_join
doesn't make it possible to do such bucketing for paths. When actually looking at implementations of should_join
, they're all based on D.equal
(or true
in trivial cases). This means that the path sensitivity could instead be defined by a projection function, which projects out the component(s) whose equality is equivalent to the existing should_join
. So looks like it should be possible to define it as the suitable p
to be used for buckets.
But that's actually not true due to the aforementioned property of maximality joins possibly going across these partitions. This is also very clearly possible in the implementation: when the Hoare set does reduce
it doesn't even know about should_join
. So the reduction may cause one partition (bucket) to disappear because it's subsumed by another. Not sure whether this behavior is necessary for something or just accidental. And then the join_reduce
logic which is based on should_join
, doesn't care about maximality again.
I was then tempted to think that the maximality-keeping should also happen on the partition components. In the AddressSet
situation there's no one bucket by nature, which is leq
of another, so nothing would change there. But that's not true either because the non-path-sensitive part of the domain might be in opposite leq
order. So actually I now suspect that such projection-based path-sensitivity could be possible as a Map
, but it needs to be some generalized version of HoareMap
. Then the notion of maximality on entire paths (not just the path-sensitive components) gets less obvious though, since things get split up.
PartitionDomain
These are domains whose elements themselves are partitionings (sets of sets) which are ordered by Hoare as well. But this standard notion of partition isn't related to the above discussion on partitioning elements into buckets for better efficiency. So the behavior of keeping maximal elements isn't what's intuitively happening here.
Yet again, these domains contain a forceful joining feature as collapse
. This one defined truly in a binary operator manner by looking for one common element. Hence here no direct projection analogue is possible. It just shows that the binary notion of merging elements in these domains still exists, regardless of the clever bucket-partitioning scheme.
What confuses me now though is that at one point I did some experimental coding and managed to redefine both AddressSet
and PathSensitive2
via a slightly generalized PartitionDomain
, which supported both collapsing and had a functor for optimization by giving a projection function as well (for address sets). Having looked so deeply at the other above, I'm not sure anymore whether it really worked correctly though or changed any of the subtle behaviors.
SensitiveConf
This domain is more like what I proposed for PathSensitive2
above: it's a set of pairs, where each pair has a path-sensitive component and a non-path-sensitive component. And the values are forcefully joined whenever path-sensitive components equal. I haven't dug into the git history, but I assume this is how path-sensitivity was done before "2".
But this one doesn't consider leq
(or maximality) between path-sensitive components either, just equality. This probably simplifies it's logic compare to some HoareMap
-like domain, but as mentioned above, doesn't exactly match the current PathSensitive2
. This makes me think that the maximality of path-sensitive components is an unintended behavior introduced by switching from an explicit pair to should_join
and maximality. It's not obvious though which of the two behaviors is "right".
Funnily enough, its code refers to my HoareMap
idea already:
The meet operation is slightly different from the above, I think this is the right thing, the intuition is from thinking of this as a MapBot
The implementation just doesn't represent it as a map, but keeps the less efficient set-of-pairs construction.
I got thinking about these things again to try to figure out the right generalization that captures both AddressDomain
and PathSensitive2
. As always happens when I touch this topic, I managed to find weird behavior of both on these examples: https://github.com/goblint/analyzer/commit/768b4fbac03c1488d5879565ab58af5f93fc0c61:
PathSensitive2
's Hoare normalization (reduction) is too aggressive and also acts over paths (joining values where should_join x y = false
).AddressDomain
(with HoarePO
underneath) isn't even a Hoare domain: the leq
on its elements happens on offsets, so &arr[1]
and &arr[2]
are incomparable (neither subsumes the other), so Hoare normalization should keep both as maximal elements, but they are aggressively joined instead.the leq on its elements happens on offsets, so &arr[1] and &arr[2] are incomparable (neither subsumes the other), so Hoare normalization should keep both as maximal elements, but they are aggressively joined instead.
Yes, but this on purpose here!
Yes, but this on purpose here!
Well, that depends on what the purpose exactly is. It would be equally reasonable to make it truly Hoare and keep them apart. This wouldn't be a problem for indexing loops because widening of addresses delegates to the offsets and does, e.g. interval, widening in there. Just like the index variable itself.
Having implemented general "Hoare" domains with pairwise and projected (e.g. by hash
) joining strategies, it's now apparent that the joinability of addresses isn't a well-defined equivalence/partitioning at all. For example:
a[def_exc:Not {0}([0,7])]
and a[def_exc:Not {1}([0,8])]
are joinable, because they both have indices which can be joined in the int domain.a
and a[def_exc:Not {1}([0,8])]
are joinable, because the former is considered to be analogous to a[0]
and Not {1}
may equal 0.a
and a[def_exc:Not {0}([0,7])]
are not joinable, because Not {0}
must not equal 0 (of a[0]
).This logic is implemented here: https://github.com/goblint/analyzer/blob/8d539ec162eb4e20b724d9beac05b70a165aa9bd/src/cdomains/lval.ml#L121-L132
This lack of proper partitioning structure is probably another reason for #803, which reveals that some operations of HoarePO
are unnecessarily quadratic over all individual elements rather than linear over partitions. Without guarantees on the equivalence relation, the only somewhat consistent way to do things is to check all pairs.
This actually means that other HoarePO
operations which assume proper partitioning, e.g. join
that stops iteration after the first match, assuming it's the only possible one, could be broken too. Or at least dependent on representation and order of operations.
How about we consider values derived from the same varinfo
(potentially) comparable, and all others incomparable?
The case I described can be fixed by joining anything (including a must not zero index) with NoOffset
.
In order to make all offsets of a varinfo
comparable, the second raise Lattice.Uncomparable
would have to be filled in with something else as well. That includes joins/etc. of different fields and with indices to be something meaningful. Whatever those results are, they have to also be compatible with the rest of Goblint, i.e. we must be potentially able to handle indices of structs and whatnot.
The current separation tries to avoid those problems by keeping such offsets separate. It's also a matter of precision since currently the "partitions" of joined elements are smaller than just per varinfo
.
"Generation of Violation Witnesses by Under-Approximating Abstract Interpretation" also contains some improved powerset widening.
Links
In my attempt to understand the Hoare powerset domain (
SetDomain.Hoare
) used for path-sensitivity I happened upon the following:Issue
Our current widening is implemented as follows: https://github.com/goblint/analyzer/blob/08316c5560b1c1298d8b0b29af4b076f12de0eb5/src/domains/setDomain.ml#L623-L628 This corresponds exactly to Definition 7 from the latter paper. However, it is only called an "extrapolation heuristics" and not "widening" because it doesn't guarantee the ascending chain condition. So we don't do true widening there.
Possible fix
The same paper also presents three different generic proper widening operators for such a domain using the inner domain's widening. Implementing any of them might not be straightforward though because they all require some kind of additional operator.
Merging in Hoare domains
The paper also discusses merging of elements in a Hoare set. Therefore such notion seems closely related but Goblint has multiple different Hoare powerset domains with quite different characteristics:
PathSensitive2
) domain is defined throughSetDomain.Hoare
with additional joining of elements wrapped around injoin_reduce
throughshould_join
s from innerSpec
s.AddressSet
domain is defined throughSetDomain.HoarePO
but the partitioning of elements into joined sets is deeply encoded intoHoarePO
through questionable means. It assumes the inner domain's operators (e.g.join
) only act on elements which should be joined and raiseLattice.Incomparable
on anything else. From what I understand, this just encodes ashould_join
-like query through exceptions while making no sense as a lattice (two elements can always be joined).PartitionDomain.Set
andPartitionDomain.Make
also use Hoare ordering and seem to use a specialcollapse
function to do joining. This ends up in region analysis domain.PartitionDomain.SetSet
seems to largely duplicate the behavior of the above partition domains although doesn't explicitly contain a merging function (or it is implicit). This ends up in vareq analysis domain.SetDomain.SensitiveConf
is some old and unused path-sensitivity domain that also uses Hoare ordering.PathSensitive2
was directly implemented as a Hoare-like domain before 5f5d8f88b1c8a08c6fb82ed79032d5f409edf415.If the
should_join
-like operator is passed to a hoare powerset domain functor and the merging of elements by equivalence is moved to a single place, a single implementation should do. It would remove the need for exception-based control flow hacks while allowing an optimized bucket-based Hoare domain implementation to still be used.