Unlawful `compare` implementations

sim642 commented 9 months ago

The compare functions which are exposed to OCaml, either directly or via custom_operations structs and Stdlib.compare, do not satisfy the usual order laws. In turn, this means that many Apron types cannot (at least easily) be used for Set.Make and Map.Make. Since the operations are somehow defined, doing so doesn't immediately fail (e.g. by exception), but leads to some nasty Heisenbugs down the line.

Examples

`Environment`

Environment.compare is explicitly exposed to OCaml: https://github.com/antoinemine/apron/blob/bdf251f25cc08bf899fd8679a02ffc4a5e4e570a/mlapronidl/environment.idl#L216-L218 The documentation correctly describes its behavior, which, however, is not the compare suitable for Set.OrderedType. For example (in pseudo-syntax), Environment.compare {var1} {var2} > 0 and Environment.compare {var2} {var1} > 0 both return true (because compare returns 2 both ways).

What's worse, the same implementation is specified for the Environment.t custom OCaml block via custom_operations struct, which is to be used by Stdlib.compare (and its standard operator-derivatives). The problem is that this doesn't fit the usually-understood properties of that function as documented:

compare x y returns 0 if x is equal to y, a negative integer if x is less than y, and a positive integer if x is greater than y. [...] The compare function can be used as the comparison function required by the Set.Make and Map.Make functors.

So, for example, {var1} > {var2} and {var2} > {var1} both return true as well.

`Abstract0` (and `Abstract1`)

Abstract0 (and Abstract1, which is built on top of it) don't directly expose and document a compare. However, one is still defined for Abstract0.t's custom block: https://github.com/antoinemine/apron/blob/bdf251f25cc08bf899fd8679a02ffc4a5e4e570a/mlapronidl/apron_caml.c#L282-L308 At first glance, this appears somewhat sound: first dimensions are compared, then ap_abstract0_is_eq is used to check for semantic equality (which is also exposed as is_eq to OCaml). But the final fallback comparison is of pointers themselves (which are arbitrary memory addresses). On its own, that's a valid total order, but not when combined with the semantic equality check.

For example, suppose a1 = {x>=0}, a2 = {-x>=0} and a3 = {x>=0} are allocated in order and their addresses also happen to be in the same order. Then a1 < a2 and a2 < a3 per pointer comparison, but a1 = a3 per semantic equality check.

Other types

Other Apron types might have similar issues, I didn't check everything. Var.compare at least is sound because it's just strcmp.

Conclusion

Even though polymorphic comparison and Stdlib.compare aren't generally recommended, they have been defined for Apron's custom blocks and that's the only way to compare most of them. Even if there's no plan to fix it, this issue hopefully serves as a warning to anyone trying to use Set.Make/Map.Make on Apron data types, which appears to work at first glance but can surprisingly break down.

sim642 commented 9 months ago

Apparently Frama-C realized this at some point as well:

      (* BIGTODO: this function is not quite a total order, because [is_leq] is
         only a partial order. Using the hash as a first comparison is only
         a doubtful hack. *)
      let compare a b =
        if equal a b then 0
        else
          let cmp = compare (hash a) (hash b) in
          if cmp <> 0 then cmp
          else if Abstract1.is_leq man a b then 1 else -1

(https://git.frama-c.com/pub/frama-c/-/blob/c13260097f092e405ccb1131668fbc4c94eb1644/src/plugins/eva/domains/apron/apron_domain.ml#L390-398).

antoinemine commented 5 months ago

Thanks for reporting this issue. I try to address them in #108, by basically disabling the problematic features. I currently see no efficient and robust solution to implement a total order on these data-structures and make them usable as keys in sets and maps. In particular, the order cannot rely on the internal representation of an abstract element as the library is free to update it with another representation of the same concrete set without notice.

The Frama-C code shows that they did not find a good solution either.

sim642 commented 5 months ago

For Environment at least a total order should be possible: they seem to be normalized such that variables (which are totally ordered) are sorted in the underlying array. Environment.equal also compares them under that assumption, without any semantic considerations like Environment.cmp now. However, also implementing Environment.compare with the lexicographic total order would be an even more confusing breaking change: existing uses of it (which likely depend on the semantic results it previously had) would start behaving unexpectedly.

antoinemine commented 5 months ago

Yes Environment are represented as ordered lists of variables. This is important to ensure that, for environments with the same variables, a given variable is always assigned the same index (vector space dimension). What complicates the matter is that we distinguish integers and float variables. Environments where the a variable is integer in one and float in the other are not compatible.

In theory, a total order could be possible for Environment but also possibly other data-types. Ideally, they should be extensions of the natural partial order, so that ⊆ ⇒ ≤. However, they are probably complex to implement and costly to run, in addition to also breaking compatibility with the incorrect implementation of compare we wish to abandon. I fear that people would continue to use ≤ assuming it means (or implies) ⊆. I suggest we remove compare and not replace it at all (at least, not yet). The drawback is that we can longer use the polymorphic = to test for equality.

antoinemine commented 5 months ago

Fixed in #108

antoinemine / apron