Gbury / dolmen

Dolmen provides a library and a binary to parse, typecheck, and evaluate languages used in automated deduction
BSD 2-Clause "Simplified" License
80 stars 17 forks source link

Too restrictive well foundness criteria #222

Open bobot opened 1 month ago

bobot commented 1 month ago

From a file generated from an mlw by why3, the fact that list is well-founded is not taken into account:

(error "File "why3-examples/binomial_heap-BinomialHeap-occ_nonnegqtvca0bc09.psmt2", line 136, character 0 to line 137, character 60:
        136 | (declare-datatypes ((tree 0))
        137 |   (((treeqtmk (elem Int)(children (list tree))(rank Int)))))
        Error Not well founded datatype declaration")
Gbury commented 1 month ago

That's actually a known limitation of the SMT-LIB specification of well-foundedness for recursive datatypes, which I'd argue dolmen currently implements faithfully.

Here is the relevant part of the specification:

The datatype δ must be well founded in the following inductive sense: it must have a constructor of rank τ1 · · · τmδ such that τ1 · · · τm does not contain any of the datatypes from {δ1, . . . , δn} or, if it does contain some, they are well founded

This specification is relatively unclear about what happens in polymorphic situations (which is not that much of a surprise considering the recent and partial appearance of polymorphism in SMT-LIB2), where the notion of "does something contain a type" is not precise enough; currently it is interpreted by dolmen as any occurrence of the type, even as parameter of some other polymorphic type (e.g. list). Clearly, this is a point that need to be clarified (all the more for SMT-LIB v3).

One problem here is that to check well-foundedness of such a declaration, one would have to expand the definition of list (because if one replaces list by another type, e.g. type 'a pair = 'a * 'a, then the declaration is not well-founded anymore), and that kind of expansion might be in general costly to perform (and annoying to implement but that's another story). That being said, even with a manual expansion of the list type inside the datatype declaration (which I tested), the declaration is rejected by dolmen, so there definitely can be some improvements in dolmen here, at least so that the typechecker is able to check that this is well-founded (for at least one reasonable definition of well-founded). Separately, the criterion used by the SMT-LIB should be clarified, and i'll create an issue on the SMT-LIb repo to raise the question.

Halbaroth commented 1 month ago

Besides, we got the same issue while working on tests from Décysif project. I opened an issue on Why3 repository: https://gitlab.inria.fr/why3/why3/-/issues/860

bobot commented 1 month ago

With .psmt2, dolmen is already going beyond .smt2. So I propose to accept the same algebraic definitions as Why3 in this format. It is a conservative extension. At least Colibri2 would be able to work with such types, I believe Alt-Ergo too. Otherwise Why3 will have to use encodings for those types.

Gbury commented 1 month ago

Do you have a description of the exact criterion that is used by why3 so that I can try and implement it ?

Gbury commented 1 month ago

@bobot I posted the issue on the SMT-LIB repo. I think that one thing that could help move things a lot would be to have a description of a well-founded criterion that could replace/improve the current one.

bobot commented 1 month ago

I agree. I think the checked done by why3 are in this two places:

bclement-ocp commented 1 month ago

As discussed previously, I believe a (the?) good criterion is to extend the well-founded criterion to also include non-mutually-recursive parametric datatypes, because the well-foundedness of a parametric datatype can in general depend on the well-foundedness of its arguments. I think this is more or less what Why3 does — note the absence of a check for the mutually recursive definition here.

The following "counter-example" must be rejected:

(declare-datatypes ((Either 2)) ((par (A B) (
  (inl (prl A))
  (inr (prr B))
))))

(declare-datatypes ((Bogus 0)) ((
  (mkBogus (bogus (Either Bogus Bogus)))
)))
bobot commented 1 month ago

Indeed such type are refused by trywhy3 .

note the absence of a check for the mutually recursive definition here.

I'm not able to follow this argument, but I agree the checks are similar. I haven't found positivity discussed previously.

bclement-ocp commented 1 month ago

Sorry, I got confused — I was thinking of this part of the SMT-LIB criterion:

The datatype δ must be well founded in the following inductive sense: it must have a constructor of rank τ1 · · · τmδ such that τ1 · · · τm does not contain any of the datatypes from {δ1, . . . , δn} or, if it does contain some, they are well founded

But this is not actually the one that causes the issue here — rather it is this one, which I believe is also the one that deals with positivity: "τj is a sort term that contains no occurrences of δ1, . . . , δn below its top symbol.". The following definition is forbidden by that criterion (interestingly it also makes Z3 segfault):

(declare-datatypes ((list 1) (tree 0)) ((par (alpha) (
  (nil)
  (cons (head alpha) (tail (list alpha)))
))

(
  (treeqtmk (elem Int) (children (list tree)) (rank Int) )
)))
Gbury commented 1 month ago

"τj is a sort term that contains no occurrences of δ1, . . . , δn below its top symbol."

Oh, I missed that part. For what it's worth, this specific criterion is not enforced by dolmen at the moment, and if polymorphism is to be fully supported by the SMT-LIB, this is - in my opinion - an extremely limiting criterion that should be removed.

@bclement-ocp , @bobot do not hesitate to also comment on the issue that I opened on the SMT-LIB repo : https://github.com/SMT-LIB/SMT-LIB-2/issues/32

bclement-ocp commented 1 month ago

Oh, I missed that part.

Then I'm confused. The above example (with mutually recursive list and tree) is rejected by Dolmen, and if it's not by that criterion, I don't understand why it is rejected.

Gbury commented 1 month ago

Then I'm confused. The above example (with mutually recursive list and tree) is rejected by Dolmen, and if it's not by that criterion, I don't understand why it is rejected.

Because of the criterion I quoted at the beginning of the thread:

The datatype δ must be well founded in the following inductive sense: it must have a constructor of rank τ1 · · · τmδ such that τ1 · · · τm does not contain any of the datatypes from {δ1, . . . , δn} or, if it does contain some, they are well founded

Since tree only has a single constructor, the sort/type/rank of that constructor must respect the criterion above, i.e. that if it does contain some recursive datatypes, then they are well-formed. In the constructor of tree, there are two occurrences of datatypes: one occurrence of list, and one occurrence of tree. The occurrence of list is okay since it is well-formed (the nil constructor respects the criterion), however, the occurrence of tree does not respect that criterion, since we have not yet determined that tree is well-formed (we are checking its only constructor precisely to check whether it is well-formed, and once list is determined to be well-formed, the only thing left to do is check that tree is well-formed).

As I mentioned in the issue on the SMT-LIb repo, this may be due to how one interprets the notion of "contain" in the criterion. Currently, dolmen interprets "contain" as any occurrence (even under type constructors).

bclement-ocp commented 1 month ago

Ah, yes. list is not the issue, tree is — got it. Thanks for the explanation.