SMT-LIB / SMT-LIB-2

Public reference documents for the SMT-LIB standard
12 stars 0 forks source link

Inclusion/Lattice of Logics #27

Open Gbury opened 2 months ago

Gbury commented 2 months ago

In a nutshell: there are SMT-LIB problems that can be typechecked in a given logic, but may not be typechecked in the ALL logic.

Context and problematic example

More precisely, the webpage for logic in the SMT-LIB website (cf https://smt-lib.org/logics.shtml ), presents a graph of logics "ordered by inclusion". What is meant exactly by "inclusion" is a bit unclear though, but I'd naively expect that if a logic A is included in another logic B, then any valid problem for A is also a valid problem for B. However, there are (at least) two logics that are linked in the graph on the webpage, thus suggesting there should be some inclusion, but that do not respect the 'naive' criterion I wrote just above.

Consider the following example:

(set-logic UFLRA)
(declare-fun p (Real) Bool)
(assert (p 42))
(check-sat)
(exit)

This is a valid (as in, typecheckable and conformant to the SMT-LIB specification) file for the given UFLRA logic. However, if you change the logic to UFLIRA (or the AUFLIRA logic, which the webpage suggests should include the UFLRA logic), then it is not valid/compliant anymore. Similarly this file would not be valid/compliant in the ALL logic, assuming that the solver/tool supports/includes LIRA/NIRA in ALL (and that is the current behavior implemented in dolmen).

That is a problem in practice, as it did make the model validation part of the SMT-COMP harder (cc @bobot ) : indeed due to that fact, even if a file is valid/compliant, it may not be valid/compliant if you force the logic to be ALL (which was sometimes necessary to use during SMT-COMP because some problems of the SMT-LIB library are misclassified and do not respect the restrictions of their declared logic). In any case, i'd argue that it is a very surprising behavior for users that going from a logic to a "larger" logic may break the typechecking of some problems.

Explanation and solutions

The root problem of these behaviors is the following: in real-only arithmetic, integer literals, such as 42 are interpreted as real constants, however, with both integer and real arithmetic, integer literals are interpreted as integer constants, which means that the type of 42 effectively changes from Real to Int when going from a real-only logic to a integer-and-real logic.

One may object that the arithmetic theories (and therefore also the corresponding logics) have a mechanism to automatically cast an integer expression into a real one when needed by introducing a to_real application, however, that is only specified to apply for arithmetic operators (i.e. +, -, ...), and therefore does not apply to a user-defined function as in the example above.

I would strongly object to making that syntactic sugar apply to all function symbols, because it means that the Reals theory would effectively change the typing rules of other theories and of the general first-order application globally. Even if that was restricted to literals, that still means that type-checking would not be "local" anymore, and instead become dependent on the context (and not just the names in scope): while that may work currently (and even then, that is a "may"), in a context with polymorphism and type inference, this would become extremely complex to implement and reason about.

One solution would be to never interpret integer literals as real constants, but that would likely break some problems. That being said, the export/printing functionality of dolmen makes sure to generate problems that always use the non-ambiguous literal form (i.e. it inserts .0 to make sure a real literal will always be parsed as a real and not an integer).

Conclusion

I haven't yet found other (better) solutions to this problem, but I'm interested in feedback from the maintainers on the severity of this issue: should logics have a notion of inclusion (as I understand it) ? if so should be try and fix the spec ?

fontainep commented 1 month ago

Hi Guillaume, the issue you raise with integers as reals is reminiscent of SMT-LIB 1. In SMT-LIB 3, using an Int constant where a Real is expected will not be allowed anymore. This is part of a larger effort to clean and make more flexible the design of what we call logics in SMT-LIB < 3.0. This is a change we would rather not backport to 2.7. Does this address your concern?

Gbury commented 1 month ago

Does this address your concern?

It depends on what exactly you mean by "using an Int constant where a Real is expected will not be allowed anymore": there are currently two different situations that I think may fit that description/formulation, and depending on which of these two is not allowed anymore (maybe both ?) the problem may persist or not.

The two situations are as follows: 1) when considering arguments of an arithmetic operation, e.g. the arguments of + or *, in the RealsInts theory, it is accepted for the arguments' types to be an arbitrary mix of either Int or Real, and if there is at least one argument of type Real, then the expression is considered to be some kind of syntaxic sugar, and the arguments of type Int will be wrapped by a call to to_real, so that after syntaxic sugar expansion, all arguments are of type Real. For example, assuming i: Int and r: Real, then (+ i r) is actually syntaxic sugar for (+ (to_real i) r). (Note that this is not the same thing as "overloading" the + operator) 2) a syntaxicaly integer coefficient/literal, such as 1 or 42, is interpreted as a literal of type Int in the Ints and RealsInts theory, but is interpreted as a literal of type Real in the Reals theory. For instance, consider the expression (+ 1 2.5); in the Reals theory, it is to be understood as (+ (as 1 Real) (as 2.5 Real)), whereas in the RealsInts theory, it is to be understood as (+ (to_real (as 1 Int)) (as 2.5 Real)). In this specific example, this is not a problem because both desugared expression are semantically equivalent, but for regular function symbols (i.e. not arithmetic operators), there is not automatic wrapping using a to_real call, so things break down as described in my original post.

In practice, this is point 2 that is problematic I think (at least for this issue). That being said, point 2 does not allow to "use an Int constant where a Real is expected" (that formulation is closer to the point 1 above), but rather it is that "an Int constant" is actually either an Int or a Real depending on the theory that is being used (but independently of what "is expected"). Thus why I'm not exactly sure which of the above two points will not be allowed anymore in SMT-LIB 3.

So the question, is whether it is 1, 2 (or both, or another solution ?) that are not to be allowed anymore in SMT-LIB 3 ?

fontainep commented 1 month ago

In SMT-LIB 3, + is overloaded with types (-> Real Real Real) and (-> Int Int Int). (+ 1 2.5) is not well-typed, but (+1 2) and (1. 2.5) are. There is no silent conversion from an Int constant to a Real.

Gbury commented 1 month ago

So, if I understand correctly, this means that both points 1 and 2 from my above post will no longer be allowed in SMT-LIB 3. In that case, this issue will indeed disappear in SMT-LIB 3.

Side note: you say that (+ 1. 2.5) is well-typed in SMT-LIB 3, and I'm wondering if that means that the lexical conventions for decimal numbers changes ? In SMT-LIB 2 1. is not a valid decimal number and must instead be written 1.0 (1. is actually lexed and parsed as two identifiers: 1 followed by the identifier .).

fontainep commented 1 month ago

So, if I understand correctly, this means that both points 1 and 2 from my above post will no longer be allowed in SMT-LIB 3. In that case, this issue will indeed disappear in SMT-LIB 3.

Indeed

Side note: you say that (+ 1. 2.5) is well-typed in SMT-LIB 3, and I'm wondering if that means that the lexical conventions for decimal numbers changes ? In SMT-LIB 2 1. is not a valid decimal number and must instead be written 1.0 (1. is actually lexed and parsed as two identifiers: 1 followed by the identifier .).

Sorry, I was not rigorous. Only lexically valid decimal numbers will be allowed, so "1.0", not "1."

Gbury commented 1 month ago

Thanks for all the clarifications ! I guess the specific issue I raised will be resolved in SMT-LIB 3.

That being said, about the most general question I was also wondering: is the inclusion of logics (e.g. following the graph of the SMT-LIB website, and the inclusion of any logic in the ALL logic) a property that the maintainers want to enforce in the standard going forward ?

fontainep commented 1 month ago

The very concept of logic somehow disappears in SMT-LIB3. A new concept of module stands for both theories and logics. It is easy to simulate an old logic, with a module that includes modules standing for theories in the combination. Artificial limitations or strange particularities of logic are more complicated to exactly reproduce in SMT-LIB 3, so this will have a "cleaning" side effect. And this gives a more flexible framework. It is not clear to me yet how categorization of benchmarks, etc... will evolve within this new context.