Closed joelburget closed 6 years ago
The only support for data-types currently is for:
So, for instance, you can do your second example with S
, but not the other two.
There are several issues:
Pattern-matching: Haskell has a hard-coded interpretation of how pattern-matching works, one that does not go well with symbolic types. Loss of pattern-matching makes programming with such data-types pretty tricky.
Recursion: SMTLib supports recursive data-types, but to make any good use of it you really end up needing quantified axioms, at which point the logics become semi-decidable. So, proofs become rare.
For "component" types (like your pairs example), it's much better to use a pair of symbolic values, as opposed to a symbolic pair itself. Much better support all around.
In principle, I do agree that it would be nice to have direct datatype support, but there are many issues around its use and practicality. If you do make an attempt at implementing them, please do send a patch!
Thanks for the very informative answer! I have a use case in mind which isn't subject to any of your issues, as far as I can tell. I'd like to use a Value
type, something like:
data Value
= BV Bool
| IV Integer
| Unit
What would I like to do with this? I'm defining an interpreter which does one thing given a bool and another given an integer. It's possible to encode this type as (Tag, Integer)
but it's more natural as a sum type. What would you do in this case?
This is quite typical actually; and I think the right solution is to go with "constructor-concrete, value-symbolic" style. Something like:
data SValue = BVS SBool | IVS SInteger | UnitS
Assuming your input is well-typed, you should never be mixing these anyhow in any execution path. (That is, any named expression will always have one of these types.) So, you can always decide the constructor concretely, and keep the value part symbolic.
You can also play tricks around:
data Value btype itype = BV btype | IV itype | Unit
and then:
type SValue = Value SBool SInteger
type CValue = Value Bool Integer
etc., to keep the noise down.
I think this is a much better approach to symbolic simulation than tagged symbolic values.
Assuming your input is well-typed
Right, I think this is the crux of my problem! The language I'm evaluating looks like
data Ast
= Value Value
| Throw String
| IfThenElse Ast Ast Ast
| Enforce Ast String
| Sequence [Ast]
| Arith ArithOp [Ast]
| Let SInteger Ast Ast
| Var SInteger
| Read Ast
| Write Ast Ast
deriving (Show, Eq)
The interesting bit is Read
/ Write
, both of which take an address. We'd think of evaluation throwing if the address is not an integer. This is the part I'm struggling with, even using the trick you suggested.
As I write this out it occurs to me that I could index Ast
by the type it evaluates to:
data Ast ret where
BValue :: SBool -> Ast Bool
IValue :: SInteger -> Ast Integer
UValue :: Ast ()
Throw :: String -> Ast ()
IfThenElse :: Ast Bool -> Ast a -> Ast a -> Ast a
...
I'm not sure well this works in practice but I'll give it a try.
The typical approach would be to have two layers: An untyped Ast
that represents programs as written by the users. And a typed variant, possibly also quite simplified, that would be guaranteed to be type-correct. (Think of it, for instance, like GHCs core.) You wouldn't want to do symbolic simulation on the first AST: It's too big and possibly untyped. You'd do the symbolic work on some intermediate-representation that is definitely after type-checking and possibly after simplifications/optimizations.
Whichever approach you take though, I'd like to know how it goes. One of my goals is to make SBV easy to use, and I'd like to know if there are any improvements we can do to improve things.
@joelburget Can we close this ticket? What did your experiments show?
Taking your advice, I decided I didn't need datatype support this time (though I'm curious when you consider smt-lib datatypes useful). Thanks for your help :)
For certain modeling problems, they can indeed be useful. And I did use enumerations before, which is fully supported by SBV. (Here's an example: https://hackage.haskell.org/package/sbv-7.5/docs/src/Data.SBV.Examples.Queries.Enums.html and here's another: https://hackage.haskell.org/package/sbv-7.5/docs/Data-SBV-Examples-Queries-FourFours.html)
The issue is that if the data-type is not recursive (as in your case), then there's probably a simpler/better way to model it in SBV, since in most symbolic programs the spine remains concrete. (One can, of course, think of problems where the constructors can benefit from being symbolic, but it is rather rare.) I think this is precisely what you found as well.
Where they would be really useful is if they can be recursive, the de-facto example being lists of arbitrary (but finite) length. Unfortunately, that doesn't really work within the SBV eco-system since we lose pattern-matching, and most interesting proofs would require induction that SMT solvers aren't really very good at. At least not yet.
I'm still thinking it would be a nice addition if someone added support for them, possibly with some template-Haskell like magic to recover some notion of pattern-matching. But it's hard to get motivated and implement that without a motivating problem. (It's not a trivial amount of work by any means.)
Is there a way to declare datatypes manually?
@fread2281
Haskell's List
, Maybe
, and Either
data types are supported out-of-the-box. SBV also support Set
s, which roughly correspond to Haskell's Data.Set
but not exactly. User-defined enumerations are also supported; i.e. data-types with no arguments. There are many examples in the release for each of these.
There's no way to declare datatypes manually unless you want to spit that definition out as a string; which wouldn't really be useful. But to really answer your question, I'd like to know what you are trying to do exactly. When this issue came up in the past, there was always a "better" way to model the problem. (Typically, you just use your Haskell data-type, but replace the "leaves" with symbolic elements. I.e., the spine remains "Concrete" but data-values become symbolic.) This is typically a better way to go as SMT solvers do not usually have decision procedures for arbitrary datatypes, especially if they are recursive.
If you describe what you're trying to do in some detail, I can make a better recommendation as to how to proceed.
I'm implementing a constraint logic programming(/CHC) solver/language. Essentially, I have a term type data Tm = TmTree String [Tm] | TmInt Integer | ...
, and programs are (possibly recursive) relations over Tm.
Solving consists of maintaining approximations of relations, and running a series of SMT queries using different approximations in place of recursive calls, combined with iterated deepening (there is a family of algorithms doing this, e.g. SPACER). Since this is logic programming, programs use or and equality in place of pattern matching.
It might be possible to ground datatypes out of queries before calling the solver, but that's unnecessary complexity and I think would probably hurt performance.
I don't mind writing the Tm definition as a string, as long as I can use the rest of sbv for e.g. maintaining SMT variables (of type Tm).
There's no direct way to turn your 'Tm' into an SMT-Lib datatype "magically" for the time being. You can, however, "serialize" it in a precise sense and use a product type for each instance. This might look cumbersome, but it works well in practice. Note that you cannot use SMTLib's match
statement, or alike; so you'll have to be more explicitly coding anything that case-analyses a Tm
. If symbolic values of Tm
are important to you (i.e., values can "change" constructor along runs) then you're probably better off using some other modeling tool; I'd recommend Lean for that sort of thing.
A simple example of modeling naturals is explained here: https://stackoverflow.com/questions/53444430/encoding-extended-naturals-in-sbv
Of course, depending on what you want to do, that solution may not be sufficient; and having "junk" in your models is always problematic.
Long story short, unless you can find a product type that matches your Tm
for each concrete instance, SBV is probably not the best tool for you.
Currently, I'm using simple-smt / generating s-expressions by hand. I'm fine doing that some, but it'd be nice if I could use sbv to manage variable names, get a little bit of type safety, and possibly more sharing.
Lean is not useful for this, as the solver needs to be completely automatic. It's probably possible to analyze queries to determine max depth of terms that they use for each variable and then use bounded versions of Tm, but, I don't expect that to be a good idea.
As with anything, it's impossible to say without trying things out. If you do use SBV and find out something fishy (bug or performance) do let me know! Good luck.
@fread2281
If your datatype is not recursive, and you're willing to do a whole bunch of boilerplate programming (due to lack of any way to do proper pattern matching on symbolic values), you can code most every datatype symbolically. I've coded up a not entirely trivial (but admittedly silly) example here: https://gist.github.com/LeventErkok/20f1eeb1505b0ff6e8569b9108405a57
I honestly don't think this scales, unless you figure out a way to do some way of pattern matching. Template-Haskell can actually sort of do that, but I'm not a big proponent of TH, and given the state of affairs regarding support for data-types in SMT solvers and the heavy machinery involved, whether there's any ROI in this approach truly depends on what your alternatives are and how well suited your problem domain is. (The non-recursive requirement might already be a deal-breaker.)
Nonetheless, if you squint properly, the resulting program sort of looks like good old lisp-style programming but with symbolic values. Let me know if you pursue this any further and run into any issues. (Your term type is already recursive, so perhaps this is a non-starter, but it's fun to play around with this nonetheless.)
I'm chewing over the idea of using sbv for a type of spreadsheet software I'm writing.
So I'm interested to hear about any results in this area by anyone experimenting with it.
@chrisdone
Item 1 is bread-and-butter for SMT solving, and shouldn't be a problem at all.
Item 2 is what sAssert
and safe
calls are for. See here: https://hackage.haskell.org/package/sbv-8.8/docs/Documentation-SBV-Examples-Misc-NoDiv0.html
Let me know if you use these facilities and run into any issues.
@LeventErkok Ohh, thank you very much! That assertion example is very interesting and instructive. :clap::smiley:
@chrisdone You'll probably be interested in the CheckedArithmetic
class as well: https://hackage.haskell.org/package/sbv-8.8/docs/Data-SBV-Tools-Overflow.html#t:CheckedArithmetic
It provides operations that ensure they cannot overflow/underflow for regular machine arithmetic.
Thanks! My language has infinite precision integers, but if I add fixednums that'll come in handy.
I'd love to be able to declare datatypes (z3 syntax, though datatypes are in the smt-lib 2.6 standard):
As a strawman I think it would be possible to define suitable operations with generics:
I guess we also have to lift operations on datatypes...
But I haven't thought too hard about this so maybe it's harder than I expect or impossible. Either way I'd like to know.
cc @bts