LeventErkok / sbv

SMT Based Verification in Haskell. Express properties about Haskell programs and automatically prove them using SMT solvers.
https://github.com/LeventErkok/sbv
Other
241 stars 33 forks source link

datatype declarations #343

Closed joelburget closed 6 years ago

joelburget commented 6 years ago

I'd love to be able to declare datatypes (z3 syntax, though datatypes are in the smt-lib 2.6 standard):

(declare-datatypes (T1 T2) ((Pair (mk-pair (first T1) (second T2)))))
(declare-datatypes () ((S A B C)))
(declare-datatypes (T) ((Lst nil (cons (hd T) (tl Lst)))))

As a strawman I think it would be possible to define suitable operations with generics:

class Datatype a where
  lift :: a -> SBV a

  default lift :: (Generic a, GDatatype (Rep a)) => a -> SBV a
  lift = gLift . from

class GDatatype f where
  gLift :: f a -> SBV a

instance Gdatatype V1 where ... (U1, M1, K1, :*:, :+:)

I guess we also have to lift operations on datatypes...

But I haven't thought too hard about this so maybe it's harder than I expect or impossible. Either way I'd like to know.

cc @bts

LeventErkok commented 6 years ago

The only support for data-types currently is for:

So, for instance, you can do your second example with S, but not the other two.

There are several issues:

In principle, I do agree that it would be nice to have direct datatype support, but there are many issues around its use and practicality. If you do make an attempt at implementing them, please do send a patch!

joelburget commented 6 years ago

Thanks for the very informative answer! I have a use case in mind which isn't subject to any of your issues, as far as I can tell. I'd like to use a Value type, something like:

data Value
  = BV Bool
  | IV Integer
  | Unit

What would I like to do with this? I'm defining an interpreter which does one thing given a bool and another given an integer. It's possible to encode this type as (Tag, Integer) but it's more natural as a sum type. What would you do in this case?

LeventErkok commented 6 years ago

This is quite typical actually; and I think the right solution is to go with "constructor-concrete, value-symbolic" style. Something like:

      data SValue = BVS SBool | IVS SInteger | UnitS

Assuming your input is well-typed, you should never be mixing these anyhow in any execution path. (That is, any named expression will always have one of these types.) So, you can always decide the constructor concretely, and keep the value part symbolic.

You can also play tricks around:

      data Value btype itype = BV btype | IV itype | Unit

and then:

      type SValue = Value SBool SInteger
      type CValue = Value Bool Integer

etc., to keep the noise down.

I think this is a much better approach to symbolic simulation than tagged symbolic values.

joelburget commented 6 years ago

Assuming your input is well-typed

Right, I think this is the crux of my problem! The language I'm evaluating looks like

data Ast
  = Value Value
  | Throw String
  | IfThenElse Ast Ast Ast
  | Enforce Ast String
  | Sequence [Ast]
  | Arith ArithOp [Ast]

  | Let SInteger Ast Ast
  | Var SInteger

  | Read Ast
  | Write Ast Ast
  deriving (Show, Eq)

The interesting bit is Read / Write, both of which take an address. We'd think of evaluation throwing if the address is not an integer. This is the part I'm struggling with, even using the trick you suggested.

As I write this out it occurs to me that I could index Ast by the type it evaluates to:

data Ast ret where
  BValue     :: SBool                      -> Ast Bool
  IValue     :: SInteger                   -> Ast Integer
  UValue     ::                               Ast ()
  Throw      :: String                     -> Ast ()
  IfThenElse :: Ast Bool -> Ast a -> Ast a -> Ast a
  ...

I'm not sure well this works in practice but I'll give it a try.

LeventErkok commented 6 years ago

The typical approach would be to have two layers: An untyped Ast that represents programs as written by the users. And a typed variant, possibly also quite simplified, that would be guaranteed to be type-correct. (Think of it, for instance, like GHCs core.) You wouldn't want to do symbolic simulation on the first AST: It's too big and possibly untyped. You'd do the symbolic work on some intermediate-representation that is definitely after type-checking and possibly after simplifications/optimizations.

Whichever approach you take though, I'd like to know how it goes. One of my goals is to make SBV easy to use, and I'd like to know if there are any improvements we can do to improve things.

LeventErkok commented 6 years ago

@joelburget Can we close this ticket? What did your experiments show?

joelburget commented 6 years ago

Taking your advice, I decided I didn't need datatype support this time (though I'm curious when you consider smt-lib datatypes useful). Thanks for your help :)

LeventErkok commented 6 years ago

For certain modeling problems, they can indeed be useful. And I did use enumerations before, which is fully supported by SBV. (Here's an example: https://hackage.haskell.org/package/sbv-7.5/docs/src/Data.SBV.Examples.Queries.Enums.html and here's another: https://hackage.haskell.org/package/sbv-7.5/docs/Data-SBV-Examples-Queries-FourFours.html)

The issue is that if the data-type is not recursive (as in your case), then there's probably a simpler/better way to model it in SBV, since in most symbolic programs the spine remains concrete. (One can, of course, think of problems where the constructors can benefit from being symbolic, but it is rather rare.) I think this is precisely what you found as well.

Where they would be really useful is if they can be recursive, the de-facto example being lists of arbitrary (but finite) length. Unfortunately, that doesn't really work within the SBV eco-system since we lose pattern-matching, and most interesting proofs would require induction that SMT solvers aren't really very good at. At least not yet.

I'm still thinking it would be a nice addition if someone added support for them, possibly with some template-Haskell like magic to recover some notion of pattern-matching. But it's hard to get motivated and implement that without a motivating problem. (It's not a trivial amount of work by any means.)

acertain commented 5 years ago

Is there a way to declare datatypes manually?

LeventErkok commented 5 years ago

@fread2281

Haskell's List, Maybe, and Either data types are supported out-of-the-box. SBV also support Sets, which roughly correspond to Haskell's Data.Set but not exactly. User-defined enumerations are also supported; i.e. data-types with no arguments. There are many examples in the release for each of these.

There's no way to declare datatypes manually unless you want to spit that definition out as a string; which wouldn't really be useful. But to really answer your question, I'd like to know what you are trying to do exactly. When this issue came up in the past, there was always a "better" way to model the problem. (Typically, you just use your Haskell data-type, but replace the "leaves" with symbolic elements. I.e., the spine remains "Concrete" but data-values become symbolic.) This is typically a better way to go as SMT solvers do not usually have decision procedures for arbitrary datatypes, especially if they are recursive.

If you describe what you're trying to do in some detail, I can make a better recommendation as to how to proceed.

acertain commented 5 years ago

I'm implementing a constraint logic programming(/CHC) solver/language. Essentially, I have a term type data Tm = TmTree String [Tm] | TmInt Integer | ..., and programs are (possibly recursive) relations over Tm.

Solving consists of maintaining approximations of relations, and running a series of SMT queries using different approximations in place of recursive calls, combined with iterated deepening (there is a family of algorithms doing this, e.g. SPACER). Since this is logic programming, programs use or and equality in place of pattern matching.

It might be possible to ground datatypes out of queries before calling the solver, but that's unnecessary complexity and I think would probably hurt performance.

I don't mind writing the Tm definition as a string, as long as I can use the rest of sbv for e.g. maintaining SMT variables (of type Tm).

LeventErkok commented 5 years ago

There's no direct way to turn your 'Tm' into an SMT-Lib datatype "magically" for the time being. You can, however, "serialize" it in a precise sense and use a product type for each instance. This might look cumbersome, but it works well in practice. Note that you cannot use SMTLib's match statement, or alike; so you'll have to be more explicitly coding anything that case-analyses a Tm. If symbolic values of Tm are important to you (i.e., values can "change" constructor along runs) then you're probably better off using some other modeling tool; I'd recommend Lean for that sort of thing.

A simple example of modeling naturals is explained here: https://stackoverflow.com/questions/53444430/encoding-extended-naturals-in-sbv

Of course, depending on what you want to do, that solution may not be sufficient; and having "junk" in your models is always problematic.

Long story short, unless you can find a product type that matches your Tm for each concrete instance, SBV is probably not the best tool for you.

acertain commented 5 years ago

Currently, I'm using simple-smt / generating s-expressions by hand. I'm fine doing that some, but it'd be nice if I could use sbv to manage variable names, get a little bit of type safety, and possibly more sharing.

Lean is not useful for this, as the solver needs to be completely automatic. It's probably possible to analyze queries to determine max depth of terms that they use for each variable and then use bounded versions of Tm, but, I don't expect that to be a good idea.

LeventErkok commented 5 years ago

As with anything, it's impossible to say without trying things out. If you do use SBV and find out something fishy (bug or performance) do let me know! Good luck.

LeventErkok commented 5 years ago

@fread2281

If your datatype is not recursive, and you're willing to do a whole bunch of boilerplate programming (due to lack of any way to do proper pattern matching on symbolic values), you can code most every datatype symbolically. I've coded up a not entirely trivial (but admittedly silly) example here: https://gist.github.com/LeventErkok/20f1eeb1505b0ff6e8569b9108405a57

I honestly don't think this scales, unless you figure out a way to do some way of pattern matching. Template-Haskell can actually sort of do that, but I'm not a big proponent of TH, and given the state of affairs regarding support for data-types in SMT solvers and the heavy machinery involved, whether there's any ROI in this approach truly depends on what your alternatives are and how well suited your problem domain is. (The non-recursive requirement might already be a deal-breaker.)

Nonetheless, if you squint properly, the resulting program sort of looks like good old lisp-style programming but with symbolic values. Let me know if you pursue this any further and run into any issues. (Your term type is already recursive, so perhaps this is a non-starter, but it's fun to play around with this nonetheless.)

chrisdone commented 4 years ago

I'm chewing over the idea of using sbv for a type of spreadsheet software I'm writing.

  1. It would be handy to solve linear equations for users that are thinking about every day vanilla arithmetic or financial calculations. For this case, it'd be very natural to plug parts of the user's AST into SBV and suggests solutions. Something as trivial as "I paid 27$ at the restaurant with a 10% tip. What was the original bill? So: _n * 1.1 = 27 -- fill in the hole please"
  2. I'd like to check to see whether an equation may or may not divide by zero at some point, where a check of /=0 would be the precondition. Like LiquidHaskell does. I'm envisioning "we scanned your code and weren't able to prove it won't throw a divide by zero error" type of informational warning. False negatives are OK.

So I'm interested to hear about any results in this area by anyone experimenting with it.

LeventErkok commented 4 years ago

@chrisdone

Item 1 is bread-and-butter for SMT solving, and shouldn't be a problem at all.

Item 2 is what sAssert and safe calls are for. See here: https://hackage.haskell.org/package/sbv-8.8/docs/Documentation-SBV-Examples-Misc-NoDiv0.html

Let me know if you use these facilities and run into any issues.

chrisdone commented 4 years ago

@LeventErkok Ohh, thank you very much! That assertion example is very interesting and instructive. :clap::smiley:

LeventErkok commented 4 years ago

@chrisdone You'll probably be interested in the CheckedArithmetic class as well: https://hackage.haskell.org/package/sbv-8.8/docs/Data-SBV-Tools-Overflow.html#t:CheckedArithmetic

It provides operations that ensure they cannot overflow/underflow for regular machine arithmetic.

chrisdone commented 4 years ago

Thanks! My language has infinite precision integers, but if I add fixednums that'll come in handy.