LeventErkok / sbv

SMT Based Verification in Haskell. Express properties about Haskell programs and automatically prove them using SMT solvers.
https://github.com/LeventErkok/sbv

Data.SBV.Core.Symbolic.compare bottleneck #529

Closed arrowd closed 4 years ago

arrowd commented 4 years ago

I have the following code:

    forM_ allVarsWithLocs $ \(x, xLoc) ->
        forM_ allVarsWithLocs $ \(y, yLoc) ->
            constrain $ (xLoc .== yLoc) .=> (x .== y)

Sometimes allVarsWithLocs gets pretty large (3420 elements, which means 6840 vars), and the process spends all its time adding these constraints. Profiling the program yields:

COST CENTRE           MODULE                   SRC                                             %time %alloc
compare               Data.SBV.Core.Symbolic   Data/SBV/Core/Symbolic.hs:513:28-30              21.8    0.0
modifyState           Data.SBV.Core.Symbolic   Data/SBV/Core/Symbolic.hs:(992,1)-(997,41)       15.4   27.9
compare               Data.SBV.Core.Symbolic   Data/SBV/Core/Symbolic.hs:118:3-45               13.7   36.2
mkSymOpSC             Data.SBV.Core.Operations Data/SBV/Core/Operations.hs:1352:1-94            10.1    2.3
uncacheGen            Data.SBV.Core.Symbolic   Data/SBV/Core/Symbolic.hs:(1636,1)-(1645,32)      8.8   19.4
compare               Data.SBV.Core.Symbolic   Data/SBV/Core/Symbolic.hs:107:43-45               5.1    0.0

Is it possible to optimize something on the SBV side, or am I hitting the ceiling here and should look for ways to shrink the problem?

LeventErkok commented 4 years ago

SBV uses aggressive structural sharing to avoid blow-up in generated programs. In this case, you are generating a great many expressions of a very similar form, which causes the sharing detector to spend a lot of time figuring out that the expressions are different.

Before investigating any further, see if you can form the conjunction of the constraints and just assert that by itself. (Or you can chunk up your constraints into bundles of, say, 50, and assert those separately.) This would be done using sAnd.
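
Roughly, the chunked version could look like the sketch below. This is just illustrative: the bundle size of 50, the local chunksOf helper, and the SInt32 types are guesses about your setup, not anything SBV-specific.

import Data.List (unfoldr)

import Data.SBV

-- Split a list into bundles of the given size (a local stand-in for
-- Data.List.Split.chunksOf, to keep the sketch self-contained).
chunksOf :: Int -> [a] -> [[a]]
chunksOf n = unfoldr (\xs -> if null xs then Nothing else Just (splitAt n xs))

-- Assert the pairwise implications in bundles of 50, one sAnd per bundle.
assertChunked :: [(SInt32, SInt32)] -> Symbolic ()
assertChunked allVarsWithLocs =
    mapM_ (constrain . sAnd)
          (chunksOf 50 [ (xLoc .== yLoc) .=> (x .== y)
                       | (x, xLoc) <- allVarsWithLocs
                       , (y, yLoc) <- allVarsWithLocs ])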

See if that helps. If not, please create a sample program that I can run to recreate the problem to see if I can make some internal change to make it go faster.

arrowd commented 4 years ago

I had

sAnd [(xLoc .== yLoc) .=> (x .== y) | (x, xLoc) <- allVarsWithLocs, (y, yLoc) <- allVarsWithLocs]

and it was consuming a huge amount of memory trying to build up the list. Do you mean I need some smart fold here?

LeventErkok commented 4 years ago

I was thinking of a tree-like structure, but neither of those is likely to be very efficient, for similar reasons.
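
For what it's worth, by a tree-like structure I mean something along these lines; purely illustrative, and as said it's unlikely to buy much here:

import Data.SBV

-- Conjoin a list of booleans as a balanced tree rather than a flat
-- left-to-right fold.
balancedAnd :: [SBool] -> SBool
balancedAnd []  = sTrue
balancedAnd [b] = b
balancedAnd bs  = balancedAnd xs .&& balancedAnd ys
  where (xs, ys) = splitAt (length bs `div` 2) bs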

Can you post a simple program that I can play around with? I'll see if there's anything we can do without altering the internals too much.

arrowd commented 4 years ago

Ok, I'll try to come up with a minimal repro tomorrow. Thanks.

arrowd commented 4 years ago

Here's the testcase:

import Control.Monad (forM, forM_)

import Data.SBV
import Data.SBV.Control

test :: IO ()
test = runSMT $ do
  let library = replicate 1000 ()
      n = length library
      numInputs = 0
      m = n + numInputs

  outputLoc <- exists "outputLoc" :: Symbolic SInt32

  componentLocs <- forM library $ \comp -> do
    compInputLocs <- mapM (const exists_) [1..3]
    compOutputLoc <- exists_
    return (compInputLocs, compOutputLoc) :: Symbolic ([SInt32], SInt32)

  outputVar <- exists "output" :: Symbolic SInt32

  componentVars <- forM library $ \comp -> do
      inputVars <- mapM (const exists_) $ [1..3] :: Symbolic [SInt32]
      outputVar <- exists_ :: Symbolic SInt32
      return (inputVars, outputVar)

  constrain $ outputLoc .< (fromIntegral m)
  softConstrain $ outputLoc .>= (literal $ fromIntegral numInputs)

  forM_ componentLocs $ \(compInputLocs, compOutputLoc) -> do
    forM_ compInputLocs $ \inputLoc -> do
      constrain $ 0 .<= inputLoc
      constrain $ inputLoc .<= (fromIntegral $ m-1)

      constrain $ inputLoc .< compOutputLoc

    constrain $ fromIntegral numInputs .<= compOutputLoc
    constrain $ compOutputLoc .<= (fromIntegral $ m-1)

  constrain $ distinct (map snd componentLocs)

  let allVarsWithLocs = (outputVar, outputLoc) :
                      zip (concatMap (uncurry $ flip (:)) componentVars) (concatMap (uncurry $ flip (:)) (componentLocs))

  -- first try
  --let psi = sAnd [(xLoc .== yLoc) .=> (x .== y) | (x, xLoc) <- allVarsWithLocs, (y, yLoc) <- allVarsWithLocs]
  --constrain psi

  -- second try
  forM_ allVarsWithLocs $ \(x, xLoc) ->
      forM_ allVarsWithLocs $ \(y, yLoc) ->
          constrain $ (xLoc .== yLoc) .=> (x .== y)

  query $ checkSat
  return ()

Let me know if you would like me to reduce this further.

LeventErkok commented 4 years ago

I changed the beginning of your program to:

test :: Int -> IO ()
test bound = runSMTWith z3{verbose=True} $ do
  let library = replicate bound ()
   ...

Running

test 10

shows the largest symbolic index at s3477.

Running:

test 1000

shows the largest symbolic index (after a while!) at s322707

So a 10-fold increase in the input creates a 100-fold increase in the generated program, i.e., the growth is quadratic. With bound set to 1000 like you have, this will never finish.

Furthermore, I very much doubt that the underlying SMT solver can deal with that many variables either, even if you use it directly through its C/C++ API, bypassing the SMTLib layer in between.

My point is: even if we speed SBV up so it constructs and sends these constraints faster (granted, it isn't exactly clear how to do that), it won't help you. Just sending the program to the SMT solver over the pipe is going to take a very long time.

If this is the level of complexity you have, I'm afraid SMT solvers and SBV aren't really going to help you out. I haven't studied your program at any length, but I'm hoping there's a different way to approach it so that it doesn't explode into this many constraints. If you do need all of these constraints, you might be out of luck, both from a problem-generation perspective and from a solving perspective.

One thing to try might be to see if you can go incrementally. Instead of asserting all these constraints in one go, can you add a bunch (say, about 100), check for sat, and incrementally continue as needed? In the worst case this would still blow up, but I suspect it might do the trick, as perhaps you don't need all the constraints present together to solve the underlying problem.
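
A rough sketch of what such an incremental loop could look like in the query mode; the bundle size of 100, the local chunksOf helper, and the Bool result are illustrative choices on my part, not an SBV API:

import Data.SBV
import Data.SBV.Control

-- Assert constraints in bundles of 100, checking satisfiability after each
-- bundle and stopping early if we ever become unsatisfiable.
solveIncrementally :: [SBool] -> Query Bool
solveIncrementally cs = go (chunksOf 100 cs)
  where
    chunksOf _ [] = []
    chunksOf n xs = let (h, t) = splitAt n xs in h : chunksOf n t

    go []            = pure True
    go (bundle:rest) = do mapM_ constrain bundle
                          r <- checkSat
                          case r of
                            Sat -> go rest
                            _   -> pure False

You would run it as query (solveIncrementally yourConstraints) instead of constraining everything up front; whether the early sat checks actually help depends on your problem.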

I'm also noticing that you're using softConstrain. Z3's maxsat solver (which is what softConstrain uses) is not incremental (i.e., it won't work in the query mode), so that's also something to keep in mind. Another idea might be to get a satisfying solution (via hard constraints), and then see if you can "improve" upon it by keeping a large subset of the variables fixed and trying to optimize (i.e., soft-constrain) a smaller subset.

arrowd commented 4 years ago

> One thing to try might be to see if you can go incrementally. Instead of asserting all these constraints in one go, can you add a bunch (say, about 100), check for sat, and incrementally continue as needed? In the worst case this would still blow up, but I suspect it might do the trick, as perhaps you don't need all the constraints present together to solve the underlying problem.

Yep, this is what I was planning in case SBV isn't really the bottleneck. I'll try this way.

> I'm also noticing that you're using softConstrain. Z3's maxsat solver (which is what softConstrain uses) is not incremental (i.e., it won't work in the query mode).

Can you elaborate a bit? What do you mean by "won't work"? Right now I'm getting seemingly correct results with this code (with a small library, of course).

LeventErkok commented 4 years ago

If you use SBV in the query mode (here's an example: https://hackage.haskell.org/package/sbv-8.6/docs/src/Documentation.SBV.Examples.Queries.AllSat.html#goodSum) then you cannot use softConstrain.

Query mode allows for incrementally asserting constraints, querying the solver and programming the interaction. In this mode softConstrain calls are not supported. (This is because Z3 doesn't support maxsat in incremental mode.)

In practical terms, once you call query, you can no longer issue softConstrain. If you keep all your softConstrain calls in the segment before query, they'll work.

arrowd commented 4 years ago

> If you keep all your softConstrain calls in the segment before query, they'll work.

Right, this is exactly my case. Whew.

Ok, thank you for your feedback. I guess this can be closed.

LeventErkok commented 4 years ago

Actually I misspoke. Soft-constraints are fine even after the query mode is entered, as exemplified below:

{-# LANGUAGE OverloadedStrings #-}
import Data.SBV
import Data.SBV.Control

example = runSMT $ do x <- sString "x"
                      y <- sString "y"

                      constrain $ x .== "x-must-really-be-hello"
                      constrain $ y ./= "y-can-be-anything-but-hello"

                      query $ do io . print =<< checkSat
                                 xv <- getValue x
                                 yv <- getValue y
                                 io . print $ (xv, yv)

                                 softConstrain $ x .== "default-x-value"
                                 softConstrain $ y .== "default-y-value"
                                 io . print =<< checkSat
                                 xv <- getValue x
                                 yv <- getValue y
                                 io . print $ (xv, yv)

This program produces:

Main> example
Sat
("x-must-really-be-hello","\NUL")
Sat
("x-must-really-be-hello","default-y-value")

So, soft-constraints work just fine in the query mode. What's not supported are calls to maximize and minimize (i.e., optimization) in the query mode; you'll find that there isn't even any API to express them at that level. Apologies for the confusion.
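
For contrast, optimization goals belong in the symbolic (pre-query) part of the program and are run via optimize, roughly along these lines (the goal name and bounds here are made up):

import Data.SBV

-- maximize/minimize are stated before any query interaction and run
-- through optimize rather than runSMT/query.
optExample :: IO ()
optExample = print =<< optimize Lexicographic goal
  where goal :: Goal
        goal = do x <- sInt32 "x"
                  constrain $ x .>= 0 .&& x .<= 10
                  maximize "largest-x" x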

LeventErkok commented 4 years ago

Closing this ticket, but feel free to ping if you run into any other issues.