johnbender opened this issue 5 days ago (Open)
Thanks for the report.
Unfortunately I cannot really replicate this to see if we can improve it in any way, because it uses a pretty old version of SBV. In particular, the handling of quantifiers has changed significantly, and your program is no longer valid. (I can't really go back and fix things in older releases.)
If you could send a test case against the latest SBV release, I can take a closer look to see if things can be improved. Do not worry about making it go fast/slow; just create a problem that takes too long to send to z3, and I'll see what the bottleneck might be.
Updated for testing with v10.12. The case with the quantifier takes about half the time of the one with the conjuncts. Our suspicions about the cause remain the same but we are grateful for your time and attention either way!
```haskell
import Data.SBV
import System.Environment (getArgs)

termCount :: Int
termCount = 20000

generateTerm quant =
  -- (\(Forall x) -> foldr (\i acc -> acc .&& x .== (i :: SWord16)) sTrue (take termCount [1..]))
  \x -> foldr (\i acc -> acc .&& x .== (i :: SWord16)) sTrue (take termCount [1..])

config trns =
  z3 { transcript = if trns then Just "/tmp/sbv.minimal.out" else Nothing }

main :: IO ()
main = do
  args <- getArgs
  let quant = elem "quant" args
  let trans = elem "transcript" args
  proveWith (config trans) (generateTerm quant) >>= print
```
Is this the correct program? Note that your `generateTerm` does not use the `quant` parameter as it stands.
Yep, this is correct; it's just a leftover from modifying the original. The commented-out term is there for comparison.
It'd be nice if there were a way to compare the fast and slow versions, if you can craft something like that. But this is OK too. My understanding is that this benchmark is way too slow for your use case. Correct? Any way you can create slow/fast variants with the latest SBV?
Sorry, I took "Do not worry about making it go fast/slow" to mean that I shouldn't worry about including both in the program. The following will work with the `quant` argument again.
```haskell
import Data.SBV
import System.Environment (getArgs)

termCount :: Int
termCount = 20000

config trns =
  z3 { transcript = if trns then Just "/tmp/sbv.minimal.out" else Nothing }

main :: IO ()
main = do
  args <- getArgs
  let quant = elem "quant" args
  let trans = elem "transcript" args
  (if quant
     then proveWith (config trans) (\(Forall x) -> foldr (\i acc -> acc .&& x .== (i :: SWord16)) sTrue (take termCount [1..]))
     else proveWith (config trans) (\x -> foldr (\i acc -> acc .&& x .== (i :: SWord16)) sTrue (take termCount [1..])))
    >>= print
```
Ah, and to answer your question about our use case: we have many queries that involve a large top-level conjunction, and the overhead demonstrated here is roughly 3x. The total time in the best case for these queries is already quite long (many minutes), so the increase is expensive in practice.
Cool. Let me take a look and see if I can spot any possible improvements/work-arounds.
What I'm seeing is that the `quant` version runs much slower:
```
$ time ./a quant
Falsifiable
./a quant  16.58s user 1.13s system 97% cpu 18.146 total
$ time ./a
Falsifiable. Counter-example:
  s0 = 52768 :: Word16
./a  5.57s user 0.96s system 102% cpu 6.345 total
```
I thought you said the quantified version ran faster. Am I not running this correctly?
That's the correct way to run it, but I'm getting a significantly different timing result. What version of z3 do you have? I'm running 4.12.2.
```
$ z3 --version
Z3 version 4.13.4 - 64 bit
```

I actually compile z3 from source; this one was compiled about a week ago.
Tried with 4.13.3 and got the same result. If I do a profiled build, would it be helpful to attach that here?
Before we get to profiling: how did you compile the sources? I did a straight `ghc -o a a.hs`. There have been cases where `ghc -O2` produced worse-performing code in the past. Wonder if this is one of those cases.
I'm compiling within an existing stack project. This suggests a question about what flags we are using; we have the following in our stack.yaml:

```yaml
ghc-options:
- -threaded
- -rtsopts
- -with-rtsopts=-N1 # default to a single thread
```
When I remove the `-threaded` flag (and also `-with-rtsopts=-N1`, since it's only valid when `-threaded` is present), the times are now the same (that is, the `quant` variant is no longer faster), though they do not match the times you are seeing.
Interesting. Honestly, I've no idea why those options would matter.
Of course, we can also treat this as simply "this should run faster." In that case, I assume you'd want to focus on the non-quantifier version to run faster?
I suppose, but separately I can't understand why the non-quantified version would run so much faster on your machine. I'm working on a 3GHz Xeon with 100GB of RAM, and the best I can get is ~5s with 20000 for the `termCount`. Do your results change when adding `-threaded`?
Correction: I was initially confused by the single-line output, but it looks comparable. I am interested in how `-threaded` affects your timing results. Could it be that the machine over here handles the quantified case fast enough that the solver time is negligible in our comparison?
Even with `-threaded`, `quant` runs a lot slower for me. I'm on a 5-year-old laptop with a mere 16G of memory. Not that it should matter.
Could you try printing the transcript for both examples (potentially with a much smaller term count) and running z3 on it directly, to see how long the solving alone takes? Our observation was that, handing a transcript directly to z3, the result was nearly instant for both. I am guessing that solving the quantified example takes roughly 9 seconds on your end (the difference between the two), while on our end the solving might be faster, which would explain why we don't see the quantified example as being much slower.
Note the transcript file name doesn't change with the command-line arg, so you'll have to move the transcript after one of the runs if you want to keep it.
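Concretely, the comparison described above can be scripted along these lines. This is only a sketch: it assumes the example program has been compiled to `./a` and that `z3` is on the PATH, and it guards for both so it is a no-op otherwise.

```shell
# Sketch: produce a transcript per variant, then time z3 on the transcripts alone.
# Assumes the example above is compiled to ./a (hypothetical name); guarded so it
# degrades gracefully when ./a or z3 is missing.
if [ -x ./a ] && command -v z3 >/dev/null 2>&1; then
  ./a transcript        > /dev/null    # conjunct version; writes /tmp/sbv.minimal.out
  mv /tmp/sbv.minimal.out noquant.smt2 # transcript path is fixed, so save it aside
  ./a quant transcript  > /dev/null    # quantified version; same fixed transcript path
  mv /tmp/sbv.minimal.out quant.smt2
  time z3 noquant.smt2  > /dev/null    # solver-only time, SBV out of the loop
  time z3 quant.smt2    > /dev/null
else
  echo "skipping: needs ./a and z3"
fi
```

Subtracting the solver-only times from the end-to-end times then isolates how much of the wall clock is SBV-side overhead rather than z3.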
z3 seems to be spending a bit more time on the quantified version, but definitely not enough to account for the difference:
```
$ time z3 quant.smt2 > /dev/null
z3 quant.smt2 > /dev/null  1.50s user 0.12s system 97% cpu 1.659 total
$ time z3 noquant.smt2 > /dev/null
z3 noquant.smt2 > /dev/null  0.84s user 0.17s system 97% cpu 1.030 total
```
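As a quick sanity check, the non-solver overhead falls out by subtraction, using the user times quoted in this thread (16.58s end-to-end vs 1.50s z3-only for the quant variant; 5.57s vs 0.84s for the conjunct variant):

```shell
# Overhead outside z3 = end-to-end user time minus z3-only user time,
# using the figures reported earlier in the thread.
awk 'BEGIN {
  printf "quant overhead:   %.2fs\n", 16.58 - 1.50   # time not spent in the solver
  printf "noquant overhead: %.2fs\n",  5.57 - 0.84   # time not spent in the solver
}'
```

So roughly 15s of the quantified run and roughly 4.7s of the conjunct run happen outside z3 on that machine.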
I'm not sure what to make of the ~14 seconds of overhead for the quant example and the ~4 seconds of overhead for the conjunct example, beyond our original hypothesis that the communication/interaction with the solver is the bottleneck. My experiments attempting to batch the SMTLib strings into fewer communications didn't make much progress (with the limited time I applied to them) because of expectations in `runSolver` about responses.
It's hard to tell where the bottleneck really is. I suppose it's time to profile. Can you do a profile run? Let's see what gets called a lot.
Attached are strace profiles and basic profiling information for both the quant and no-quant runs.

A few notes: the strace data was collected with `strace -c ./a quant` (I believe).

Attachments: example-quant.strace.txt, example-quant.prof.txt, example-no-quant.strace.txt, example-no-quant.prof.txt
Thanks. I need to look at the profile data in more detail, but at first glance it looks like the uniplate calls are getting really expensive. SBV performs multiple traversals over the final program, and this seems to add up when you have huge programs.
Some of these traversals just ensure that we haven't screwed anything up (i.e., they check that internal SBV invariants hold over the program). Unfortunately there's no way to turn these off.
Some of these are used to determine certain data. For instance, to determine which logic to use, we do traversals to figure out what features have been used. Not all of this can be avoided, but perhaps an explicit call to `setLogic` can help here: if you call `setLogic` yourself, then SBV will skip some of these expensive checks and will just take the user-given logic.

I don't think this'll save much; perhaps nothing at all. But can you try calling `setLogic` explicitly yourself, to see if it makes any difference?
Again, I need to look at the profile further to see where else we can cut. I imagine there'll be a configuration parameter of the form `skipChecks` or some such, which, if the user sets it, will cause SBV to skip some of these checks. Exactly where to put these, and whether they'll have an impact, is TBD.
Thanks for collecting the data. I'd appreciate it if you could do the `setLogic` experiment to see if it moves the needle at all.
I didn't test all the possible logic values, but for many of the quantifier-free options, using the SetLogic solver option significantly increases the execution time (~4 seconds to ~24 seconds).
I should've been clearer. I was asking for just one logic setting: QF_BV for the unquantified case, and ALL in the quantified case.
In any case, that’s unlikely to be the root cause anyhow. I need to look into more of the details of the profile. Alas I’m traveling for the next two weeks, so this’ll take a while.
Safe travels, I hope it's an enjoyable trip! I'll keep hacking on my end and post here if I have anything useful to add.
Hello!
We're working with terms that produce a lot of top-level SMTLib definitions, and we noticed that the interaction overhead with the solver appears to quickly dominate the solver's own performance.

As an example, the program below generates many terms at the top level, but when passed `quant` it ensures that the asserted term is wrapped in a universal quantifier, thereby forcing the individual conjuncts into one interaction with the solver. The result is a significant performance improvement.

We spent some time looking through the source and the documentation but didn't spot an obvious way to tune the interactions, and we expect there are good reasons for this approach. With that in mind, we were wondering if an option to disable the call to `lines` in `runSolver` was in the offing (or even made sense), or alternately some other, likely much more intelligent, approach. Thanks very much!

(SBV 8.15/GHC 8.10.7)
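To illustrate the shape of the difference (this is hand-written and schematic, not SBV's literal output): in the non-quantified variant SBV writes one top-level definition per node of the term, line by line, whereas in the quantified variant the whole body appears inline under a single assert. Roughly:

```
; non-quantified: one define-fun per node, written to the solver line by line
(define-fun s2 () Bool (= s0 #x0001))
(define-fun s3 () Bool (= s0 #x0002))
(define-fun s4 () Bool (and s2 s3))
; ... on the order of termCount such lines ...

; quantified: the whole term sits inline under one assertion
(assert (forall ((x (_ BitVec 16)))
  (and (= x #x0001) (= x #x0002) ...)))
```

This is why the quantified form amounts to "one interaction" with the solver while the conjunct form amounts to many.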
Timing result: