LeventErkok / sbv

SMT Based Verification in Haskell. Express properties about Haskell programs and automatically prove them using SMT solvers.
https://github.com/LeventErkok/sbv

Phase 1 for BenchSuite complete #528

Closed · doyougnu closed this issue 4 years ago

doyougnu commented 4 years ago

Hey Levent,

Here are the rest of the benchmarks for the documentation examples, as discussed in https://github.com/LeventErkok/sbv/issues/484. Everything compiles, and I made sure that at least the first entry for each documentation "class" (e.g., Strings or Queries) produces a viable benchmark, though I have not run every benchmark.
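For reference, each documentation benchmark is roughly of this shape (a minimal sketch assuming a criterion-style harness; the property and names below are stand-ins, not the actual BenchSuite code):

```haskell
-- Minimal sketch of a documentation-example benchmark, assuming criterion.
-- The property below is a stand-in for an actual documentation example.
import Criterion.Main (bench, bgroup, defaultMain, whnfIO)
import Data.SBV

-- Stand-in documentation example: a trivially provable property over SWord8.
docExample :: IO ThmResult
docExample = prove $ \x -> x + x .== 2 * (x :: SWord8)

main :: IO ()
main = defaultMain
  [ bgroup "Documentation"
      [ bench "word8-doubling" (whnfIO docExample)
      ]
  ]
```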

I think it would be good to create a set of standard benchmarks to guide performance efforts, ideally two or so per sbv feature. What are your thoughts?

Also, I accidentally committed a truncated version of the Uninterpreted.Multiply file. I've since fixed it, but if you look over the files in the PR you'll see some whitespace changes in it.

Jeff

LeventErkok commented 4 years ago

Fantastic!

Regarding how we should do the benchmark tests, I'm going to leave that entirely to your judgment. One way to go would be to grandfather in what you have and see if we can detect major differences over time. I'm not sure how we'll tell apart the performance impact of SBV itself from that of the solver, but I trust a good solution will emerge as we gain experience.

doyougnu commented 4 years ago

> I'm not sure how we'll tell apart the performance impact of SBV itself from that of the solver

Yes, this is exactly why I defined an overhead benchmark: it runs a problem with sbv, generates a z3 program for the same problem, and then runs the generated program in z3 via an external process. This is constrained by the quality of the program generation, of course: if sbv generates a z3 program that is poorly formed or slow, the comparison won't be meaningful. But it will let us identify sbv-induced hot spots.
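As a rough sketch (the problem, file name, and dump mechanism below are illustrative assumptions, not the actual BenchSuite code), the overhead benchmark looks something like this:

```haskell
-- Rough sketch of the overhead benchmark, assuming criterion and an external
-- z3 binary on the PATH. The problem and file name are illustrative only.
import Criterion.Main (bench, bgroup, defaultMain, whnfIO)
import Data.SBV
import System.Process (readProcess)

-- Stand-in problem shared by both measurements.
problem :: Symbolic SBool
problem = do x <- sWord8 "x"
             y <- sWord8 "y"
             pure $ x + y .== y + x

-- Full sbv pipeline: symbolic elaboration, SMT-LIB generation, and the
-- round trip to z3.
viaSBV :: IO ThmResult
viaSBV = proveWith z3 problem

-- z3 alone, on an SMT-LIB file for the same problem that is assumed to have
-- been dumped ahead of time (e.g. via SBV's transcript facility).
viaZ3Only :: IO String
viaZ3Only = readProcess "z3" ["problem.smt2"] ""

main :: IO ()
main = defaultMain
  [ bgroup "overhead"
      [ bench "sbv+z3"  (whnfIO viaSBV)
      , bench "z3 only" (whnfIO viaZ3Only)
      ]
  ]
```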

I think the strategy is as follows:

  1. Identify a set of canonical benchmarks that are representative of how end users are expected to use sbv
  2. Run the overhead benchmark on this subset to get two timings for an identical problem: one for sbv, one for z3 natively
  3. Calculate and track the running difference between the two timings produced in step 2 (see the sketch just below). Ideally this would be done automatically, but this is more of an exploration right now.
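
For step 3, the bookkeeping could be as simple as something like this (a hypothetical helper, just to illustrate the calculation):

```haskell
-- Hypothetical helper for step 3: given the sbv timing and the native z3
-- timing (in seconds), report the absolute and relative sbv overhead.
sbvOverhead :: Double -> Double -> (Double, Double)
sbvOverhead sbvTime z3Time = (sbvTime - z3Time, sbvTime / z3Time)
```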

LeventErkok commented 4 years ago

Cool. I remember you telling me about this strategy before; it must've slipped my mind. It sounds like a good way to start, and I can't wait to see some output from this. Thanks!