Possibility of using `Word` instead of `Integer` for the concrete representation of `SWord`s?

MrChico commented 4 years ago

Performance seems to be a recent theme in these issues, so it looks like I'll stay on track.

I did a quick profiling of my haskell application which now integrates sbv to implement symbolic execution, and found that the function normCV seems to be a significant cost centre when executing on concrete inputs.

Looking at the sbv code a bit, it seems that this might be due to the fact that bitvectors are modeled using Integer, where operations need to be refitted into the proper size, instead of utilizing native Word operations, such as those provided by Data.Word or Data.DoubleWord (https://hackage.haskell.org/package/data-dword). My library deals with quite large words, and it seems performance is suffering a bit due to the constant Integer normalization.

Of course, I can quite easily solve this for my specific use case by introducing a new type CSWord n = Symbolic (SWord n) | Concrete (Word n) and treat the concrete and symbolic case differently, but this is something I kind of expected sbv to already take care of, so perhaps it would be of use to others as well to deal with this here?

I'm wondering if it might be a good idea to add Word (parametric in size?) as another possible value of CVal, to increase performance when dealing with concrete words? I can possibly help with implementation if I can get some pointers on how one would go about this.

LeventErkok commented 4 years ago

You're right that SBV simply uses an Integer to represent both unbounded integers and bit-vectors, regardless of their size or signedness.

The CVal type is where you want to look at. You can add a new constructor there, i.e.:

data CVal = CAlgReal  !AlgReal              -- ^ Algebraic real
          | CInteger  !Integer              -- ^ Bit-vector/unbounded integer
          | CFloat    !Float                -- ^ Float
          | CDouble   !Double               -- ^ Double
          | CChar     !Char                 -- ^ Character
          | CString   !String               -- ^ String
          | CList     ![CVal]               -- ^ List
          | CSet      !(RCSet CVal)         -- ^ Set. Can be regular or complemented.
          | CUserSort !(Maybe Int, String)  -- ^ Value of an uninterpreted/user kind. The Maybe Int shows index position for enumerations
          | CTuple    ![CVal]               -- ^ Tuple
          | CMaybe    !(Maybe CVal)         -- ^ Maybe
          | CEither   !(Either CVal CVal)   -- ^ Disjoint union

would become:

data CVal = CAlgReal  !AlgReal              -- ^ Algebraic real
          | CWord     !Word                 -- ^ Bit-vector
          | CInteger  !Integer              -- ^ Unbounded integer
          | CFloat    !Float                -- ^ Float
          | CDouble   !Double               -- ^ Double
          | CChar     !Char                 -- ^ Character
          | CString   !String               -- ^ String
          | CList     ![CVal]               -- ^ List
          | CSet      !(RCSet CVal)         -- ^ Set. Can be regular or complemented.
          | CUserSort !(Maybe Int, String)  -- ^ Value of an uninterpreted/user kind. The Maybe Int shows index position for enumerations
          | CTuple    ![CVal]               -- ^ Tuple
          | CMaybe    !(Maybe CVal)         -- ^ Maybe
          | CEither   !(Either CVal CVal)   -- ^ Disjoint union

Assuming this Word type you're talking about can handle arbitrary-size words somehow. This wouldn't really be too hard to implement, except you'd need to touch a lot of code.

I'd say give it a shot. My expectation is that Integer is already quite "optimized" and it would be hard to beat its performance. So, profile as you go to see if it'll actually save you anything.

I wonder if adding the following line to normCV would do anything:

normCV :: CV -> CV
normCV c@(CV (KBounded signed sz) (CInteger v)) = c { cvVal = CInteger norm }
 where norm | sz == 0 = 0
            | signed  = let rg = 2 ^ (sz - 1)
                        in case divMod v rg of
                                  (a, b) | even a -> b
                                  (_, b)          -> b - rg
            | v < 2^sz = v                                    -- New line to avoid modulus 
            | True    = v `mod` (2 ^ sz)
normCV c@(CV KBool (CInteger v)) = c { cvVal = CInteger (v .&. 1) }
normCV c                         = c

also would be good to cache 2^sz there; I'd expect that would run quickly on any modern architecture, but you never know.

I'd start with playing with normCV to see if you can make it go faster. (Caching 2^sz and avoiding mod seems to be the obvious things to do.) This might be a cheaper thing to do and help out with performance.

If that doesn't get you anywhere, then look into adding a new CVal constructor. That'd be how I'd approach this.

LeventErkok commented 4 years ago

@MrChico

Here's another thing to try. In general:

x `mod` (2 ^ sz)

is equivalent to:

x .&. (((1 :: Integer) `shiftL` sz) - 1)

at least when x is non-negative. (Negative case might work too, actually, need to do some thinking.)

So, I'm wondering if replacing the modulus in the last case with the above expression might recover some of the performance for you.

Can you give it a try and see what you find out?

@doyougnu Would it be possible for you to benchmark the above change and see if it gains us anything?

LeventErkok commented 4 years ago

Another cheap thing to do: Add an INLINE pragma to normCV. We should do that regardless, I think it'd help in all cases at the expense of minor(?) increase in code size.

MrChico commented 4 years ago

I have experimented with your suggestions @LeventErkok , and here are some things I found:

When adding the case | v < 2^sz = v to normCV, I got a failure: Exception: Numeric.showIntAtBase: applied to negative number -1 when running some tests. This doesn't seem to have been caught by the sbv test suite, but I did not investigate further what goes wrong here.

Unfortunately, I did not see any performance increase on the measurements I'm using when inlining normCV. This was when benchmarking my program which relies on sbv. Which benchmarks can I use to test the performance of sbv directly?

LeventErkok commented 4 years ago

Ah, quite right. That line by itself wouldn't work. The correct one would be:

  | v >= 0 && v < 2^sz = v

But then, I'm not sure if you're really saving anything by that; probably won't impact it.

We should really do the x .&. (((1 :: Integer)shiftLsz) - 1) trick, but I need to check if that works when x is negative. Do you know if it does?

I'm surprised the INLINE directive didn't do anything. Are you compiling with at least -O2? That might be crucial.

For benchmarking, I'll defer to @doyougnu as he's the expert on all matters regarding benchmarking SBV.

LeventErkok commented 4 years ago

@MrChico I just pushed https://github.com/LeventErkok/sbv/commit/dadc7195b34c7e4a65be1c32a05b883857277ced, that I hope should make things a little faster. (At least not slow them down.)

Try compiling with -O2 to make sure the inlining actually does kick in. Do you see any improvements?

MrChico commented 4 years ago

@LeventErkok you are right, it turns out I was not compiling with O2. Doing so does seem to yield a big difference! Here's a summary of six runs of a concrete calculation with a lot of Word256 arithmetic with 3c71a10b:

records:6
sum:36.51
avg:6.085
std:0.0965660395790871
med:6.095max:6.22
min:5.92

and here is another six runs, using dadc7195 instead:

records:6
sum:34.23
avg:5.705
std:0.104363148029616
med:5.74max:5.79
min:5.48

as far as low hanging fruit goes, I'd say that's a win!

MrChico commented 4 years ago

We should really do the x .&. (((1 :: Integer) shiftL sz) - 1) trick, but I need to check if that works when x is negative. Do you know if it does?

I believe that does work. We are dealing with integers represented by two's complement here, right? Then the masking should preserve the lower bits, and the compliment will be translated to the new base

LeventErkok commented 4 years ago

Cool. 5% is nice for a one-line change. But further inspection and benchmarking/profiling can perhaps identify further low-hanging fruit. Please do report what you find out!

doyougnu commented 4 years ago

I have experimented with your suggestions @LeventErkok , and here are some things I found:

When adding the case | v < 2^sz = v to normCV, I got a failure: Exception: Numeric.showIntAtBase: applied to negative number -1 when running some tests. This doesn't seem to have been caught by the sbv test suite, but I did not investigate further what goes wrong here.

Unfortunately, I did not see any performance increase on the measurements I'm using when inlining normCV. This was when benchmarking my program which relies on sbv. Which benchmarks can I use to test the performance of sbv directly?

@MrChico you can use the SBVBenchmark cabal target to benchmark sbv directly. I have some description for the intended work flow in SBVBenchSuite/SBVBenchmark.hs but essentially you would run cabal v2-run SBVBenchmark or cabal bench to generate a set of benchmarks which will be put in SBVBenchSuite/BenchResults. Then you would make your changes and run another benchmark to generate another file. I've been using the bench-show package to compare the benchmarks, but this package expects the benchmarks to be in the same file so you'll have to append the first file with the runs from the second file. Then you can call cabal v2-repl SBVBenchmark and run SBVBenchSuite/Utils/SBVBenchFramework.compareBenchMarksCLI <your-file-here> to compare the benchmarks.

doyougnu commented 4 years ago

FYI I just pushed some changed to the benchmark suite to improve the user interface. You should now be able to do the following:

clone sbv
run cabal v2-run SBVBenchmark, this will run the benchsuite and generate a file in SBVBenchSuite/BenchResults
make your changes
run cabal v2-run SBVBenchmark, now you have two files

generate a report with:

cabal v2-repl SBVBenchmark
> compareBenchmarks old-file new-file

for example:

*Utils.SBVBenchFramework> compareBenchmarks "SBVBenchSuite/BenchResults/2020-07-27-17:04:04.261168259-UTC.csv" "SBVBenchSuite/BenchResults/2020-07-27-17:23:53.048061816-UTC.csv"
(Mean)
Benchmark                                                  default(0)(μs) default(1) - default(0)(%)
---------------------------------------------------------- -------------- --------------------------
Puzzles//NQueens.NQueens 5                                       57403.71                     -59.83
Miscellaneous//SubsetEquality.Transitivity                       32854.40                     -57.71
Puzzles//NQueens.NQueens 1                                       10608.36                     -52.75
Puzzles//NQueens.NQueens 3                                       13544.35                     -44.09
Miscellaneous//SubsetEquality                                    27863.06                     -27.70
Miscellaneous//Tuple                                            176469.91                     -24.81
Puzzles//NQueens.NQueens 2                                        9359.86                     -24.80
Uninterpreted//test                                              28321.36                     -23.95
Puzzles//NQueens.NQueens 4                                       20939.26                     -21.48
Miscellaneous//DeMorgans.Cap                                     25356.26                     -17.93
Uninterpreted//SArray                                             8457.12                     -16.99
Optimizations/VM.allocate                                        22352.01                     -15.79
Puzzles//LadyAndTigers                                           27418.50                     -15.06
Miscellaneous//InclusionIsPO                                     23241.23                     -13.55
Puzzles//Puzzles.SendMoreMoney                                   48744.73                     -12.45
...

gergoerdi commented 4 years ago

I wonder if https://github.com/clash-lang/clash-compiler/pull/1132 can provide some ideas?

LeventErkok commented 4 years ago

@gergoerdi Definitely worth giving a try. I suspect clash doesn't need unbounded integers, so they can get-away with naturals. Unfortunately, SBV does need unbounded signed integers, so we'd have to separate the two cases. This is more work, but might pay of for programs that make extensive use of bit-vector arithmetic. PR's and experiments are most welcome!

LeventErkok commented 4 years ago

I'm closing this issue, as it is really hard to imagine what can be done without an explicit performance test case. Using different concrete types is always a fine idea. But fundamentally SBV is all about symbolic computation, and it has never been intended to run things "quickly" when everything is concrete.

I'm closing this ticket as there's no real actionable item for the time being. Feel free to reopen if there's a convincing use case that can benefit from a rewrite in a significant way.

LeventErkok / sbv

Possibility of using `Word` instead of `Integer` for the concrete representation of `SWord`s? #547