LeventErkok / sbv

SMT Based Verification in Haskell. Express properties about Haskell programs and automatically prove them using SMT solvers.
https://github.com/LeventErkok/sbv
Other
242 stars 34 forks source link

Bizarre performance issue #642

Closed LeventErkok closed 11 months ago

LeventErkok commented 1 year ago

Put the following in file a.hs:

import Data.SBV
import Data.SBV.Tools.CodeGen
import Documentation.SBV.Examples.Crypto.SHA

main :: IO ()
main = compileToC (Just "testOutput_delete") "sha512" $ do

        cgOverwriteFiles True

        let algorithm = sha512P

        hInBytes   <- cgInputArr  64 "hIn"
        blockBytes <- cgInputArr 128 "block"

        let hIn   = chunkBy 8 fromBytes hInBytes
            block = chunkBy 8 fromBytes blockBytes

            result = hashBlock algorithm hIn (Block block)

        cgOutputArr "hash" $ concatMap toBytes result

Let's run it after compiling with ghc:

$ ghc a.hs
$ time ./a
./a  884.88s user 30.35s system 99% cpu 15:22.54 total

That's kind of slow, takes 15.5 minutes. Let's try "interpreting" instead:

$ /bin/rm -f a a.hi a.o
$ time ghc a.hs -e main
ghc a.hs -e main  67.44s user 11.81s system 95% cpu 1:22.90 total

Less than 1.5 minutes; that's 10 times faster!

This is on a Mac with GHC 9.4.4. But I observed the same on Linux with GHC 9.2.2.

Last time I worked on code like this was about 3 years ago, with an older version of GHC; and I don't remember it being this slow when compiled. (Though I can't recall how long it took back then, nor what the actual GHC version was. This was back in 2019 or so.)

This is really curious, and most assuredly a GHC issue. Would be nice to analyze this and reduce it to something we can report to them. (I guess we can report as is as well, if we can't reduce it to something simpler; but would be good to give it a try.)

LeventErkok commented 1 year ago

@doyougnu

Wondering if you'd have the time to take a look at this; definitely your expertise area. Even if you don't have the time, if you can at least confirm what I'm seeing, it'd be nice. (Just to make sure my environment isn't screwed up somehow.) Of course, any other insight you have into this would be much appreciated as well.

doyougnu commented 1 year ago

Hey Levent. You have perfect timing. I'm currently writing case studies for the haskell optimization handbook and the #572 issue was top of my list. This one seems more self contained so I'll look into it either this weekend or on Monday morning during my scheduled handbook writing time.

From the looks of it this might be a major regression in GHC. If so I'll link the GHC ticket.

LeventErkok commented 1 year ago

@doyougnu Thanks! Jeffrey! I'm glad the expert in this area is looking at this bizarre case.

Took a brief look at the optimization handbook, and seems like a wonderful resource. Looking forward to digging into it more in the upcoming days. Thanks..

doyougnu commented 1 year ago

hmm this also occurs with ghc 8.10.7:

[nix-shell:~/programming/haskell/sbv642]$ ./test642.sh 
Tue Feb 28 12:33:52 PM UTC 2023 The Glorious Glasgow Haskell Compilation System, version 8.10.7
Tue Feb 28 12:33:52 PM UTC 2023 Running compiled version
Tue Feb 28 12:33:53 PM UTC 2023 [1 of 1] Compiling Issue642         ( Issue642.hs, Issue642.o )
Linking Issue642 ...
Tue Feb 28 12:33:53 PM UTC 2023 Compiled, running

real    42m32.342s
user    42m29.672s
sys 0m2.410s
Tue Feb 28 01:16:25 PM UTC 2023 
Tue Feb 28 01:16:25 PM UTC 2023 Done, Cleanup2
Tue Feb 28 01:16:25 PM UTC 2023 
Tue Feb 28 01:16:25 PM UTC 2023 Interpreted, running

real    1m51.887s
user    1m49.175s
sys 0m2.963s
Tue Feb 28 01:18:17 PM UTC 2023 
Tue Feb 28 01:18:17 PM UTC 2023 Done

the compiled version runs in 42 minutes while the interpreted is done it 1 min 51 sec.

LeventErkok commented 1 year ago

Wow, that looks even worse than the results I got for GHC 9.4.4.

I'm pretty sure this wasn't an issue about 3 years ago; maybe some variant of GHC7? If you've a simple way to figure out which version was the culprit, that'd be really helpful.

doyougnu commented 1 year ago

yea that was just a spot check. I'll do some more testing tonight

doyougnu commented 1 year ago

I couldn't get GHC 7 to compile so I tried 8.8 and got similar results:

Wed Mar  1 01:12:50 PM UTC 2023 The Glorious Glasgow Haskell Compilation System, version 8.8.4
Wed Mar  1 01:12:50 PM UTC 2023 Running compiled version
Wed Mar  1 01:12:51 PM UTC 2023 [1 of 1] Compiling Issue642         ( Issue642.hs, Issue642.o )
Linking Issue642 ...
Wed Mar  1 01:12:51 PM UTC 2023 Compiled, running

real    42m14.519s
user    42m11.150s
sys 0m3.186s
Wed Mar  1 01:55:05 PM UTC 2023 
Wed Mar  1 01:55:05 PM UTC 2023 Done, Cleanup2
Wed Mar  1 01:55:05 PM UTC 2023 
Wed Mar  1 01:55:05 PM UTC 2023 Interpreted, running

real    1m47.442s
user    1m44.933s
sys 0m2.781s
Wed Mar  1 01:56:53 PM UTC 2023 
Wed Mar  1 01:56:53 PM UTC 2023 Done

So this could be a regression in sbv rather than ghc. I'll profile tomorrow (I'm already out of time today) to start seeing what is taking time but just anecdotally memory on my system increases quite a bit while this runs and is much higher in the compiled run than the interpreted one.

LeventErkok commented 1 year ago

I suppose it could be an SBV regression. But anytime the interpreter outperforms the compiler (and significantly in this case), there's a bit of blame to be shared by GHC too. Perhaps an SBV regression that triggered a GHC bug. That's possible I guess.

If you want to try with older SBV's, I'd go for something around 2019'ish.

doyougnu commented 1 year ago

Ran some more experiments, first some observations:

  1. the sha256 program does not exhibit this issue. While the compiled version is still slower than the expression evaluation mode version its not 10-20x or whatever slower.
  2. I ran an eventlog over night to see what popped up on a heap profile: image and all of the types that are growing are related to the hash-cons'ing cache: IntMap StableName [] and (,) So I suspect that there is an optimization happening in compilation mode that is not happening in -e mode that is nuking sharing in the cache. Notice that this isn't a profile of a memory leak per se because it isn't a triangle shape. If it were a memory leak we would see a peak of memory and then a downslope. But instead what we get is a nice logarithmic curve which is what I would expect to see if the cache was never actually caching and every check on the cache was just adding a new element.
LeventErkok commented 1 year ago

Yes, I noticed the same. 224/256 go relatively comparable for compiled and interpreted. 384 and 512 exhibit major difference. I'm really surprised as to why this would be, as the algorithms are quite similar. (Larger bit sizes have more iterations and. I do expect them to be slower, but not exhibit this sort of behavior.)

doyougnu commented 1 year ago

GHC ticket opened: https://gitlab.haskell.org/ghc/ghc/-/issues/23081

LeventErkok commented 1 year ago

Not sure if it matters; but when generating code, SBV does not call z3 at all. The entire computation happens on the Haskell side. (i.e., you can generate code without even z3 installed on your computer.)

doyougnu commented 1 year ago

Hey Levent,

Is it possible these two modes are not computing the same thing? As a sanity check I diff'd the output of a compiled and -e build and got this:

11:03:20 ❯ diff testOutput_delete testOutput_delete_expr_eval/
Only in testOutput_delete_expr_eval/: sha256.c
Only in testOutput_delete_expr_eval/: sha256_driver.c
Only in testOutput_delete_expr_eval/: sha256.h
diff '--color=auto' testOutput_delete/sha512_driver.c testOutput_delete_expr_eval/sha512_driver.c
10,14c10,14
<       118,  10, 114, 108, 147, 134, 118, 200,  11,  65,  65, 148, 211,
<        55, 242,  11, 105,  24, 255, 109, 227,   0, 159, 157, 203, 134,
<       172, 133,  88, 247, 253, 162,  35, 166,  20, 220, 252,   3,  56,
<       235,  47, 238,  62, 172, 136, 188,   2, 104, 101, 116, 241, 107,
<       195,   4, 255, 209,  81, 210, 226,   6,  35, 175,  77,  10
---
>       169, 137,  31,  20, 104, 103, 191,  61, 214,  50,  31, 226, 177,
>       246, 166, 145, 211, 216, 244,  70, 185,  79,  51, 250,  47, 114,
>        81, 104,  98, 147, 253,  31, 144, 178,  83, 141, 110,  84, 238,
>       173,  22, 196,   6,  99, 229,  78, 124, 181,  88, 254, 142, 133,
>       139,   1,  89,  91, 212,  55,  17,  29, 115,  33,  64,  48
23,32c23,32
<       199,  76,  31, 121, 114,  11, 214,  59,  58,  10, 187,  50, 255,
<        96, 212, 133, 154,  26,  87,  20, 243, 133, 210, 103, 187, 192,
<        72, 182,  93,  64, 129, 168, 243, 128,  62, 211, 227, 150, 181,
<       142, 252, 233, 107, 191, 176, 245, 181, 108,  63,  61, 129,  11,
<       251,   5,  35,  38, 202,  38, 217, 117,  39,  64, 159,  90, 144,
<       143,   0, 124,   7, 151, 199, 170,  67, 216, 215, 204, 225, 173,
<        62,  48, 254,  83, 211, 141, 237, 168, 152, 232, 189,  33, 226,
<       116,  24, 217,  26,  35,  40, 155,  35, 102, 217, 135, 230, 170,
<       121, 162,  43, 217, 136,  32, 163, 132, 186, 136, 117,  54, 217,
<       249,  58, 213, 149, 151,  46, 176,  83, 115,  92, 120
---
>       159,   9,  31, 113,   8, 186,  21, 143, 113, 237, 127,  43,  89,
>       202,  54, 127,  98, 163,  78, 131,  17, 217,   4, 213, 202, 198,
>        73, 125,  39, 220, 107, 183, 213, 175, 118,  13,  56,  51,  23,
>       104,  64,  52, 208, 111, 123, 167, 106,  65, 170, 198,  14, 220,
>       210,  76, 220, 213, 227, 239, 233, 156,  39, 169,  62, 246, 230,
>       246,  76, 102,  46,  76, 231,  56, 128, 120,  44, 219,  57,  91,
>       113,  94, 108, 153, 190, 203, 174, 126, 230,  83,  71, 141, 182,
>       238,  45,  69,  10, 144, 177, 237, 121, 241, 181, 164,  55,  96,
>        87,  53, 156, 185,  13,  33,  33,   9, 145,  87, 253, 174, 254,
>        92, 181,   5,  23, 108, 110, 166, 107,  23, 125, 129

and sure enough the difference is in sha512_driver.c. Here is the compiled version:

// compiled (I added this comment in github)
int main(void)
{
  const SWord8 hIn[64] = {
      118,  10, 114, 108, 147, 134, 118, 200,  11,  65,  65, 148, 211,
       55, 242,  11, 105,  24, 255, 109, 227,   0, 159, 157, 203, 134,
      172, 133,  88, 247, 253, 162,  35, 166,  20, 220, 252,   3,  56,
      235,  47, 238,  62, 172, 136, 188,   2, 104, 101, 116, 241, 107,
      195,   4, 255, 209,  81, 210, 226,   6,  35, 175,  77,  10
  };

  printf("Contents of input array hIn:\n");
  int hIn_ctr;
  for(hIn_ctr = 0; hIn_ctr < 64 ; ++hIn_ctr)
    printf("  hIn[%2d] = %"PRIu8"\n", hIn_ctr ,hIn[hIn_ctr]);

  const SWord8 block[128] = {
      199,  76,  31, 121, 114,  11, 214,  59,  58,  10, 187,  50, 255,
       96, 212, 133, 154,  26,  87,  20, 243, 133, 210, 103, 187, 192,
       72, 182,  93,  64, 129, 168, 243, 128,  62, 211, 227, 150, 181,
      142, 252, 233, 107, 191, 176, 245, 181, 108,  63,  61, 129,  11,
      251,   5,  35,  38, 202,  38, 217, 117,  39,  64, 159,  90, 144,
      143,   0, 124,   7, 151, 199, 170,  67, 216, 215, 204, 225, 173,
       62,  48, 254,  83, 211, 141, 237, 168, 152, 232, 189,  33, 226,
      116,  24, 217,  26,  35,  40, 155,  35, 102, 217, 135, 230, 170,
      121, 162,  43, 217, 136,  32, 163, 132, 186, 136, 117,  54, 217,
      249,  58, 213, 149, 151,  46, 176,  83, 115,  92, 120
  };

and the interpreted one:

// eval-expr mode (-e), again comment added in github
int main(void)
{
  const SWord8 hIn[64] = {
      169, 137,  31,  20, 104, 103, 191,  61, 214,  50,  31, 226, 177,
      246, 166, 145, 211, 216, 244,  70, 185,  79,  51, 250,  47, 114,
       81, 104,  98, 147, 253,  31, 144, 178,  83, 141, 110,  84, 238,
      173,  22, 196,   6,  99, 229,  78, 124, 181,  88, 254, 142, 133,
      139,   1,  89,  91, 212,  55,  17,  29, 115,  33,  64,  48
  };

  printf("Contents of input array hIn:\n");
  int hIn_ctr;
  for(hIn_ctr = 0; hIn_ctr < 64 ; ++hIn_ctr)
    printf("  hIn[%2d] = %"PRIu8"\n", hIn_ctr ,hIn[hIn_ctr]);

  const SWord8 block[128] = {
      159,   9,  31, 113,   8, 186,  21, 143, 113, 237, 127,  43,  89,
      202,  54, 127,  98, 163,  78, 131,  17, 217,   4, 213, 202, 198,
       73, 125,  39, 220, 107, 183, 213, 175, 118,  13,  56,  51,  23,
      104,  64,  52, 208, 111, 123, 167, 106,  65, 170, 198,  14, 220,
      210,  76, 220, 213, 227, 239, 233, 156,  39, 169,  62, 246, 230,
      246,  76, 102,  46,  76, 231,  56, 128, 120,  44, 219,  57,  91,
      113,  94, 108, 153, 190, 203, 174, 126, 230,  83,  71, 141, 182,
      238,  45,  69,  10, 144, 177, 237, 121, 241, 181, 164,  55,  96,
       87,  53, 156, 185,  13,  33,  33,   9, 145,  87, 253, 174, 254,
       92, 181,   5,  23, 108, 110, 166, 107,  23, 125, 129
  };

So it seems the SWord8 blocks are different. I'm not familiar with the SHA algorithm. Any idea on what this difference means?

LeventErkok commented 1 year ago

This is expected. When we generate the driver program, SBV picks the argument values randomly. There's a way to fix the numbers, by calling cgSetDriverValues, but this simple example doesn't use that. You can verify by adding a line of the form:

cgSetDriverValues (repeat 0)

in the code-generation block to make sure they're fed the same things. (Not that I think it'd make any difference.)

When comparing the output, compare everything but the driver file. Everything else should be identical. Is that not the case?

doyougnu commented 1 year ago

great!

Is that not the case?

Nope that is the case. The hunt continues :)

doyougnu commented 1 year ago

Hey Levent,

This still occurs (and is even worse!) with sbv-8.4 and ghc-8.4.4. sbv 8.4 was the first sbv release to have the SHA documentation example which is why I tested it:

07:45:32 ❯ time cabal test Issue642
Build profile: -w ghc-8.4.4 -O1
In order, the following will be built (use -v for more details):
 - sbv-8.4 (test:Issue642) (first run)
Preprocessing test suite 'Issue642' for sbv-8.4..
Building test suite 'Issue642' for sbv-8.4..
Running 1 test suites...
Test suite Issue642: RUNNING...
Test suite Issue642: PASS
Test suite logged to:
/store/Programming/haskell/sbv/dist-newstyle/build/x86_64-linux/ghc-8.4.4/sbv-8.4/t/Issue642/test/sbv-8.4-Issue642.log
1 of 1 test suites (1 of 1 test cases) passed.

________________________________________________________
Executed in   87.94 mins    fish           external
   usr time   87.82 mins  477.00 micros   87.82 mins
   sys time    0.11 mins    0.00 micros    0.11 mins

~/programming/haskell/sbv tags/v8.4^0* 1h 27m 56s

So I no longer think this is a regression and just a straightforward performance issue. It could be that the observable sharing the hash cons'ing is based on is too lazy and so we're caching thunks. That at least is my guess for this profile and because we've already had that discussion (although I wasn't familiar with that work at the time I am now!)

LeventErkok commented 1 year ago

Hi Jeff,

I'm pretty sure this wasn't an issue when SBV 8.4 came out, with respect to the version of GHC that I was using at that time. I remember this work was done with an older version of GHC at the time; I want to say something in the GHC 6 series even.

I know this is turning into a wild-goose chase, but is there any way you can try SBV 8.4 with GHC 6? I wouldn't be surprised if you can't even get a machine to run GHC 6 anymore (old libraries probably), but there was a time where this wasn't an issue. Perhaps no longer repeatable.

In any case, there's a bug somewhere; whether it's SBV or GHC, or a combination thereof. At this point this is a curiosity since I have the ghci workaround, but would be nice to nail it down. Thanks for all the cycles you're spending on this!

doyougnu commented 1 year ago

I know this is turning into a wild-goose chase, but is there any way you can try SBV 8.4 with GHC 6? I wouldn't be surprised if you can't even get a machine to run GHC 6 anymore (old libraries probably), but there was a time where this wasn't an issue. Perhaps no longer repeatable.

No worries. I can try to build with GHC 6, the earliest I've gotten to work was GHC 7, but I'm using nix to get the right versions of everything so it really isn't that hard, its just a matter of finding the right version of nixpkgs that builds the GHC I'm looking for.

At this point this is a curiosity since I have the ghci workaround, but would be nice to nail it down.

What is the workaround? I'm pretty sure the -e mode uses the bytecode interpreter (i.e., ghci), and one of the problems is that it is hard to debug ghci internals, whereas debugging the compiler internals is pretty straightforward all things considered.

Thanks for all the cycles you're spending on this!

No problem! Besides I'm using it as a case study. No work shall go unused!

LeventErkok commented 1 year ago

"Workaround" as in I can get the C-code generated for SHA384/512 by just using ghci. Not that there's an actual fix for the problem. Thanks for all the sleuthing!

doyougnu commented 1 year ago

Hey Levent,

I know this is turning into a wild-goose chase, but is there any way you can try SBV 8.4 with GHC 6? I wouldn't be surprised if you can't even get a machine to run GHC 6 anymore (old libraries probably), but there was a time where this wasn't an issue. Perhaps no longer repeatable.

Indeed, this was much harder than I expected because even cabal versions start to become problematic. For what its worth, in the timeline, GHC 6 is around 2010, while GHC 7.04 (the last version I successfully reproduced with) was release June, 2011. So if we are just interested in reproducing the bug with a pre-2019 GHC then GHC 7 should be sufficient. In any case, I think trying to build with GHC 6 is more trouble than its worth now.

LeventErkok commented 1 year ago

My recollection is that I was behind a corporate firewall with an old computer running CentOS5 (I think?) with old glibc, hence I had to grab an old build that was able to run on that.

But I agree, this is too much work and hardly worth it. Your time is better spent on trying to figure out how come we see this with the latest GHC obviously. (Breaking point would be nice to identify, but things move fast in that land.) Thanks!

doyougnu commented 1 year ago

My recollection is that I was behind a corporate firewall with an old computer running CentOS5 (I think?) with old glibc, hence I had to grab an old build that was able to run on that.

Ah that explains the several year gap!

doyougnu commented 1 year ago

Hey Levent,

Just an FYI I've started working on this again. Had to get a paper out of the way first :)

LeventErkok commented 11 months ago

I reported this here: https://gitlab.haskell.org/ghc/ghc/-/issues/24210

Perhaps the GHC gurus will have an insight..

doyougnu commented 11 months ago

Thanks for the report! My apologies but I just haven't been able to get to it

LeventErkok commented 11 months ago

No worries.. If you can summarize your findings on the GHC ticket, that might help the developers over there. You have the most insight into what's going on here.. If you've the time, of course!

LeventErkok commented 11 months ago

I'm going to close this ticket here; since we reported it to GHC HQ. If they come back with some recommendations on the SBV side, we can reopen it.