Closed LeventErkok closed 11 months ago
@doyougnu
Wondering if you'd have the time to take a look at this; definitely your expertise area. Even if you don't have the time, if you can at least confirm what I'm seeing, it'd be nice. (Just to make sure my environment isn't screwed up somehow.) Of course, any other insight you have into this would be much appreciated as well.
Hey Levent. You have perfect timing. I'm currently writing case studies for the haskell optimization handbook and the #572 issue was top of my list. This one seems more self contained so I'll look into it either this weekend or on Monday morning during my scheduled handbook writing time.
From the looks of it this might be a major regression in GHC. If so I'll link the GHC ticket.
@doyougnu Thanks! Jeffrey! I'm glad the expert in this area is looking at this bizarre case.
Took a brief look at the optimization handbook, and seems like a wonderful resource. Looking forward to digging into it more in the upcoming days. Thanks..
hmm this also occurs with ghc 8.10.7:
[nix-shell:~/programming/haskell/sbv642]$ ./test642.sh
Tue Feb 28 12:33:52 PM UTC 2023 The Glorious Glasgow Haskell Compilation System, version 8.10.7
Tue Feb 28 12:33:52 PM UTC 2023 Running compiled version
Tue Feb 28 12:33:53 PM UTC 2023 [1 of 1] Compiling Issue642 ( Issue642.hs, Issue642.o )
Linking Issue642 ...
Tue Feb 28 12:33:53 PM UTC 2023 Compiled, running
real 42m32.342s
user 42m29.672s
sys 0m2.410s
Tue Feb 28 01:16:25 PM UTC 2023
Tue Feb 28 01:16:25 PM UTC 2023 Done, Cleanup2
Tue Feb 28 01:16:25 PM UTC 2023
Tue Feb 28 01:16:25 PM UTC 2023 Interpreted, running
real 1m51.887s
user 1m49.175s
sys 0m2.963s
Tue Feb 28 01:18:17 PM UTC 2023
Tue Feb 28 01:18:17 PM UTC 2023 Done
the compiled version runs in 42 minutes while the interpreted is done it 1 min 51 sec.
Wow, that looks even worse than the results I got for GHC 9.4.4.
I'm pretty sure this wasn't an issue about 3 years ago; maybe some variant of GHC7? If you've a simple way to figure out which version was the culprit, that'd be really helpful.
yea that was just a spot check. I'll do some more testing tonight
I couldn't get GHC 7 to compile so I tried 8.8 and got similar results:
Wed Mar 1 01:12:50 PM UTC 2023 The Glorious Glasgow Haskell Compilation System, version 8.8.4
Wed Mar 1 01:12:50 PM UTC 2023 Running compiled version
Wed Mar 1 01:12:51 PM UTC 2023 [1 of 1] Compiling Issue642 ( Issue642.hs, Issue642.o )
Linking Issue642 ...
Wed Mar 1 01:12:51 PM UTC 2023 Compiled, running
real 42m14.519s
user 42m11.150s
sys 0m3.186s
Wed Mar 1 01:55:05 PM UTC 2023
Wed Mar 1 01:55:05 PM UTC 2023 Done, Cleanup2
Wed Mar 1 01:55:05 PM UTC 2023
Wed Mar 1 01:55:05 PM UTC 2023 Interpreted, running
real 1m47.442s
user 1m44.933s
sys 0m2.781s
Wed Mar 1 01:56:53 PM UTC 2023
Wed Mar 1 01:56:53 PM UTC 2023 Done
So this could be a regression in sbv
rather than ghc
. I'll profile tomorrow (I'm already out of time today) to start seeing what is taking time but just anecdotally memory on my system increases quite a bit while this runs and is much higher in the compiled run than the interpreted one.
I suppose it could be an SBV regression. But anytime the interpreter outperforms the compiler (and significantly in this case), there's a bit of blame to be shared by GHC too. Perhaps an SBV regression that triggered a GHC bug. That's possible I guess.
If you want to try with older SBV's, I'd go for something around 2019'ish.
Ran some more experiments, first some observations:
sha256
program does not exhibit this issue. While the compiled version is still slower than the expression evaluation mode
version its not 10-20x or whatever slower.IntMap
StableName
[]
and (,)
So I suspect that there is an optimization happening in compilation mode that is not happening in -e
mode that is nuking sharing in the cache. Notice that this isn't a profile of a memory leak per se because it isn't a triangle shape. If it were a memory leak we would see a peak of memory and then a downslope. But instead what we get is a nice logarithmic curve which is what I would expect to see if the cache was never actually caching and every check on the cache was just adding a new element.Yes, I noticed the same. 224/256 go relatively comparable for compiled and interpreted. 384 and 512 exhibit major difference. I'm really surprised as to why this would be, as the algorithms are quite similar. (Larger bit sizes have more iterations and. I do expect them to be slower, but not exhibit this sort of behavior.)
GHC ticket opened: https://gitlab.haskell.org/ghc/ghc/-/issues/23081
Not sure if it matters; but when generating code, SBV does not call z3 at all. The entire computation happens on the Haskell side. (i.e., you can generate code without even z3 installed on your computer.)
Hey Levent,
Is it possible these two modes are not computing the same thing? As a sanity check I diff'd the output of a compiled and -e
build and got this:
11:03:20 ❯ diff testOutput_delete testOutput_delete_expr_eval/
Only in testOutput_delete_expr_eval/: sha256.c
Only in testOutput_delete_expr_eval/: sha256_driver.c
Only in testOutput_delete_expr_eval/: sha256.h
diff '--color=auto' testOutput_delete/sha512_driver.c testOutput_delete_expr_eval/sha512_driver.c
10,14c10,14
< 118, 10, 114, 108, 147, 134, 118, 200, 11, 65, 65, 148, 211,
< 55, 242, 11, 105, 24, 255, 109, 227, 0, 159, 157, 203, 134,
< 172, 133, 88, 247, 253, 162, 35, 166, 20, 220, 252, 3, 56,
< 235, 47, 238, 62, 172, 136, 188, 2, 104, 101, 116, 241, 107,
< 195, 4, 255, 209, 81, 210, 226, 6, 35, 175, 77, 10
---
> 169, 137, 31, 20, 104, 103, 191, 61, 214, 50, 31, 226, 177,
> 246, 166, 145, 211, 216, 244, 70, 185, 79, 51, 250, 47, 114,
> 81, 104, 98, 147, 253, 31, 144, 178, 83, 141, 110, 84, 238,
> 173, 22, 196, 6, 99, 229, 78, 124, 181, 88, 254, 142, 133,
> 139, 1, 89, 91, 212, 55, 17, 29, 115, 33, 64, 48
23,32c23,32
< 199, 76, 31, 121, 114, 11, 214, 59, 58, 10, 187, 50, 255,
< 96, 212, 133, 154, 26, 87, 20, 243, 133, 210, 103, 187, 192,
< 72, 182, 93, 64, 129, 168, 243, 128, 62, 211, 227, 150, 181,
< 142, 252, 233, 107, 191, 176, 245, 181, 108, 63, 61, 129, 11,
< 251, 5, 35, 38, 202, 38, 217, 117, 39, 64, 159, 90, 144,
< 143, 0, 124, 7, 151, 199, 170, 67, 216, 215, 204, 225, 173,
< 62, 48, 254, 83, 211, 141, 237, 168, 152, 232, 189, 33, 226,
< 116, 24, 217, 26, 35, 40, 155, 35, 102, 217, 135, 230, 170,
< 121, 162, 43, 217, 136, 32, 163, 132, 186, 136, 117, 54, 217,
< 249, 58, 213, 149, 151, 46, 176, 83, 115, 92, 120
---
> 159, 9, 31, 113, 8, 186, 21, 143, 113, 237, 127, 43, 89,
> 202, 54, 127, 98, 163, 78, 131, 17, 217, 4, 213, 202, 198,
> 73, 125, 39, 220, 107, 183, 213, 175, 118, 13, 56, 51, 23,
> 104, 64, 52, 208, 111, 123, 167, 106, 65, 170, 198, 14, 220,
> 210, 76, 220, 213, 227, 239, 233, 156, 39, 169, 62, 246, 230,
> 246, 76, 102, 46, 76, 231, 56, 128, 120, 44, 219, 57, 91,
> 113, 94, 108, 153, 190, 203, 174, 126, 230, 83, 71, 141, 182,
> 238, 45, 69, 10, 144, 177, 237, 121, 241, 181, 164, 55, 96,
> 87, 53, 156, 185, 13, 33, 33, 9, 145, 87, 253, 174, 254,
> 92, 181, 5, 23, 108, 110, 166, 107, 23, 125, 129
and sure enough the difference is in sha512_driver.c
. Here is the compiled version:
// compiled (I added this comment in github)
int main(void)
{
const SWord8 hIn[64] = {
118, 10, 114, 108, 147, 134, 118, 200, 11, 65, 65, 148, 211,
55, 242, 11, 105, 24, 255, 109, 227, 0, 159, 157, 203, 134,
172, 133, 88, 247, 253, 162, 35, 166, 20, 220, 252, 3, 56,
235, 47, 238, 62, 172, 136, 188, 2, 104, 101, 116, 241, 107,
195, 4, 255, 209, 81, 210, 226, 6, 35, 175, 77, 10
};
printf("Contents of input array hIn:\n");
int hIn_ctr;
for(hIn_ctr = 0; hIn_ctr < 64 ; ++hIn_ctr)
printf(" hIn[%2d] = %"PRIu8"\n", hIn_ctr ,hIn[hIn_ctr]);
const SWord8 block[128] = {
199, 76, 31, 121, 114, 11, 214, 59, 58, 10, 187, 50, 255,
96, 212, 133, 154, 26, 87, 20, 243, 133, 210, 103, 187, 192,
72, 182, 93, 64, 129, 168, 243, 128, 62, 211, 227, 150, 181,
142, 252, 233, 107, 191, 176, 245, 181, 108, 63, 61, 129, 11,
251, 5, 35, 38, 202, 38, 217, 117, 39, 64, 159, 90, 144,
143, 0, 124, 7, 151, 199, 170, 67, 216, 215, 204, 225, 173,
62, 48, 254, 83, 211, 141, 237, 168, 152, 232, 189, 33, 226,
116, 24, 217, 26, 35, 40, 155, 35, 102, 217, 135, 230, 170,
121, 162, 43, 217, 136, 32, 163, 132, 186, 136, 117, 54, 217,
249, 58, 213, 149, 151, 46, 176, 83, 115, 92, 120
};
and the interpreted one:
// eval-expr mode (-e), again comment added in github
int main(void)
{
const SWord8 hIn[64] = {
169, 137, 31, 20, 104, 103, 191, 61, 214, 50, 31, 226, 177,
246, 166, 145, 211, 216, 244, 70, 185, 79, 51, 250, 47, 114,
81, 104, 98, 147, 253, 31, 144, 178, 83, 141, 110, 84, 238,
173, 22, 196, 6, 99, 229, 78, 124, 181, 88, 254, 142, 133,
139, 1, 89, 91, 212, 55, 17, 29, 115, 33, 64, 48
};
printf("Contents of input array hIn:\n");
int hIn_ctr;
for(hIn_ctr = 0; hIn_ctr < 64 ; ++hIn_ctr)
printf(" hIn[%2d] = %"PRIu8"\n", hIn_ctr ,hIn[hIn_ctr]);
const SWord8 block[128] = {
159, 9, 31, 113, 8, 186, 21, 143, 113, 237, 127, 43, 89,
202, 54, 127, 98, 163, 78, 131, 17, 217, 4, 213, 202, 198,
73, 125, 39, 220, 107, 183, 213, 175, 118, 13, 56, 51, 23,
104, 64, 52, 208, 111, 123, 167, 106, 65, 170, 198, 14, 220,
210, 76, 220, 213, 227, 239, 233, 156, 39, 169, 62, 246, 230,
246, 76, 102, 46, 76, 231, 56, 128, 120, 44, 219, 57, 91,
113, 94, 108, 153, 190, 203, 174, 126, 230, 83, 71, 141, 182,
238, 45, 69, 10, 144, 177, 237, 121, 241, 181, 164, 55, 96,
87, 53, 156, 185, 13, 33, 33, 9, 145, 87, 253, 174, 254,
92, 181, 5, 23, 108, 110, 166, 107, 23, 125, 129
};
So it seems the SWord8 block
s are different. I'm not familiar with the SHA algorithm. Any idea on what this difference means?
This is expected. When we generate the driver program, SBV picks the argument values randomly. There's a way to fix the numbers, by calling cgSetDriverValues
, but this simple example doesn't use that. You can verify by adding a line of the form:
cgSetDriverValues (repeat 0)
in the code-generation block to make sure they're fed the same things. (Not that I think it'd make any difference.)
When comparing the output, compare everything but the driver file. Everything else should be identical. Is that not the case?
great!
Is that not the case?
Nope that is the case. The hunt continues :)
Hey Levent,
This still occurs (and is even worse!) with sbv-8.4
and ghc-8.4.4
. sbv 8.4
was the first sbv release to have the SHA
documentation example which is why I tested it:
07:45:32 ❯ time cabal test Issue642
Build profile: -w ghc-8.4.4 -O1
In order, the following will be built (use -v for more details):
- sbv-8.4 (test:Issue642) (first run)
Preprocessing test suite 'Issue642' for sbv-8.4..
Building test suite 'Issue642' for sbv-8.4..
Running 1 test suites...
Test suite Issue642: RUNNING...
Test suite Issue642: PASS
Test suite logged to:
/store/Programming/haskell/sbv/dist-newstyle/build/x86_64-linux/ghc-8.4.4/sbv-8.4/t/Issue642/test/sbv-8.4-Issue642.log
1 of 1 test suites (1 of 1 test cases) passed.
________________________________________________________
Executed in 87.94 mins fish external
usr time 87.82 mins 477.00 micros 87.82 mins
sys time 0.11 mins 0.00 micros 0.11 mins
~/programming/haskell/sbv tags/v8.4^0* 1h 27m 56s
So I no longer think this is a regression and just a straightforward performance issue. It could be that the observable sharing the hash cons'ing is based on is too lazy and so we're caching thunks. That at least is my guess for this profile and because we've already had that discussion (although I wasn't familiar with that work at the time I am now!)
Hi Jeff,
I'm pretty sure this wasn't an issue when SBV 8.4 came out, with respect to the version of GHC that I was using at that time. I remember this work was done with an older version of GHC at the time; I want to say something in the GHC 6 series even.
I know this is turning into a wild-goose chase, but is there any way you can try SBV 8.4 with GHC 6? I wouldn't be surprised if you can't even get a machine to run GHC 6 anymore (old libraries probably), but there was a time where this wasn't an issue. Perhaps no longer repeatable.
In any case, there's a bug somewhere; whether it's SBV or GHC, or a combination thereof. At this point this is a curiosity since I have the ghci
workaround, but would be nice to nail it down. Thanks for all the cycles you're spending on this!
I know this is turning into a wild-goose chase, but is there any way you can try SBV 8.4 with GHC 6? I wouldn't be surprised if you can't even get a machine to run GHC 6 anymore (old libraries probably), but there was a time where this wasn't an issue. Perhaps no longer repeatable.
No worries. I can try to build with GHC 6, the earliest I've gotten to work was GHC 7, but I'm using nix to get the right versions of everything so it really isn't that hard, its just a matter of finding the right version of nixpkgs
that builds the GHC I'm looking for.
At this point this is a curiosity since I have the ghci workaround, but would be nice to nail it down.
What is the workaround? I'm pretty sure the -e
mode uses the bytecode interpreter (i.e., ghci
), and one of the problems is that it is hard to debug ghci
internals, whereas debugging the compiler internals is pretty straightforward all things considered.
Thanks for all the cycles you're spending on this!
No problem! Besides I'm using it as a case study. No work shall go unused!
"Workaround" as in I can get the C-code generated for SHA384/512 by just using ghci. Not that there's an actual fix for the problem. Thanks for all the sleuthing!
Hey Levent,
I know this is turning into a wild-goose chase, but is there any way you can try SBV 8.4 with GHC 6? I wouldn't be surprised if you can't even get a machine to run GHC 6 anymore (old libraries probably), but there was a time where this wasn't an issue. Perhaps no longer repeatable.
Indeed, this was much harder than I expected because even cabal versions start to become problematic. For what its worth, in the timeline, GHC 6 is around 2010, while GHC 7.04 (the last version I successfully reproduced with) was release June, 2011. So if we are just interested in reproducing the bug with a pre-2019 GHC then GHC 7 should be sufficient. In any case, I think trying to build with GHC 6 is more trouble than its worth now.
My recollection is that I was behind a corporate firewall with an old computer running CentOS5 (I think?) with old glibc, hence I had to grab an old build that was able to run on that.
But I agree, this is too much work and hardly worth it. Your time is better spent on trying to figure out how come we see this with the latest GHC obviously. (Breaking point would be nice to identify, but things move fast in that land.) Thanks!
My recollection is that I was behind a corporate firewall with an old computer running CentOS5 (I think?) with old glibc, hence I had to grab an old build that was able to run on that.
Ah that explains the several year gap!
Hey Levent,
Just an FYI I've started working on this again. Had to get a paper out of the way first :)
I reported this here: https://gitlab.haskell.org/ghc/ghc/-/issues/24210
Perhaps the GHC gurus will have an insight..
Thanks for the report! My apologies but I just haven't been able to get to it
No worries.. If you can summarize your findings on the GHC ticket, that might help the developers over there. You have the most insight into what's going on here.. If you've the time, of course!
I'm going to close this ticket here; since we reported it to GHC HQ. If they come back with some recommendations on the SBV side, we can reopen it.
Put the following in file
a.hs
:Let's run it after compiling with ghc:
That's kind of slow, takes 15.5 minutes. Let's try "interpreting" instead:
Less than 1.5 minutes; that's 10 times faster!
This is on a Mac with GHC 9.4.4. But I observed the same on Linux with GHC 9.2.2.
Last time I worked on code like this was about 3 years ago, with an older version of GHC; and I don't remember it being this slow when compiled. (Though I can't recall how long it took back then, nor what the actual GHC version was. This was back in 2019 or so.)
This is really curious, and most assuredly a GHC issue. Would be nice to analyze this and reduce it to something we can report to them. (I guess we can report as is as well, if we can't reduce it to something simpler; but would be good to give it a try.)