dafny-lang / dafny

Dafny is a verification-aware programming language
https://dafny.org
Other
2.88k stars 256 forks source link

How do I profile my way out of unbounded timeouts? #229

Closed jonhnet closed 4 years ago

jonhnet commented 5 years ago

Say I write some (purely functional) Dafny, and try to verify it. Things are going great. I'm using my superstitions (opaque recursion, exists and comprehensions) to ward off timeouts. Then suddenly I descend into timeout hell: Dafny will spend unbounded time trying to verify a lemma.

No problem, I say! I'll just axiom-profiler that thing, and figure out which sheep entrail I arranged incorrectly! Except that I can't make an axiom profiler work, either. I'm looking for advice, either at the high level ("don't try to diagnose timeouts this way; that's so 2015") or the low level ("use a different profiler" or "ah, you're passing the wrong flag to z3; that's why the profiler doesn't work.")

This is a significant blocking issue for my use of Dafny (and my encouraging others to use it), as I am unable to even offer expert help to unblock a "timeout-infected" Dafny module.

Details:

I have attached an example file that hangs Dafny. Here is how I tried to diagnose it. I'm using a Linux build here, but I have also tried this process on Windows with no success.

  1. Capture a Boogie file with Dafny 2.2.0.10923. dafny /timeLimit:2 /print:test.bpl raftInvariants.dfy
  2. Capture a z3 log with Boogie 2.3.0.61016 and Z3 "version 4.8.5 - 64 bit", git commit 17adecff6802dad7f381dce3a74a1c96acc8dedb of Feb 25. mono ~/dafny-base/boogie/Binaries/Boogie.exe /z3exe:/home/jonh/axiom-profiler/z3/build/z3 /proc:'Impl$$_5_raftInvariants.__default.ElectionSafetyInduction' /timeLimit:8 /z3opt:TRACE=true /z3opt:PROOF=true test.bpl Z3 is built from git clone https://github.com/Nils-Becker/z3.git, as advised by the Axiom Profiler page.
  3. Run the viper axiom profiler hg clone https://bitbucket.org/viperproject/axiom-profiler, hg changeset 357:f0f7be18b655 of Feb 22 2019. mono ~/axiom-profiler/axiom-profiler/bin/Release/AxiomProfiler.exe /l:z3.log

With this particular run, the failure mode is that the axiom profiler draws a Processing... progress bar, closes it, and then spins many minutes; it never opens the profiler page. (With other runs, I have seen a "Stack empty" exception, and then a profiler page. When I have gotten a profiler page, it was with the Stack Empty exception; the result has been unhelpful: the worst offending term had 16 instantiations. I suspect the exception meant the profiler dropped all the important bits?)

raftTimesOutExample.zip

parno commented 5 years ago

Not sure this helps, but with Dafny 2.2.0.10923 on Mac OS X (from the 2.2.0 release on GitHub), I get an assertion violation fairly quickly:

./dafny ~/Downloads/raftTimesOut.dfy /print:raft.dfy /timeLimit:30
Dafny 2.2.0.10923
/Users/parno/Downloads/raftTimesOut.dfy(13,4): Warning: /!\ No terms found to trigger on.
/Users/parno/Downloads/raftTimesOut.dfy(20,4): Warning: /!\ No terms found to trigger on.
/Users/parno/Downloads/raftTimesOut.dfy(34,4): Warning: /!\ No terms found to trigger on.
/Users/parno/Downloads/raftTimesOut.dfy(1122,35): Error: assertion violation
Execution trace:
    (0,0): anon0
    (0,0): anon3_Else

Dafny program verifier finished with 72 verified, 1 error

The warnings about a lack of term to trigger on are at least one possible place where an unlucky choice by Z3 might result in over-triggering.

In general, I haven't had much luck getting the axiom profiler running on other platforms, so I usually resort to setting up a Windows VM.

I've heard of others using qprofdiff (https://github.com/Z3Prover/z3/tree/master/contrib/qprofdiff) with some success. If you have a version of your code with and without the timeout, then qprofdiff might point you at the guilty parties.

jonhnet commented 5 years ago

Bryan, Thanks for your reply. Responses:

  1. Hearing that the same version works well on MacOS makes me sad. :v) What z3 version is your Dafny using? (More broadly, though, I need to be able to diagnose timeouts when they happen, because they well.)

  2. The warnings about a lack of term to trigger on are at least one possible place where an unlucky choice by Z3 might result in over-triggering.

Noted. However, those errors all occur inside opaqued functions. Two were unused library code; the third is in use. I addressed the warning by adding a dummy predicate, and the timeout still occurs. See attached example:

raftTimesOutExample2.zip

  1. "I usually resort to setting up a Windows VM." -- I also have one of those, but haven't had luck on a similar problem. I'll try this raft sample over there, and report back how it fails.

  2. Thanks for the pointer to qprofdiff. I'll try it and report back.

parno commented 5 years ago

I ran Dafny using the Z3 that came with the release, which says that it's 4.5.0. Just for kicks, I tried a different Z3 build that I have checked out (z3-4.5.1.1f29cebd4df6-x64-osx-10.11.6), and it times out.

Using Boogie via:

mono ~/git/boogie/Binaries/Boogie.exe /z3exe:/Users/parno/git/everest/z3-4.5.1.1f29cebd4df6-x64-osx-10.11.6/bin/z3 /proc:'Impl$$_5_raftInvariants.__default.ElectionSafetyInduction' /timeLimit:8 /z3opt:TRACE=true raft__5_raftInvariants.bpl  /timeLimit:30

I get a z3.log file which appears to have useful information, and the Windows version of the profiler produces what looks like useful output (there are a bunch of axioms with cost over 500K). I can try to take a closer look later.

jonhnet commented 5 years ago
  1. [Reporting back on Windows trial] Dafny (ignored /timeLimit and) exited with an assertion violation after a not-completely ridiculous 17s (and no timeout). The profiler also brought up a meaningful display. Unfortunately I was unable to interpret it to get at the root of my problem, but at least now I have a knowledge problem rather than a tool problem. I'll also try to figure out what's different in Windows -- maybe Dafny's z3 (4.5.0) has a different suite of bugs?
jonhnet commented 5 years ago

Oh, my windows run only saw costs of ~40k, which didn't seem glaringly suspicious. (/c:2 gave the Stack Empty exception, and had very low costs.)

jonhnet commented 5 years ago

Update -- Connecting my linux Dafny to a build of z3-4.5.0 cures the timeout problem on this example. It completes in about 16s with a nice, tame assertion violation. Furthermore, focusing the verification on the offending module detects the assertion-violation in a nice speedy 3s.

So at least one conclusion is that there seems to be an instability introduced into z3 in the last few months, at least insofar as it gets driven by Dafny.

I'm still interested in having a plan (axiom profiler?) for figuring out what I did wrong when it was my fault, but it sounds like this particular spin may be rooted in a bug in z3. Thanks Bryan for mentioning that test!

wilcoxjay commented 5 years ago

Here is a non-answer :)

I don't mess with the axiom profiler. I've tried a few times and never got anything useful out of it.

What works for me is to use the verification cache from my editor, to aggressively pass /proc to focus on only one lemma at a time, and to aggressively set the timeout to just a few seconds. Then I start "unrolling" stuff manually: case splitting according to definitions in the post-condition, etc.

If I'm running into a timeout where I just can't seem to do anything useful without it timing out, I try to back up and hide some more things from Dafny until I get to a place where it's not timing out any more. Then I can work out indirectly what's causing the timeout and add manual triggers, or add opacity, etc.

With this workflow, I was able to track down the proximate cause in the file you posted to HandleRequestVoteResponse, which I think should not insert an election record into the history variable, so its last line should instead read v'.global.elections == v.global.elections, since we won't know the election is successful until BecomeLeader fires.

jonhnet commented 5 years ago

James, thanks for the reply! I see that you discovered the bug, but ... how did you discover what the cause of slowness was? That is, a system where everything was verifying quickly suddenly had a function with unbounded slowness (well, until I switched to z3-4.5.0), as though I'd left a non-opaque definition lying around.

I'm quite comfortable using binary search to sort out spec bugs (as you've done here); it's sorting out the cause of slowness that's driving me crazy. (And yes, I saw that you confessed it was a non-answer. :v)

wilcoxjay commented 5 years ago

In this particular case, I don't think my local installation of Dafny ran into any timeouts, so I was cheating a little.

But what I was trying to say (poorly) is that you can use the same binary search techniques to figure out what interaction is causing timeouts. For example, after focusing Dafny on a particular lemma with /proc you could start commenting out irrelevant parts of definitions used in its pre/postcondition.

I realize this still isn't a real answer, but I don't think the user experience of diagnosing timeouts is that different from diagnosing other verification errors.

parno commented 5 years ago

So at least one conclusion is that there seems to be an instability introduced into z3 in the last few months, at least insofar as it gets driven by Dafny.

Unfortunately, I'm not sure you can conclude that from a single data point. It could be that "stability" improved for thousands of examples but just happened to get worse for yours. That's one of the unfortunate sides of relying on heuristics to solve undecidable problems. I've had some students working on quantifying this type of instability so we can be more scientific about measuring it, but they keep getting distracted by non-research activities!

lailabecker commented 5 years ago

Jon,

If you're still interested in using the Axiom Profiler please try the latest version, that I just pushed (https://bitbucket.org/viperproject/axiom-profiler/src/default/). I fixed the bug you described that caused "Stack Empty" exceptions. There also used to be a post-processing step after the progress bar reached 100% that took a very long time to complete if there were a lot of quantifiers present in the log. I've removed that step now so the Axiom Profiler shouldn't get stuck anymore.

When I tried to replicate your results I also passed the /vcsCores:1 option to Boogie to prevent it from starting multiple z3 instances that would all write to the same log file. That option should default to 1 but I usually still pass it explicitly to be safe.

Lastly the changes I made to z3 in my fork have been merged into the main repository so you can use that version as well now. Thanks for noticing that that information was outdated in our readme.

jonhnet commented 5 years ago

Hi Nils!

I am indeed extremely interested in using the Axiom Profiler! (Sorry for the delay, I just got back from travel.)

I just hg clone'd a fresh copy of the axiom profiler, picking up your March 20 changes. The stack empty message is gone (although that behavior was a bit unpredictable). But once the processing progress bar completes and disappears, the profiler process sits for many minutes, consuming 100% of a core, with no UI. This sounds like the "post-processing step" you described eliminating, but ... I don't know. (Are there linux debuggers for mono/C#?)

This is with Z3 4.8.5, and I added /vcsCores:1 to the boogie command line as you advised.

jonhnet commented 5 years ago

So I've been dabbling with trying to get a mono debugger running. If I build the axiom-profiler with /p:Configuration=Debug, the resulting binary comes up almost instantly (no progress bar, no stall), but showing a display where everything has zero of whatever's being counted. I find it odd that the program behaves very differently under the debug build. Not knowing much about mono (yet), I'm not sure what to make of this observation.

jonhnet commented 5 years ago

I didn't get along very well with gdb, so I spent some quality time with the Release build and printf debugging. The slow bit is the AddTopNode loop in loadTree(). It slows as it goes, seeming a bit quadratic.

jonhnet commented 5 years ago

Welp, it turned out it was just slow. After about 13 minutes, the loadTree loop finished, and I got a reasonable looking graph. Now I'm on to trying to interpret it, which I haven't done in years. Wish me luck.

jonhnet commented 5 years ago

So here a few of the highest-instance offenders (they're in the top ~30 entries by cost). They're in the 10k-instances range, which seems high (but not 500k high like Bryan saw). They're all also about generic things like Boxes and Allocs, which makes me sad. My "program" is purely functional state-machine reasoning, with no heap objects. All of my types are ints and algebraic datatypes and type(!new,==)s. Hrmm.

Print name: testbpl.186:15[#99182]
QId: testbpl.186:15
Number of Instantiations: 10695
Cost: 34956
Number of Conflicts: 0

FORALL bx: T@U(
¦ pattern($IsBox(bx, TInt())), 
¦ or(
¦ ¦ not(=(type(...), BoxType())), 
¦ ¦ not($IsBox(bx, TInt())), 
¦ ¦ not(or(not(...), not(...)))
¦ )
)

Print name: funType:$Unbox[#911]
QId: funType:$Unbox
Number of Instantiations: 22467
Cost: 22467
Number of Conflicts: 0

FORALL arg0@@30: T@U, T@@0: T@T(
¦ pattern($Unbox(T@@0, arg0@@30)), 
¦ =(type($Unbox(T@@0, arg0@@30)), T@@0)
)

Print name: testbpl.241:18[#99515]
QId: testbpl.241:18
Number of Instantiations: 11259
Cost: 14860.25
Number of Conflicts: 0

FORALL h: T@U, t@@20: T@U, v@@0: T@U(
¦ pattern($IsAllocBox($Box(v@@0), t@@20, h)), 
¦ or(
¦ ¦ not(=(type(...), TyType())), 
¦ ¦ not(=(type(...), MapType1Type(...))), 
¦ ¦ not(or(not(...), not(...)))
¦ )
)

Print name: funType:MapType0Select[#975]
QId: funType:MapType0Select
Number of Instantiations: 13571
Cost: 13571
Number of Conflicts: 0

FORALL arg1@@6: T@U, arg0@@34: T@U(
¦ pattern(MapType0Select(arg0@@34, arg1@@6)), 
¦ =(type(MapType0Select(arg0@@34, arg1@@6)), MapType0TypeInv1(type(arg0@@34)))
)
wilcoxjay commented 5 years ago

Hi again. Some back-seat driving from me: I have had the experience of only seeing "uninteresting" axioms before, and I fixed it by letting boogie run longer before killing it. How long did you run it for?

parno commented 5 years ago

Indeed, I usually try to do at least 30 seconds to be able to see “abnormal” axioms instantiations stand out.

lailabecker commented 5 years ago

Hi Jon,

There is a known bug having to do with pretty printing information for nested quantifiers. In release builds there are fault tolerance mechanisms that result in just that pretty printing information being ignored but in debug builds an exception is thrown so the issue can be examined using a debugger. If no debugger is attached the thread responsible for loading the log simply crashes and most of the log is never loaded. As a workaround you can either remove the #ifs in l. 968 and l. 1009 of LogProcessor.cs or filter out "attach-var-names" lines from the log file (e.g. using grep -v "^\[attach-var-names\]" z3.log > debug.log).

The issue with it taking a long time for the tool to display anything after it finishes loading the log seems to have to do with populating the middle panel. This takes a couple of seconds for me using Windows so my guess is that the 13 minutes you're experiencing are due to issues with mono. I've pushed a fix that populates that panel less incrementally which seems to improve the performance a little bit on Windows, however, on my Linux VM loading the middle panel is still very slow...

jonhnet commented 5 years ago

Hi Nils,

So I tried a log from a 30s run. The linux version eventually dies with "killed" (which I suspect is the arrival of a signal due to exceeding a memory budget). I carted the file over to my windows vm (5GB of RAM) and it melted the whole VM. If you're interested, here's the logfile:

https://drive.google.com/open?id=1_AUGA4bx4okiwNlZKkZpgtLjPH80eFUq

Are you able to ingest it with your copy of axiom profiler?

lailabecker commented 5 years ago

I've looked into your log a bit. So far what I've noticed is that there seem to be many different versions of the testbpl.324:19 and testbpl.331:19 quantifiers (about 260 each) almost all of them are instantiated 1000+ times. They are part of the axioms defining $Is and $IsAlloc respectively and they basically state that each entry of a map must have the correct type and be allocated after unboxing. These quantifiers are generated for each map (triggered by $Is(v, TMap(t0, t1))) and the inner quantifier is instantiated for each lookup in the map (triggered by Map#Elements(v)[bx]). I tried to see what would happen if I just commented these 2 axioms out: z3 is able to prove unsat for the first 3 checks within a second (previously it got stuck on the first one so I think I might be on the right track) and then gets stuck on the 4th one (possibly because it then needs these axioms). So far I haven't been able to figure out where the lookups come from. There seem to be a number of different quantifiers that get blamed for these instantiations.

lailabecker commented 5 years ago

The problem is the IsQuorum predicate in line 242: we know of a bunch of maps that their domain is ServerIDs(). Once we get Set#Card(ServerIDs()) from the predicate an axiom is triggered that gives us forall m: Map :: Map#Card(m) == Set#Card(Map#Domain(m)). This in turn triggers an axiom that, among other things, gives us exists x :: x in Map#Domain(m). This triggers the axiom I described in my last post (Map#Domain(v)[bx] is also a trigger) and since the domains of a bunch of maps are equal we trigger each axiom multiple times, i.e. if we skolemize x we effectively get forall m1, m2 :: AssertTypesAndAlloc(m1, x(m2)). This yields quadratically many instantiations.

I tried replacing the body of the IsQuorum predicate with true and was then able to obtain the same assertion violation that @parno described.

jonhnet commented 5 years ago

Nils, thanks very much for chasing this down. I'm not sure how to reproduce your magic for the next time Dafny stabs me in the back; I was still unable to learn anything from the profiler.

On the other hand, wilcoxjay@ has taught me some of his bisection-debugging technique for trigger-loop timeouts, and I've experience modest success there. I plan to carry on in that direction for a while. However, I really appreciate your effort trying to help me with the axiom profiler, and your work maintaining it.

Thanks!