dafny-lang / dafny

Dafny is a verification-aware programming language
https://dafny.org
Other
2.91k stars 261 forks source link

If Z3 can't be run, Dafny sometimes hangs instead of reporting an error #2370

Closed mattmccutchen-amazon closed 9 months ago

mattmccutchen-amazon commented 2 years ago

In some cases in which Dafny can't find Z3, Dafny hangs instead of reporting the expected error "Cannot find any prover executable". This can make it confusing for the user to troubleshoot the problem. Failure to find Z3 isn't likely to happen when using a binary release of Dafny that has Z3 bundled at the right path, but it can happen when building Dafny from source or using a custom build system.

Steps to reproduce (tested on my Linux machine):

  1. Download and extract https://github.com/dafny-lang/dafny/releases/download/v3.7.1/dafny-3.7.1-x64-ubuntu-16.04.zip.
  2. Delete the dafny/z3 directory.
  3. Save the example code below to hang-test.dfy.
  4. Run: ./dafny/dafny /compile:0 hang-test.dfy

Example code:

lemma Dummy1(x: int)
  requires x == 1
  ensures x + 1 == 2 {}

lemma Dummy2(x: int)
  requires x == 1
  ensures x + 1 == 2 {}

lemma Dummy3(x: int)
  requires x == 1
  ensures x + 1 == 2 {}

Expected behavior: Dafny exits with an error message, "Cannot find any prover executable".

Actual behavior: Dafny hangs.

Some further testing I did suggests that the hang happens when the number of methods to be verified is greater than the value of the /vcsCores option, which defaults to 1. With the example code above (which has 3 methods), I observe:

So this may be some kind of deadlock in a thread pool.

mattmccutchen-amazon commented 2 years ago

I just noticed #1914. The two issues may have the same root cause, in which case it probably makes sense to broaden one of them and mark the other as a duplicate.

robin-aws commented 2 years ago

I just noticed #1914. The two issues may have the same root cause, in which case it probably makes sense to broaden one of them and mark the other as a duplicate.

I suspect they will need separate fixes, since the CLI should print out an error but the language server will need to provide a different notification. Perfectly fine for one PR to fix both though. :)

keyboardDrummer commented 1 year ago

I've had Dafny verification hang when it passed incorrect parameters to Z3, which happened because it passed parameters for Z3 4.8.5, while it was using Z3 4.8.8

MikaelMayer commented 1 year ago

How long ago did you have that issue @keyboardDrummer ?

keyboardDrummer commented 1 year ago

Today

MikaelMayer commented 1 year ago

CLI or language server? If CLI, what was the command line you ran?

keyboardDrummer commented 1 year ago

The language server, it passed O:model_compress=false to Z3 4.8.8 which then reports a solver error because it expects O:model.compact=false, but Dafny keeps waiting on a Boogie Task that never completes.

keyboardDrummer commented 1 year ago

The language server is supposed to have code to send the right options to Z3, the method HandleZ3Version, but apparently that wasn't working. I had trouble attaching a debugger in time because that code runs on startup.

MikaelMayer commented 1 year ago

Oh that's concerning. To attach a debugger, I found that you can add Thread.sleep(15000) at the beginning of Program, which gives you 15 seconds to attach the debugger.

atomb commented 1 year ago

This still seems to occur sometimes. We should more thoroughly check the case where a z3 binary exists but can't be executed for some reason (being an incompatible version, having missing shared library dependencies, etc.).

alex-chew commented 1 year ago

Do we currently run any kind of "smoke test" against the specified solver binary, to check that the binary is available and provides a known result, before using it? I see that we run z3 -version but fail silently if we don't get anything back ("might be another solver").

atomb commented 1 year ago

Do we currently run any kind of "smoke test" against the specified solver binary, to check that the binary is available and provides a known result, before using it? I see that we run z3 -version but fail silently if we don't get anything back ("might be another solver").

Not at the moment. I think it might be good to do that. And I think perhaps we should fail if we don't get anything back from z3 -version, since that should always produce something. I think perhaps we should only support other solvers (e.g., CVC5) if they're explicitly specified with a path to the executable, since using them is very experimental.

mattmccutchen-amazon commented 1 year ago

Stepping back: How did a failure to run the solver cause Dafny to hang in the first place? Can we change that hang to an error rather than relying solely on up-front smoke tests? No matter how many smoke tests we add, we may overlook some unusual scenario that may affect some user.

atomb commented 1 year ago

I've looked into this a little, and it's unfortunately a bit tricky to do. From what I can tell, the .NET Process class that we use to invoke the solver hangs if you invoke it with an executable that doesn't exist or can't be executed for some other reason. I would have expected it to throw an exception, but it doesn't seem to. I suspect there's a way to make it work, but it's not as straightforward as it should be.

mattmccutchen-amazon commented 1 year ago

We should more thoroughly check the case where a z3 binary exists but can't be executed for some reason

I updated the title accordingly. The case I'm interested in right now is having broken shared library dependencies since I've run into it in my work at Amazon.

From what I can tell, the .NET Process class that we use to invoke the solver hangs if you invoke it with an executable that doesn't exist or can't be executed for some other reason.

I find it hard to believe that such an important, widely used class would have that kind of fundamental bug; it seems much more likely to me that the problem is elsewhere. So out of curiosity, I debugged Dafny in my current case where Z3 has broken shared library dependencies, and I think I found the root cause of the hang in that case. This line propagates an IOException without releasing checkersSemaphore. Here's the stack trace in case it's of interest:

Broken pipe
   at System.IO.Pipes.PipeStream.WriteCore(ReadOnlySpan`1 buffer)
   at System.IO.Pipes.PipeStream.Write(ReadOnlySpan`1 buffer)
   at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
   at System.IO.StreamWriter.WriteLine(String value)
   at Microsoft.Boogie.SMTLib.SMTLibProcess.Send(String cmd) in REDACTED/boogie/Source/Provers/SMTLib/SMTLibProcess.cs:line 131
   at Microsoft.Boogie.SMTLib.SMTLibInteractiveTheoremProver.Send(String s, Boolean isCommon) in REDACTED/boogie/Source/Provers/SMTLib/SMTLibInteractiveTheoremProver.cs:line 664
   at Microsoft.Boogie.SMTLib.SMTLibProcessTheoremProver.SendThisVC(String s) in REDACTED/boogie/Source/Provers/SMTLib/SMTLibProcessTheoremProver.cs:line 165
   at Microsoft.Boogie.SMTLib.SMTLibInteractiveTheoremProver.FullReset(VCExpressionGenerator generator) in REDACTED/boogie/Source/Provers/SMTLib/SMTLibInteractiveTheoremProver.cs:line 167
   at Microsoft.Boogie.Checker.Target(Program prog, ProverContext ctx, Split split) in REDACTED/boogie/Source/VCGeneration/Checker.cs:line 140
   at VC.CheckerPool.PrepareChecker(Program program, Split split, Checker checker) in REDACTED/boogie/Source/VCGeneration/CheckerPool.cs:line 75
   at VC.CheckerPool.FindCheckerFor(ConditionGeneration vcgen, Split split, CancellationToken cancellationToken) in REDACTED/boogie/Source/VCGeneration/CheckerPool.cs:line 36

The IOException is correctly handled here, but Dafny goes on to verify the next method without checkersSemaphore ever being released. checkersSemaphore is initialized to vcsCores, so if there are more than vcsCores methods, verification of the (vcsCores + 1)-th method will hang waiting for the semaphore, explaining the behavior I saw.

Note: When I tried the new command line format (dafny verify hang-test.dfy), at first it appeared that the problem did not occur, but I was able to trigger it by increasing the number of methods in the input. It seems that dafny verify has a higher default value of vcsCores than the legacy dafny /compile:0.

If I make the following change to the Boogie code, then Dafny no longer hangs:

diff --git a/Source/VCGeneration/CheckerPool.cs b/Source/VCGeneration/CheckerPool.cs
index e4c45182..d51fef17 100644
--- a/Source/VCGeneration/CheckerPool.cs
+++ b/Source/VCGeneration/CheckerPool.cs
@@ -32,8 +32,13 @@ public async Task<Checker> FindCheckerFor(ConditionGeneration vcgen, Split? spli
         checker ??= CreateNewChecker();
       }

-      PrepareChecker(vcgen.program, split, checker);
-      return checker;
+      try {
+        PrepareChecker(vcgen.program, split, checker);
+        return checker;
+      } catch (System.IO.IOException) {
+        checkersSemaphore.Release();
+        throw;
+      }
     }

     private int createdCheckers;

I'm sharing this as a proof of concept. There may be any number of reasons it may not be the best fix. For example, maybe we should call checker.Close() in the catch block. And it would probably need tests, etc. My testing was based on Boogie v2.16.4 because that's the version that Dafny uses and when I tried to build Dafny against the latest Boogie, there were compile errors. But it looks like the relevant code hasn't changed in the latest Boogie.

Again, this is for the case where Z3 has broken shared library dependencies. I don't know whether there are other ways in which Z3 could fail that would cause the Dafny/Boogie code to fail in a different place.

atomb commented 1 year ago

Thanks for digging so deeply into debugging the issue! That should be a straightforward fix to apply.

atomb commented 1 year ago

And @keyboardDrummer ended up fixing it: https://github.com/boogie-org/boogie/pull/719

I'll leave this open until we've updated Dafny to depend on that Boogie version and tested it from the Dafny side, though.

atomb commented 9 months ago

The Boogie fix was incorporated in 6675361de97be53863b41fd6423dc9d5b5bf001e. The problem doesn't appear to exist anymore, in the tests I've tried.