dafny-lang / dafny

Dafny is a verification-aware programming language
https://dafny.org
Other
2.88k stars 256 forks source link

Improve assertion failure wording #3216

Open Timmmm opened 1 year ago

Timmmm commented 1 year ago

Dafny version

3.10.0

Code to produce this issue

assert 1 == 2;

Command to run and resulting output

Open in VSCode.

What happened?

I've done some formal verification of SystemVerilog using SVA and Synopsys's VC_Static tool. With that tool you write properties, click "Prove" and then wait. The possible states of a property are:

A counter-example is only available in the Falsified case.

This all makes intuitive sense to me.

Dafny doesn't seem to do it in the same way. Proven is the same, but all other three states result in the error assertion might not hold which by any reasonable interpretation means it has been falsified.

Even worse, you can press F7 to generate a counter-example even for the inconclusive case where the example actually satisfies the assertion!

Could Dafny not distinguish Falsified and Inconclusive? If not I think the error should at least be reworded to could not prove assertion which I think is what it is actually trying to say. (Similarly for postconditions etc.)

For the counter-examples it should be trivial to determine if the generated counter-example passes or fails the assertion and give a message like this example disproves the assertion or this example passes the assertion but Dafny cannot reason about it (or whatever a passing counter-example is supposed to mean).

Hope this isn't too negative. My overall experience of Dafny so far has been very positive, mainly because of the excellent VSCode extension and amazing documentation.

What type of operating system are you experiencing the problem on?

Windows

MikaelMayer commented 1 year ago

Thanks for using Dafny and for this positive feedback !

Could Dafny not distinguish Falsified and Inconclusive?

The challenge of modeling programs using SMT solvers is that, when using quantifiers, z3 comes up with a counter-example it cannot disprove is not correct. That's where it says "unknown" and returns a counter-example, and Dafny says "assertion might not hold". Everything that is not proven in Dafny is indeed reported indistinguishably as an error. A few remarks.

For the counter-examples it should be trivial to determine if the generated counter-example passes or fails the assertion and give a message like this example disproves the assertion or this example passes the assertion but Dafny cannot reason about it (or whatever a passing counter-example is supposed to mean).

It's not possible in general to "run" counter-examples, because it can involve ghost code that is either not compiled, or cannot be compiled at all (e.g. when using frames to model the heap, we can't take a screenshot of the heap). Also, Dafny models functions with two unrollings maximum when proving them, which means that the third recursive call is uninterpreted, so it's not always helpful to have a counter-example that says that a function's value on the third recursive call is not what would be computed in practice. But otherwise, I like the idea of being able to test if counter-examples make sense, either by running them or verifying them independently if possible.

at least be reworded to could not prove assertion which I think is what it is actually trying to say. (Similarly for postconditions etc.)

I agree that "could not prove assertion" seems better, like postconditions, I had been thinking about it also. Previously, the message stated "assertion violation" which had proven to be awfully confusing for some users.

Timmmm commented 1 year ago

Ok I don't follow fully (total noob here) but if I'm understanding you directly the issue is when you have something like:

lemma Foo(input: int)
   ensures forall i :: 0 <= i < 10000000000000000 ==> some_property(input, i)
{}

Then it can fail to prove it, and give a counter-example of say input = 1000, but it can't in general evaluate the assertions because the forall makes things exponential or possibly even unbounded.

I had a look at the code, and it seems like this bit is where the prover results are translated to Dafny's messages:

      ProverInterface.Outcome.Valid => GutterVerificationStatus.Verified,
      ProverInterface.Outcome.Invalid => GutterVerificationStatus.Error,
      ProverInterface.Outcome.Undetermined => GutterVerificationStatus.Inconclusive,
      ProverInterface.Outcome.TimeOut => GutterVerificationStatus.Error,
      ProverInterface.Outcome.OutOfMemory => GutterVerificationStatus.Error,
      ProverInterface.Outcome.OutOfResource => GutterVerificationStatus.Error,
      ProverInterface.Outcome.Bounded => GutterVerificationStatus.Error,

And then here it converts it to a message:

          case GutterVerificationStatus.Error:
            var failureDescription = description?.FailureDescription ?? "_no message_";
            if (currentlyHoveringPostcondition &&
                  (failureDescription == new PostconditionDescription().FailureDescription ||
                   failureDescription == new EnsuresDescription().FailureDescription)) {
              failureDescription = "This postcondition might not hold on a return path.";
            }
            return $"{obsolescence}[**Error:**](https://dafny-lang.github.io/dafny/DafnyRef/DafnyRef#sec-verification-debugging) " +
                   failureDescription;

Where FailureDescription is e.g.:

  public override string FailureDescription =>
    customErrMsg ?? "assertion might not hold";

ProverInterface.Outcome is from Boogie.

Are you saying that none of the ProverInterface.Outcome values actually mean "this assertion is definitely false"? Even ProverInterface.Outcome.Invalid?

In which case what is the practical difference between Invalid and Undetermined?

MikaelMayer commented 1 year ago

Are you saying that none of the ProverInterface.Outcome values actually mean "this assertion is definitely false"? Even ProverInterface.Outcome.Invalid?

When Boogie says invalid, it means that it has a counter-example for the model Dafny sent it. Dafny provides the guarantee that if Boogie says valid, then the Dafny code is valid. I'm not an expert in this domain, @RustanLeino could answer better, But the way Dafny encodes verifications conditions is weaker in Boogie than in Dafny. For example Boogie's counter example includes models for functions that might have a definite implementation in Dafny, it's just that this model does not match the function itself - which was opaque after 2 levels of unrolling.

Once we will get a better debugging experience in VSCode, users would be able to browse the actual counter-example, see where there is a mismatch with Dafny, and potentially give more fuel or write things differently But for now, Invalid and Undetermined are indeed very similar for Dafny, as the counter-examples are not necessarily precise.

Timmmm commented 1 year ago

Ah ok I think I understand. In that case can I suggest:

  1. Remove GutterVerificationStatus.Inconclusive
  2. Reword all the GutterVerificationStatus.Error messages to "could not prove ..."
  3. Change some references to "counter-example" to "possible counter-example", e.g. in the command palette.

Would you accept a PR for that?

MikaelMayer commented 1 year ago
  1. I still think that down the road we could make a better use of GutterVerificationStatus.Inconclusive for rendering. Right now, inclusive is rendered as nothing, so we don't see assertions like when they are conditionally verified in the gutter, but this ought to change to my opinion.
  2. I like "could not prove" better indeed. That's what your issue is about.
  3. Already done recently: We just changed in the next version of VSCode the wording "counter-example" to "counter-example (experimental)" to indicate a hint that it's not fully functional yet.