Open Timmmm opened 1 year ago
Thanks for using Dafny and for this positive feedback !
Could Dafny not distinguish Falsified and Inconclusive?
The challenge of modeling programs using SMT solvers is that, when using quantifiers, z3 comes up with a counter-example it cannot disprove is not correct. That's where it says "unknown" and returns a counter-example, and Dafny says "assertion might not hold". Everything that is not proven in Dafny is indeed reported indistinguishably as an error. A few remarks.
For the counter-examples it should be trivial to determine if the generated counter-example passes or fails the assertion and give a message like this example disproves the assertion or this example passes the assertion but Dafny cannot reason about it (or whatever a passing counter-example is supposed to mean).
It's not possible in general to "run" counter-examples, because it can involve ghost code that is either not compiled, or cannot be compiled at all (e.g. when using frames to model the heap, we can't take a screenshot of the heap). Also, Dafny models functions with two unrollings maximum when proving them, which means that the third recursive call is uninterpreted, so it's not always helpful to have a counter-example that says that a function's value on the third recursive call is not what would be computed in practice. But otherwise, I like the idea of being able to test if counter-examples make sense, either by running them or verifying them independently if possible.
at least be reworded to could not prove assertion which I think is what it is actually trying to say. (Similarly for postconditions etc.)
I agree that "could not prove assertion" seems better, like postconditions, I had been thinking about it also. Previously, the message stated "assertion violation" which had proven to be awfully confusing for some users.
Ok I don't follow fully (total noob here) but if I'm understanding you directly the issue is when you have something like:
lemma Foo(input: int)
ensures forall i :: 0 <= i < 10000000000000000 ==> some_property(input, i)
{}
Then it can fail to prove it, and give a counter-example of say input = 1000
, but it can't in general evaluate the assertions because the forall
makes things exponential or possibly even unbounded.
I had a look at the code, and it seems like this bit is where the prover results are translated to Dafny's messages:
ProverInterface.Outcome.Valid => GutterVerificationStatus.Verified,
ProverInterface.Outcome.Invalid => GutterVerificationStatus.Error,
ProverInterface.Outcome.Undetermined => GutterVerificationStatus.Inconclusive,
ProverInterface.Outcome.TimeOut => GutterVerificationStatus.Error,
ProverInterface.Outcome.OutOfMemory => GutterVerificationStatus.Error,
ProverInterface.Outcome.OutOfResource => GutterVerificationStatus.Error,
ProverInterface.Outcome.Bounded => GutterVerificationStatus.Error,
And then here it converts it to a message:
case GutterVerificationStatus.Error:
var failureDescription = description?.FailureDescription ?? "_no message_";
if (currentlyHoveringPostcondition &&
(failureDescription == new PostconditionDescription().FailureDescription ||
failureDescription == new EnsuresDescription().FailureDescription)) {
failureDescription = "This postcondition might not hold on a return path.";
}
return $"{obsolescence}[**Error:**](https://dafny-lang.github.io/dafny/DafnyRef/DafnyRef#sec-verification-debugging) " +
failureDescription;
Where FailureDescription
is e.g.:
public override string FailureDescription =>
customErrMsg ?? "assertion might not hold";
ProverInterface.Outcome
is from Boogie.
Are you saying that none of the ProverInterface.Outcome
values actually mean "this assertion is definitely false"? Even ProverInterface.Outcome.Invalid
?
In which case what is the practical difference between Invalid
and Undetermined
?
Are you saying that none of the ProverInterface.Outcome values actually mean "this assertion is definitely false"? Even ProverInterface.Outcome.Invalid?
When Boogie says invalid, it means that it has a counter-example for the model Dafny sent it. Dafny provides the guarantee that if Boogie says valid, then the Dafny code is valid. I'm not an expert in this domain, @RustanLeino could answer better, But the way Dafny encodes verifications conditions is weaker in Boogie than in Dafny. For example Boogie's counter example includes models for functions that might have a definite implementation in Dafny, it's just that this model does not match the function itself - which was opaque after 2 levels of unrolling.
Once we will get a better debugging experience in VSCode, users would be able to browse the actual counter-example, see where there is a mismatch with Dafny, and potentially give more fuel or write things differently But for now, Invalid and Undetermined are indeed very similar for Dafny, as the counter-examples are not necessarily precise.
Ah ok I think I understand. In that case can I suggest:
GutterVerificationStatus.Inconclusive
GutterVerificationStatus.Error
messages to "could not prove ..."Would you accept a PR for that?
GutterVerificationStatus.Inconclusive
for rendering. Right now, inclusive is rendered as nothing, so we don't see assertions like when they are conditionally verified in the gutter, but this ought to change to my opinion.
Dafny version
3.10.0
Code to produce this issue
Command to run and resulting output
What happened?
I've done some formal verification of SystemVerilog using SVA and Synopsys's VC_Static tool. With that tool you write properties, click "Prove" and then wait. The possible states of a property are:
A counter-example is only available in the Falsified case.
This all makes intuitive sense to me.
Dafny doesn't seem to do it in the same way. Proven is the same, but all other three states result in the error
assertion might not hold
which by any reasonable interpretation means it has been falsified.Even worse, you can press F7 to generate a counter-example even for the inconclusive case where the example actually satisfies the assertion!
Could Dafny not distinguish Falsified and Inconclusive? If not I think the error should at least be reworded to
could not prove assertion
which I think is what it is actually trying to say. (Similarly for postconditions etc.)For the counter-examples it should be trivial to determine if the generated counter-example passes or fails the assertion and give a message like
this example disproves the assertion
orthis example passes the assertion but Dafny cannot reason about it
(or whatever a passing counter-example is supposed to mean).Hope this isn't too negative. My overall experience of Dafny so far has been very positive, mainly because of the excellent VSCode extension and amazing documentation.
What type of operating system are you experiencing the problem on?
Windows