dafny-lang / dafny

Dafny is a verification-aware programming language
https://dafny.org
Other
2.89k stars 259 forks source link

noisy error messages can hide the "root cause" precondition error #370

Open tjhance opened 5 years ago

tjhance commented 5 years ago

Consider this code,

datatype T = T(x: int)
datatype S = S(u: int, v: int, w: int, x: int, y: int, z: int)

predicate a(t: T)
predicate b(t: T)
predicate c(t: T)
predicate d(t: T)
predicate e(t: T)
predicate f(t: T)
predicate g(t: T)

predicate WellFormed(t: T) {
  && a(t)
}

function Func(t: T) : S
requires WellFormed(t)
{
  S(t.x, t.x, t.x, t.x, t.x, t.x)
}

predicate Good(s: S) {
  && s.u == 5
  && s.v == 5
  && s.w == 5
  && s.x == 5
  && s.y == 5
  && s.z == 5
}

function {:opaque} GetT() : T {
  T(5)
}

lemma foo()
  ensures var t := GetT();
    && WellFormed(t)
    && Good(Func(t))
{
  reveal_GetT();
}

It gives this error message,

Dafny 2.1.1.10209
a.dfy(39,0): Error BP5003: A postcondition might not hold on this return path.
a.dfy(37,7): Related location: This is the postcondition that might not hold.
a.dfy(13,5): Related location
Execution trace:
    (0,0): anon0
a.dfy(39,0): Error BP5003: A postcondition might not hold on this return path.
a.dfy(38,7): Related location: This is the postcondition that might not hold.
a.dfy(23,9): Related location
Execution trace:
    (0,0): anon0
a.dfy(39,0): Error BP5003: A postcondition might not hold on this return path.
a.dfy(38,7): Related location: This is the postcondition that might not hold.
a.dfy(24,9): Related location
Execution trace:
    (0,0): anon0
a.dfy(39,0): Error BP5003: A postcondition might not hold on this return path.
a.dfy(38,7): Related location: This is the postcondition that might not hold.
a.dfy(25,9): Related location
Execution trace:
    (0,0): anon0
a.dfy(39,0): Error BP5003: A postcondition might not hold on this return path.
a.dfy(38,7): Related location: This is the postcondition that might not hold.
a.dfy(26,9): Related location
Execution trace:
    (0,0): anon0

Dafny program verifier finished with 3 verified, 5 errors

It's a little noisy, because only the first error message is relevant: this error message complains that it can't prove WellFormed(GetT()), but if the precondition WellFormed(GetT()) is added, then all error messages go away. The only reason any of the other errors occur is because of a precondition violation in the expression Good(Func(t)), but there is no error message about preconditions at all. So it seems to me that the problem is that Dafny doesn't assume the first conjunct WellFormed(t) when it tries to prove Good(Func(t)).

This example is only a little suboptimal, but I have a much larger example where the first error message did not appear at all, giving NO indication that the equivalent of the line WellFormed(GetT()) was a problem. The result was that it was pretty difficult to figure out what the problem was. (Unfortunately, that example depends on thousands of lines of code and I wasn't able to produce a minimalist version.)

davidcok commented 4 years ago

It is a general challenge to proof developers to discover why a proof fails. When a postcondition fails, the problem can be a missing precondition as in this case, but it could also be missing proof connections along the way to the conclusion. In this case the multiple error messages point out different paths that might lead to the failed proof conclusion. Dafny does not try to propose a fix to failed proofs (In fact proof correction, at least in the face of code refactoring or evolution, is a PhD-worthy research topic). So a future Dafny might give proof correction tips, but it is definitely beyond the current scope.

davidcok commented 4 years ago

Adding the example code as a test case to alert us to future changes in behavior

tjhance commented 4 years ago

Hi, thanks for taking a look at my report. Reading my report again now, I think it was very confusing, so I want to try to clarify why I posted this report and specifically what problem I think may be addressable.

The problem, specifically, was that both WellFormed(t) and Good(Func(t)) would fail to verify, but in some cases, Dafny would not even report an error for the first one.

As a user, if I ask Dafny to verify a && b, and it comes back and says it couldn't verify b, while saying nothing about a, but the actual problem was that it couldn't verify a, I'm going to be very confused. Dafny obviously has to try to verify a, and that's going to fail, so why would I not get a report about a?

Again, I realize that the example I actually posted was not particularly indicative of the actual problem I was trying to convey - see my last paragraph of the original report. So maybe I haven't actually given you anything to work with, but I did want to clarify what I was going for.

davidcok commented 4 years ago

Thanks for the clarification. Rustan and I looked into the handling of a && b and think that there might be an improvement that would improve error messages.

Note one matter: there is a default limit of 5 error messages per method. So it could be that a message that would be more helpful is dropped because other messages are listed first (and the order is somewhat non-deterministic).

Leaving this issue open to reconsider the desugaring of short-circuit operations.