dafny-lang / dafny

Dafny is a verification-aware programming language
https://dafny.org
Other
2.94k stars 263 forks source link

Resource Limit hides subsequent assertion failures within a method #5805

Open TomSMaier opened 1 month ago

TomSMaier commented 1 month ago

Dafny version

4.8.1

Code to produce this issue

// dafny verify input.dfy

// with tight resources: "subsequent" assertions failures are "hidden"
method {:rlimit 300} F(x: int, y: int) {
  assert x != 1;
  assert y*y + 10 != x * x;
  assert x != 3;
}

// unlimited resources: all failing assertions found
method F1(x: int, y: int) {
  assert x != 1;
  assert y*y + 10 != x * x;
  assert x != 3;
}

// low limit: proper out-of-resources error
method {:rlimit 5} F2(x: int, y: int) {
  assert x != 1;
  assert y*y + 10 != x * x;
  assert x != 3;
}

Command to run and resulting output

$ dafny verify input.dfy
input.dfy(6,9): Error: assertion might not hold
  |
6 |   assert y*y + 10 != x * x;
  |          ^^^^^^^^^^^^^^^^^

input.dfy(12,9): Error: assertion might not hold
   |
12 |   assert x != 1;
   |          ^^^^^^

input.dfy(13,9): Error: assertion might not hold
   |
13 |   assert y*y + 10 != x * x;
   |          ^^^^^^^^^^^^^^^^^

input.dfy(14,9): Error: assertion might not hold
   |
14 |   assert x != 3;
   |          ^^^^^^

input.dfy(18,19): Error: Verification out of resource (F2)
   |
18 | method {:rlimit 5} F2(x: int, y: int) {
   |                    ^^

Dafny program verifier finished with 0 verified, 4 errors, 1 out of resource

What happened?

The example contains 3 methods with identical bodies. Dafny gives different results based on the provided resource limit. I was surprised by dafny's behavior on the first method.

The same happens in VSCode, see screenshot.

Using timeouts, the same behavior can be observed, but less repeatable.

Discussed with @RustanLeino in-person a few weeks ago.

vscode-screenshot

What type of operating system are you experiencing the problem on?

Mac

keyboardDrummer commented 1 month ago

Indeed the behavior is surprising and I consider it a bug as well.

Marking this with a brittleness label since we may add a default resource limit to combat brittleness, which will make this bug more apparent.