Improving --detail_errors (Used to be: --detail_errors allows proving a contradiction)

s-zanella commented 6 years ago

The proof obligations checked by --detail_errors are not logically equivalent to the original proof obligation. Exploiting this seems to require a match with overlapping cases. In the example below, the cases are not redundant but taken individually they overlap.

$ cat M.fst
module M

#set-options "--detail_errors"
val fail: n:int -> Lemma (n == 0)
let fail = function
  | 0 -> ()
  | _ -> ()

#reset-options
val falso: unit -> Lemma False
let falso _ = fail 1

$ fstar.exe M.fst
Detailed error report follows for M.fst(6,0-9,11)
Taking 5 seconds per proof obligation (3 proofs in total)
3, 2, 1, 
OK: proof obligation at /Users/anonymous/everest/FStar/ulib/prims.fst(187,75-187,87) was proven
OK: proof obligation at <dummy>(0,0-0,0) was proven
OK: proof obligation at M.fst(5,25-5,33) was proven
Verified module: M (1895 milliseconds)
All verification conditions discharged successfully

$ fstar.exe --version
F* 0.9.5.0
platform=Darwin_x86_64
compiler=OCaml 4.06.0
date=2018-04-26T10:59:14+01:00
commit=8280654d7

This was first spotted by @fournet in mitls-fstar Record.fst and independently reported by @jaybosamiya in a self-contained example.

Not sure about this being high priority since --detail_errors is rarely used to work around a failing proof. For instance, find -X everest -name *.fst | xargs grep "detail_errors" * only shows one use in one test file. It does make --detail_errors much less useful.

nikswamy commented 6 years ago

Thanks for the example @s-zanella.

Summarizing my understanding of it. I think the conditional helps trigger the issue here, but it is not fundamental. There's probably a way to trigger it without non-overlapping branches ...if not, it's just a piece of luck.

Starting with the conditional, we produce a query whose shape is as follows:

(forall post res.
          ((res == 0) ==> post res)
           ==>
           (forall x. 
                    ((x == 0) ==> post x)
                /\ ((x <> 0) ==> post x)))

For error reporting, we instrument the query like so:

(forall post res.
          ((l1 \/ res == 0) ==> post res)
           ==>
           (forall x. 
                    ((x == 0) ==> (l2 \/ post x))
                /\ ((x <> 0) ==> (l3 \/ post x))))

where l1, l2, l3 are three uninterpreted boolean terms in the SMT encoding.

On its own, this query is not provable and the program fails to type-check, rightfully so. And Z3 reports back a counterexample that includes an assignment for l1 and l3, which we translate back into an error report for the user.

But, with detail errors on, we present the query repeatedly to Z3, each time with only one of the labels left uninterpreted. For example, one of those queries is given in a context where l1 and l2 are set to true and we try to prove just the formula labeled l3, which is of course trivially provable now.

This was triggerred by a match in the source program, but the overlapping branches really don't have much to do with it---it's just that disabling the ensures clause l1 makes proving the program trivial.

So, --detail_errors is badly broken. : (

Here's a proposed fix:

First, we downgrade the status of --detail_errors. Currently, a program is accepted if the proof succeeds with --detail_errors. This should never be the case. i.e, we should not rely on error reporting heuristics for soundness.

Now, there are a few options:

We retain the current "one label at a time" behavior of detail errors. If one of those labels fails, great, we can report it to the user. If none of them fail, we're out of luck and have to say that we cannot provide a detailed error report. This can be frustrating, particularly as detail errors already can take a very long time on a large proof
A more sophisticated option: We could distinguish between the kinds of sub-goals in a term. For example, never disabling the formulas corresponding to ensures clauses. That might prevent this unsoundness (not sure it is sufficient ... but then again, --detail_errors does not have to be sound). This would help in identifying the branch in which a post-condition was not provable, but not necessarily which conjunct of the post-condition was not proved. Having identified the branch(es), one could then iterate, while trying to disable all but one conjunct at a time in the post-condition trying to find the ones that fail and reporting the pairs of branches + conjuncts. This would be even more time-consuming than it is currently, --detail_errors can take hours already for a large proof.

Thoughts?

Thanks again for the report.

nikswamy commented 6 years ago

FYI, now in my branch and soon in master:

$ fstar ../examples/bug-reports/bug1441.fst
Detailed error report follows for ../examples/bug-reports/bug1441.fst(6,11-8,11)
Taking 5 seconds per proof obligation (4 proofs in total)
4, 3, 2, 1,
OK: proof obligation at ../examples/bug-reports/bug1441.fst(5,25-5,33) was proven in isolation
OK: proof obligation at <dummy>(0,0-0,0) was proven in isolation
OK: proof obligation at D:\workspace\everest\FStar\ulib\prims.fst(187,75-187,87) was proven in isolation
OK: proof obligation at D:\workspace\everest\FStar\ulib\prims.fst(187,75-187,87) was proven in isolation
Failed: the heuristic of trying each proof in isolation failed to identify a precise error
../examples/bug-reports/bug1441.fst(5,25-5,33): (Error 19) could not prove post-condition
../examples/bug-reports/bug1441.fst(6,11-8,11): (Error 19) assertion failed(Also see: D:\workspace\everest\FStar\ulib\prims.fst(212,30-212,42))
Verified module: Bug1441 (2331 milliseconds)
2 errors were reported (see above)

jaybosamiya commented 6 years ago

The fix seems great!

As for option 2 that you mentioned, imho, it seems like a good idea. Since it automates out the human who would've done the exact same thing via manually commenting things in and out, I believe even if --detail_errors takes hours, it still saves time overall for us (because in the worst case, for larger proofs, we'd simply let it run overnight).

nikswamy commented 6 years ago

Removing the unsoundness and high priority tags as this fix is now in master. Adding the error messages tag.

FStarLang / FStar

Improving --detail_errors (Used to be: --detail_errors allows proving a contradiction) #1441