Question: Proof Checking in Z3 - state of the art

unboxedtype commented 4 years ago

Hello, Mr. Bjørner!

I would like to be able to re-check proof objects produced by Z3 proof generation facility (for UNSAT) What options do I have? I see some proof-checking facility is available in Z3 source code: ./z3/src/ast/proofs/proof_checker.* , but not sure how complete/reliable it is at the moment.

There is SMTCoq project that is trying to build a link betweem an SMT solver and an interactive theorem prover, but they do not support Z3, I guess they have a good reason not to and wonder what it is.

Thanks, Appreciate any comments!

rainoftime commented 4 years ago

Hi, you may refer to this issue: https://github.com/Z3Prover/z3/issues/4226

unboxedtype commented 4 years ago

Hi, you may refer to this issue: #4226

Thanks! Yeah, this is very related. I wonder if something has changed since then?

NikolajBjorner commented 4 years ago

The SAT solver takes dimacs input and can produce well-formed DRAT using the sat.drat= option.
The SMT mode has proof tree generation. Proof objects are a combination of rewrite steps and deductions. Mapping these objects directly into a different calculus is not recommended. Isabelle used these proof objects to produce tactics that could replay steps. The proof objects also provide some information about dependencies, which axioms were used.
The SMT core has also a clausal proof mode. It is similar to DRAT (it is just DRUP). The clausal proofs are less granular than the proof trees. You would have to replay learned clauses. Unlike pure SAT, clauses are derived using theory inferences so straight-forward unit propagation is not sufficient. A replay mode is therefore also the way to go. The SMT solver generally doesn't produce many conflict clauses so the overhead of replaying steps might be more manageable because you can still take advantage of a checker that performs unit and theory-unit propagation. The clausal proof objects have yet to be consumed by any tool, other than for inspection (they give some idea of how search could digress).

Overall, I would say that an approach that has a better chance of surviving internal changes is to have the proof checker replay proofs based on hints. Clausal proofs provide a very reasonable interface level (though, they don't consider pre-processing, so one has to recover justifications from pre-processing using sequence of rewrite steps that the normal proof objects provide). Internally, we also benefit from self-validating theory solvers by replaying their theory lemmas. Even though, this has its own limitations (it can miss bugs because the self-validator uses the same buggy code), it has been very valuable. In some cases the theory lemmas turn out to be really difficult to prove because they were generated from methods, such as cuts.

unboxedtype commented 4 years ago

Thank you for the thorough response! I feel this has to be saved permanently somewhere for future generation researchers :-)

NikolajBjorner commented 4 years ago

This is indeed a topic better suited for a discussion forum and I would have put it under a "GitHub discussion" tab if it were already available. It seems still not yet available in beta. So for now, the issue list.

The question is also timely because I am in the process of augmenting Z3's SAT core with theories and then migrating piecemeal SMT services to it. The time seems ripe to take the learnings from the core we have used through many years and integrate newer advances from SAT solving and inprocessing. The SAT solver already handles Pseudo-Booleans and cardinality constraints through theory propagation and options for resolution and in-processing, but it may as well also be augmented with other theory reasoning. As of last week, the current state is to handle simple EUF reasoning and the plan is to allow it to handle a base level of bit-vector reasoning next. I would therefore like to put the correctness and diagnosability of this solver on some firmer ground. The learnings from the legacy solver core is that the proof-tree generation is inherently brittle. The clausal proofs in the legacy solver is an after-thought, but can claim innocence by being open ended in nature.

For the new core I would like to augment the DRAT format with SMT information. A portable format should have wider benefits because independent checkers can consume it and maybe even work across front-ends. DRAT has already been established to be portable, but with the benefit of serving a very confined calculus. SMT is open ended. There are some exciting new developments on extending DRAT to polynomial rings, PR proofs, etc. These would still serve a Boolean domain and for SMT a way to integrate both efficient certification of the Boolean domain and extensibility seems in place.

For EUF I currently generate the following output:

    Input assertion (trusted modulo internalizer):
      c a <literal>* 0

    Bridge from ast-node to boolean variable:
      c b <bool-var-id> := <ast-node-id> 0

    Definition of an ast node:
      c n <ast-node-id> := <name> <ast-node-id>* 0

    Theory lemma
      c <theory-id> <literal>* 0

    Available theories are:
      - euf   The theory lemma should be a consequence of congruence closure.

The format is preliminary, but a start for EUF and an indicator of the main extensions (other extensions will be a way to pass pragmas to ease checking, handle function identifiers that are not just strings, etc). In the current form, the theory lemmas are always used immediately in the next clause, so they can be garbage collected by a checker. The checker for EUF would have to rebuild the AST (abstract syntax tree) nodes for the terms and then check each EUF claim. These claims are small congruence closure proofs. Building a trusted congruence closure checker is in principle a solved problem, while integrating it with a reusable checker could be a very fun area of opportunity.

I would therefore be thrilled interacting with an independent effort in developing such a checker.

For DRAT there have been checkers developed in C, ACL2, Isabelle and Coq. With the runtime advances and already established solving in Lean, I suspect it would be a very appealing platform for the extension of DRAT. A checker, even developed in C, and even using Z3, for checking theory axioms will offer useful diagnostics for development (self-checkers do miss unsoundness bugs in some (fewer) cases, though).

Several benefits from independent checker for DRAT should carry over:

The checker can be used to trim proofs to track premises.
In extent it can be used for provenance information (core extraction without answer literals)
For DRUP format it is somewhat straight-forward to extract interpolants. The corresponding extensions to EUF and other theories can be explored. An appeal could be that one can decouple the internal of decision procedures from advances in clausal interpolation extraction.
Logs can be mined for statistical information about search behavior. For DRAT Soos, Meel, and separately Selsam and I used logs to learn and replay clause garbage collection and branch choices.

The format is also set up to subsume the way we trace proofs in the TRACE option. The axiom profiler tools were developed based on low level tracing of generated terms and instantiated clauses during search. It's format is essentially isomorphic to this proposal (it logs also quantifiers, but that is a conservative extension). Supporting profiling would also require agmenting with pragmas, and perhaps play into the requriements for #4619 (@rospoly @Nils-Becker)

rainoftime commented 4 years ago

"I am in the process of augmenting Z3's SAT core with theories and then migrating piecemeal SMT services to it."

@NikolajBjorner This is exciting (I noticed a new folder src\sat\smt)! I was confusing previously that Z3's lazy SMT engine does not use its state-of-the-art SAT solver in src\sat. The existing smt tactic is very powerful, but I believe Z3's users will greatly benefit from migrating some services to the new one. As an example, the SAT solver has implemented the CHB branching heuristic, which I found to be very effective in solving many hard instances from symbolic execution. I really appreciate the genius effort in integrating newer advances from SAT solving community. :) Personally, I would expect the port of the lazy solving engine for QF_BV and QF_ABV. I experienced some performance issues in the existing incremental engine.

NikolajBjorner commented 4 years ago

Correct, QF_BV and QF_UFBV and QF_ABV are important theories to support well with a good SAT solver. UF is on the path to A.

rainoftime commented 3 years ago

Hi, @unboxedtype You may discuss (and find some new features of Z3) here https://github.com/Z3Prover/z3/discussions/4881

Z3Prover / z3

Question: Proof Checking in Z3 - state of the art #4653