HolSmt: add support for `num` type, fix proof replay, build smtheap

Hi! This PR is a bit larger than my previous ones. No code outside HolSmt was changed. As for HolSmt, this PR contains the following changes:

Add (some) support for the `num` type to HolSmt

The current approach is very simple. It consists in doing the following:

Before translating a goal to the SMT-LIB language, we convert all num literals in the goal into integer literals, so that they can be translated by the already-implemented integer support. In other words, literals such as 0n and 2n are converted into Num 0i and Num 2i. Since 0i and 2i are integer literals, they are translated into actual numbers instead of uninterpreted constants. This conversion avoids having to do anything special for parsing, printing or proof reconstruction, as integer literals are already supported and, from the SMT solver's perspective, Num is just a function like any other (one that maps integers to natural numbers).
Furthermore, we add to the goal as assumptions some existing theorems which allow SMT solvers to convert num operations into integer operations which they natively support and can reason about easily (see commit a49c84e09a1b45aca5acf9984fcecda75b127f22 for details).
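
To make the literal conversion concrete, here is a minimal sketch on a toy term representation. It is written in Python purely for illustration and is not the SML code in HolSmt; the class names (`NumLit`, `IntLit`, `App`, `Var`) and the function `lift_num_literals` are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class NumLit:
    """A natural-number literal such as 2n (toy model)."""
    value: int


@dataclass(frozen=True)
class IntLit:
    """An integer literal such as 2i (toy model)."""
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class App:
    """Application of a named function to a tuple of arguments."""
    fn: str
    args: Tuple["Term", ...]


Term = Union[NumLit, IntLit, Var, App]


def lift_num_literals(t: Term) -> Term:
    """Replace every num literal k by Num applied to the integer literal k."""
    if isinstance(t, NumLit):
        return App("Num", (IntLit(t.value),))
    if isinstance(t, App):
        return App(t.fn, tuple(lift_num_literals(a) for a in t.args))
    return t  # variables and integer literals are left unchanged


# x < x + 2n  becomes  x < x + Num 2i
goal = App("<", (Var("x"), App("+", (Var("x"), NumLit(2)))))
lifted = lift_num_literals(goal)
```

After this pass the only literals left in the goal are integer literals, which is why no num-specific printing or parsing is needed downstream.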

This is really all that's needed to add some initial support for nums, although I'm sure more theorems could be added and/or optimizations could be performed later if necessary. It allows SMT solvers to solve all of the existing num self-tests, except for the DIV and MOD-related ones, because integer div and mod are not supported yet (this will be fixed in the next PR).

Note that there is a significant regression: by adding the num-related theorems as assumptions, SMT solvers cannot come up with sat results anymore. I've narrowed this down to a single theorem, integerTheory.INT, which is needed for SMT solvers to reason about SUC. My guess is that they are simply not able to come up with models for the & and SUC functions which satisfy the restrictions imposed by the theorem.

Since the self-tests rely on sat results to detect unprovable goals, and since a user might want to keep these theorems from being added (for performance or debugging reasons, perhaps), I've added a tunable (HolSmtLib.include_theorems := false) which can be used as an escape hatch to disable the automatic addition. However, I don't expect this to be something that one would normally use.

Z3 proof parser fixes

This was by far the most challenging part of the PR. The num support added previously led to Z3 creating more complicated proof certificates that we hadn't observed before. Specifically, Z3 is now mixing and adding proofterms within regular SMT-LIB terms and is nesting proof-specific let terms inside the bindings of regular SMT-LIB terms as well. Furthermore, Z3 proof certificates are a long chain of (many thousands of) let terms, so special care is required to parse these to avoid causing stack overflows.

I considered about 5 different ways of fixing this, and tried and abandoned 2 of them before reaching the current approach, which I think is by far the simplest one, since it avoids duplicating code. It consists in parametrizing the SMT-LIB term parser by a couple of let handler functions, which allows the parser to behave differently depending on whether we're parsing regular SMT-LIB terms or SMT-LIB terms inside Z3 proof certificates.
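
The two ideas above (a parser parametrized by a let handler, and handling very long let chains without deep recursion) can be sketched as follows. This is a toy Python model under stated assumptions, not HolSmt's SML parser: the handler signature, the names `read_sexpr`, `walk_let_chain`, `term_binding` and `proof_binding`, and the rule used to classify a binding as a proof step are all invented for illustration, and only a single handler is shown.

```python
from typing import Callable, Dict, List, Tuple, Union

SExpr = Union[str, List["SExpr"]]
Binding = Tuple[str, SExpr]
LetHandler = Callable[[str, SExpr, Dict[str, Binding]], Binding]

# A few Z3 proof rule names, used here only to classify bindings.
PROOF_RULES = {"asserted", "mp", "rewrite"}


def read_sexpr(text: str) -> SExpr:
    """Read one S-expression with an explicit stack instead of recursion."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    stack: List[List[SExpr]] = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            done = stack.pop()
            stack[-1].append(done)
        else:
            stack[-1].append(tok)
    return stack[0][0]


def walk_let_chain(expr: SExpr, on_binding: LetHandler):
    """Follow (let (...) (let (...) body)) with a loop; return (body, env)."""
    env: Dict[str, Binding] = {}
    while isinstance(expr, list) and len(expr) == 3 and expr[0] == "let":
        for name, value in expr[1]:
            env[name] = on_binding(name, value, env)
        expr = expr[2]
    return expr, env


def term_binding(name: str, value: SExpr, env: Dict[str, Binding]) -> Binding:
    """Handler for regular SMT-LIB terms: every binding is a term."""
    return ("term", value)


def proof_binding(name: str, value: SExpr, env: Dict[str, Binding]) -> Binding:
    """Handler for proof certificates (toy rule): a binding whose head
    symbol is a known proof rule is treated as a proof step."""
    head = value[0] if isinstance(value, list) and value else None
    is_proof = isinstance(head, str) and head in PROOF_RULES
    return ("proof" if is_proof else "term", value)


cert = "(let ((a!1 (+ x 1))) (let ((@x1 (asserted (> a!1 0)))) @x1))"
body, env = walk_let_chain(read_sexpr(cert), proof_binding)

# A chain far deeper than Python's default recursion limit.
depth = 5000
deep = "".join(f"(let ((a{i} x)) " for i in range(depth)) + "body" + ")" * depth
deep_body, deep_env = walk_let_chain(read_sexpr(deep), term_binding)
```

The same `walk_let_chain` loop serves both modes; only the handler passed in differs, which is the code-sharing property the parametrized design is after.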

Another issue is that the indices in indexed identifiers could only be numerals, but from SMT-LIB 2.5 onward they can also be symbols. This was fixed in a commit prior to this PR (8a2e9bc97c2955bdbda0bceb90948e87a92f5c22). However, Z3 proof certificates can actually contain full SMT-LIB terms as indices (as part of quant-inst inference rules), so we now parse them as a list of Term.term instead of as a list of strings.

Z3 proof replay fixes

This consisted in the following:

Adding new proof rule handlers for the nnf-neg, nnf-pos, mp~, quant-inst, proof-bind, refl and sk proof rules and the lambda binder, which we didn't have support for until now. The nnf-neg and nnf-pos handlers just invoke METIS_TAC for now, and proof-bind seems to be a no-op, while the other ones were implemented with specialized handlers.
Fixed the already-implemented quant-intro and elim-unused proof rule handlers to also handle exists quantifiers (besides forall quantifiers which they already handled).
Implemented a workaround for HOL4 issue HOL-Theorem-Prover/HOL#1203, which simply consists in using COOPER_TAC instead of ARITH_TAC until the issue is fixed.
Implemented a workaround for Z3 issue Z3Prover/z3#7154, which has now been fixed in Z3 but no release contains the fix yet. The workaround simply consists in using METIS_TAC to bridge the gap in the mismatched terms when the theorem doesn't come out as expected.
I've also fixed an issue in Z3 (Z3Prover/z3#7157) and implemented a HolSmt fix and workaround for that Z3 issue. The issue was that Z3's AST pretty-printer would change some terms when printing them in proof certificates (Z3 calls this the simplify_implies feature), which would cause these terms in the proof output not to match the terms in the input file that was provided to Z3. This issue had been causing proof replay failures in HolSmt for more than 13 years now (see the comment added at the bottom of commit 522a851b6671b54a4d93d6c91b22490802a2d6d7). The workaround consists in using METIS_TAC and Drule.PROVE_HYP at the end of the proof replay procedure to remove each hypothesis that doesn't exactly match one of the assumptions. When using a fixed version of Z3, the workaround shouldn't do anything (unless there's another bug) since the hypotheses will match the assumptions.
Added four new rewrite theorems which came up in some of the test cases. This is the part that I'm least happy about in terms of proof replay (since it seems there's no way to tell whether we've covered all cases), but I now have a plan to tackle these issues in a more principled way (mentioned further below).

The fixes above allow us to replay the Z3 proofs of the quantifier tests, the double implication tests (exercising one of the Z3 bugs mentioned earlier) and all the arithmetic test cases in the HolSmt test suite, except for the word ones and the div and mod ones (which we don't support yet). I've actually implemented support for div and mod already, but proof replay will require more fixes and this PR was already getting too long, so I will leave that for the next one.

Note that proof replay is still quite brittle, for a few reasons:

Unimplemented proof rules: there are still a few Z3 proof rules that aren't implemented, although I don't expect this to be a significant issue going forward, as they are very few and should not be hard to implement.
The existing proof rule handlers not covering all cases: this is mostly due to the fact that 1) when HolSmt was originally developed, Z3 was closed source so it wasn't easy to tell what the proof handlers should do, and 2) Z3 has evolved significantly over the past decade, so the scope of some proof rules has expanded. The fixes for these issues are usually pretty straightforward, though.
Rewrite rules: these are a lot more challenging since they are very ad hoc and undocumented. HolSmt currently does its best to handle them, but it's hard or impossible to tell which rewrite rules actually need to be handled. Again, I do have a plan to tackle this issue (mentioned further below).
Missing or hard-to-implement features: e.g. word-related proofs seem to need additional special handling which we don't have support for yet. Polymorphic functions are also causing issues which we can't handle yet. Furthermore, it's hard to tell whether proofs of nonlinear arithmetic will ever be possible to handle in a reasonable way -- this would probably require a lot of research and development to fix. And I suspect there will also be other hidden issues where Z3 will not give us enough information in the proof certificate to replay the proof, but I don't have solid examples of this yet.

Added an `smtheap`

This is a significant usability improvement (especially while developing), as it reduces the time to load "HolSmtLib" from ~42s to ~5s (on my laptop).

Plans for the immediate future

As mentioned above, I'm implementing div and mod for integers and numerals; this is mostly done (I just need to identify the relevant num theorems and do more testing). Enabling proof replay for these tests will probably require some significant additional fixes (from what I could tell at a glance), which will take some time. I'll probably send a PR when this is done.
I've been thinking of implementing a more principled approach for rewrite proof rules, since these are among the most problematic rules to handle.
- The idea is to identify all places where mk_rewrite is used in the Z3 source code (currently ~55 instances) and assign a unique integer to each one, which would be printed out in the proof certificate as an index in the rewrite proofterm (e.g. ((_ rewrite 23) (= P Q)) instead of just (rewrite (= P Q))).
- This would allow us to identify exactly which instance of mk_rewrite Z3 used in a given proof certificate, and therefore to quickly find the place in the Z3 source code where these rewrite terms are being created. That in turn would make it easier to identify the shape of the rewrite rules produced by that instance of mk_rewrite, and a lot easier to verify that we're handling all the necessary cases.
- This would also allow us to improve performance of rewrite rule replay (for the dynamically-created rewrites, not the static ones which a term net can handle easily) since we could now jump directly to a specialized handler for a given rewrite rule (e.g. rewrite rule 23, rewrite rule 15, etc) instead of throwing everything at the wall until something finally works, which is mostly what we do currently in the rewrite handler.
- Implementing this in Z3 should be quite easy, and indeed I'm planning to submit a PR to Z3 soon. Parsing these unique identifiers in HolSmt should be trivial too, but identifying the shape of the rewrite rules and handling them correctly is a lot of work, so my plan would be to fix that incrementally as issues come up.
If everything above goes well, next I'd probably like to tackle the issue with polymorphic functions or the word test cases, whichever is easier (although I think both will be very hard).
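
The indexed-rewrite idea can be sketched as a dispatch table keyed by the rewrite index. This is a toy Python model of a proposal, not existing behaviour: Z3 does not emit such indices today, and the handler names, the index 23 and the string-based matching are assumptions made only for this example.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# ((_ rewrite 23) (= P Q))  : proposed indexed form
INDEXED = re.compile(r"^\(\(_ rewrite (\d+)\)\s+(.*)\)$", re.S)
# (rewrite (= P Q))         : current, unindexed form
PLAIN = re.compile(r"^\(rewrite\s+(.*)\)$", re.S)


def split_rewrite(step: str) -> Tuple[Optional[int], str]:
    """Return (index or None, equation text) for a rewrite proof step."""
    step = step.strip()
    m = INDEXED.match(step)
    if m:
        return int(m.group(1)), m.group(2)
    m = PLAIN.match(step)
    if m:
        return None, m.group(1)
    raise ValueError(f"not a rewrite step: {step!r}")


def handle_rewrite_23(equation: str) -> str:
    """Hypothetical specialized handler for rewrite instance 23."""
    return f"specialized-23:{equation}"


def handle_generic(equation: str) -> str:
    """Fallback standing in for the current try-everything strategy."""
    return f"generic:{equation}"


HANDLERS: Dict[int, Callable[[str], str]] = {23: handle_rewrite_23}


def replay_rewrite(step: str) -> str:
    index, equation = split_rewrite(step)
    # Unindexed steps (index None) and unknown indices use the fallback.
    handler = HANDLERS.get(index, handle_generic)
    return handler(equation)
```

Keeping the generic fallback means specialized handlers can be added one index at a time, which matches the incremental plan described above.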

Thanks!

cc @tjark
