Don't use vmcast to convert the reified goal and actual goal

SkySkimmer commented 4 months ago

VM can't avoid computing subterms of the goal that should be left alone.

Fix #133

SkySkimmer commented 4 months ago

With this PR:

From Coq Require ZArith.
From AAC_tactics Require Import AAC.
From AAC_tactics Require Instances.
Import ZArith.
Import Instances.Z.

Fixpoint mkevil n := match n with 0%nat => 1 | S k => 2 * mkevil k end.
(* 2 ^ n in Z *)

Definition evil := mkevil 1000%nat.

(* Time Eval vm_compute in evil. *)
(* 0.08s *)

Lemma test a : a + evil = evil + a.
Proof.
  Time aac_reflexivity.
  (* 0.9s *)

Time Qed.
(* 0.01s *)

, without it Qed takes an unknow but very long time (evil := mkevil 25 takes half a minute and evil := mkevil 20 takes 1s so evil := mkevil 1000 probably dies to cosmic ray induced bitflips before proper termination).

MSoegtropIMC commented 4 months ago

@SkySkimmer : FYI: I am working on a solution, where also the tactic application time is quite fast (a few ms) - but it is a more invasive change.

MSoegtropIMC commented 4 months ago

@SkySkimmer : I had a look at the slowness in your example and is type class resolution:

[1708966411.941922] Debug: 1: looking for (@Proper (@Internal.Sym.type_of Z 0) (@Internal.Sym.rel_of Z (@eq Z) 0) evil) without backtracking
[1708966413.308986] Debug: 1.1: (*external*) proper_reflexive on (@Proper (@Internal.Sym.type_of Z 0) (@Internal.Sym.rel_of Z (@eq Z) 0) evil), 0 subgoal(s)

Adding

#[export] Instance evil_proper : Proper eq evil.
Proof.
 typeclasses eauto.
Qed.

fixes this.

It still would make sense to do a full compute in the Eq = Eq premise, but at least inside of proofs (not inside of deep automation) it should be OK as is. The difference between vm_compute and reflexivity to Eq is probably negligible compared to type class resolution in most cases.

I profiled the two example files without and with your change and there is no difference:

Original
coqc Tutorial.v  1.27s user 0.20s system 93% cpu 1.573 total
coqc Caveats.v  0.74s user 0.19s system 98% cpu 0.934 total

VMCast -> DEFAULTcast
coqc Tutorial.v  1.27s user 0.20s system 99% cpu 1.479 total
coqc Caveats.v  0.74s user 0.19s system 98% cpu 0.934 total

@palmskog : IMHO the fix of @SkySkimmer is OK, except that the other occurance of VMcast should also be replaced. The above tests are with both replaced.

MSoegtropIMC commented 4 months ago

I now did a performance test with larger terms. The short summary is that one doesn't need to worry much because AAC tactics apparently has rather bad run time behaviour. It seems to rise exponentially with base 3 with variable / operator count.

The tactic run time is about the same for master and this PR (with both VMcast changed). The QED time is faster in master. The solution in this branch takes about the same time for tactic run time and QED, which I find reasonable to prevent uncontrolled blow up in other places. And as I said - for the examples provided there is no difference.

Here is the test:

From Coq Require PeanoNat ZArith List Permutation Lia.
From AAC_tactics Require Import AAC.
From AAC_tactics Require Instances.

Section Performance.
  Import ZArith.
  Import Instances.Z.

Variable v1 : Z.
Variable v2 : Z.
Variable v3 : Z.
Variable v4 : Z.
Variable v5 : Z.
Variable v6 : Z.
Variable v7 : Z.
Variable v8 : Z.
Variable v9 : Z.
Variable v10 : Z.

Example Var10:
  v1+v2+v3+v4+v5+v6+v7+v8+v9+v10
  =
  v10+v9+v8+v7+v6+v5+v4+v3+v2+v1
  .
  Time aac_reflexivity.
Time Qed.

Variable v11 : Z.

Example Var11:
  v1+v2+v3+v4+v5+v6+v7+v8+v9+v10+v11
  =
  v11+v10+v9+v8+v7+v6+v5+v4+v3+v2+v1
  .
Proof.
  Time aac_reflexivity.
Time Qed.

Variable v12 : Z.

Example Var12:
  v1+v2+v3+v4+v5+v6+v7+v8+v9+v10+v11+v12
  =
  v12+v11+v10+v9+v8+v7+v6+v5+v4+v3+v2+v1
  .
Proof.
  Time aac_reflexivity.
Time Qed.

Variable v13 : Z.

Example Var13:
  v1+v2+v3+v4+v5+v6+v7+v8+v9+v10+v11+v12+v13
  =
  v13+v12+v11+v10+v9+v8+v7+v6+v5+v4+v3+v2+v1
  .
Proof.
  Time aac_reflexivity.
Time Qed.

Variable v14 : Z.

Example Var14:
  v1+v2+v3+v4+v5+v6+v7+v8+v9+v10+v11+v12+v13+v14
  =
  v14+v13+v12+v11+v10+v9+v8+v7+v6+v5+v4+v3+v2+v1
  .
Proof.
  Time aac_reflexivity.
Time Qed.

Variable v15 : Z.

Example Var15:
  v1+v2+v3+v4+v5+v6+v7+v8+v9+v10+v11+v12+v13+v14+v15
  =
  v15+v14+v13+v12+v11+v10+v9+v8+v7+v6+v5+v4+v3+v2+v1
  .
  Time aac_reflexivity.
Time Qed.

End Performance.

(*

master
Finished transaction in 0.363 secs (0.111u,0.016s) (successful)
Finished transaction in 0.045 secs (0.043u,0.s) (successful)
Finished transaction in 0.928 secs (0.287u,0.005s) (successful)
Finished transaction in 0.129 secs (0.127u,0.s) (successful)
Finished transaction in 2.583 secs (0.748u,0.013s) (successful)
Finished transaction in 0.332 secs (0.328u,0.001s) (successful)
Finished transaction in 7.624 secs (2.202u,0.044s) (successful)
Finished transaction in 0.957 secs (0.95u,0.004s) (successful)
Finished transaction in 22.646 secs (6.454u,0.149s) (successful)
Finished transaction in 2.81 secs (2.801u,0.008s) (successful)
Finished transaction in 67.608 secs (19.106u,0.423s) (successful)
Finished transaction in 8.367 secs (8.34u,0.025s) (successful)

Gaetan
Finished transaction in 0.361 secs (0.11u,0.015s) (successful)
Finished transaction in 0.313 secs (0.111u,0.002s) (successful)
Finished transaction in 0.931 secs (0.291u,0.005s) (successful)
Finished transaction in 0.873 secs (0.276u,0.004s) (successful)
Finished transaction in 2.653 secs (0.812u,0.016s) (successful)
Finished transaction in 2.483 secs (0.69u,0.015s) (successful)
Finished transaction in 7.861 secs (2.391u,0.05s) (successful)
Finished transaction in 7.364 secs (1.993u,0.044s) (successful)
Finished transaction in 23.011 secs (6.778u,0.156s) (successful)
Finished transaction in 22.456 secs (6.324u,0.135s) (successful)
Finished transaction in 69.07 secs (20.594u,0.421s) (successful)
Finished transaction in 70.027 secs (20.953u,0.491s) (successful)
*)

palmskog commented 4 months ago

@MSoegtropIMC so what is the verdict or bottom line? Will this PR improve current master from your perspective? And do you plan to do a PR with more replacements with DEFAULTcast?

MSoegtropIMC commented 4 months ago

@palmskog : IMHO this PR should be merged (with the other cast changed as well). It makes complicated use cases 2x slower but some simple use cases close to infinitely faster. IMHO this is a good compromise since apparently this tactic is anyway not terribly useful for complicated cases (ring does this in a flash).

@SkySkimmer : can you please change the other VMcast as well? I did all of my tests with both changed (there are only two).

I have ideas which might make this faster (both tactic execution and QED time), but I think we should merge this PR anyway.

SkySkimmer commented 4 months ago

@SkySkimmer : can you please change the other VMcast as well? I did all of my tests with both changed (there are only two).

changed

palmskog commented 4 months ago

@MSoegtropIMC it would be useful to have all your performance tests available in the repository. Any chance you could submit them as a PR? For example, you could put them under a directory tests in the root, and then I can add boilerplate to build them optionally with Dune (when called with runtest).

MSoegtropIMC commented 4 months ago

@palmskog : sure, I planned to do this - still wrapping my head around making it faster. In case this doesn't work out, I will do a PR just for the performance tests.

Btw.: do you have the impression that there is a robust test suite? I have the impression that there is quite a bit of not neccessary code - I can remove > 50% of the code of some tactics and they still do the same. I am not sure if this is to hack around issues in old versions of Coq which have been fixed meanwhile or for exotic uses cases.

palmskog commented 4 months ago

@MSoegtropIMC I've only worked with some small parts of the OCaml code, there could indeed be many gains by refactoring. Most of the "tests" are right now at the Coq level (Tutorial.v, Caveats.v), which could be incomplete. There are some downstream projects that use AAC Tactics which might be useful for testing as well, https://github.com/damien-pous/relation-algebra comes to mind.

coq-community / aac-tactics

Don't use vmcast to convert the reified goal and actual goal #134