Closed: JasonGross closed this issue 11 years ago
At present we have a policy of avoiding `equiv_adjointify` whenever possible, because it munges the unit or counit. By avoiding `equiv_adjointify`, we can ensure that the unit and counit stored as part of the `IsEquiv` record are the exact ones we specified in the proof. However, I don't know whether this has yet been important, or whether it was just a principled decision. Why would `equiv_adjointify` make the `issig` tactics faster, though? Surely it produces an even larger proof tree, no? Is it because part of it is opaque?
I'm sure the opacity helps a bit, but the fact that it is folded helps significantly more. What matters is not the size of the normalized proof tree, but the size of the (unnormalized) proof tree. (I'm not sure what normalization Coq actually does; perhaps beta, iota, zeta, but certainly not delta.) Using `equiv_adjointify` hides more of the proof tree behind a name, and requires less construction in the presence of an already large proof tree.
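To illustrate the "folded behind a name" point, here is a minimal sketch in HoTT-library style (the library's `eta_sigma` is essentially this lemma; the name used here and the exact notation are illustrative and may differ by library version):

```coq
(* Once this is proved, later proofs refer to it only by name, so the
   fibration [P] is not re-mentioned at every use site; the use sites
   contribute just [eta_sigma_sketch] applied to arguments. *)
Lemma eta_sigma_sketch {A : Type} {P : A -> Type}
      (u : {a : A & P a})
  : (u.1; u.2) = u.
Proof.
  destruct u as [a p]; reflexivity.
Qed.  (* [Qed] additionally makes the body opaque. *)
```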
Surely that doesn't depend on the fact that it's `equiv_adjointify` we're talking about; couldn't we hide an equal amount of the proof tree behind a name in any case?
What's special about `equiv_adjointify` is that its type is simpler. I only have to mention the fibration type once or twice, rather than n times. If you prove a similarly general lemma about the coherence of the unit and counit which doesn't depend on what record I'm dealing with, only on how many projections it has (and what they are), perhaps that would be just as good. (Perhaps there's a kind of "generalized sigma type" where we can ask for the builder, the projections, and the proofs of the elim/computation rules, and then prove something about those, and maybe that would be faster. I might try to play with this idea a bit.)
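For reference, here is the (approximate) statement being discussed, as it appears in the HoTT library; `Sect s r` unfolds to `forall x, r (s x) = x`. The exact form may differ between library versions:

```coq
About equiv_adjointify.
(* Roughly:
   equiv_adjointify
     : forall {A B : Type} (f : A -> B) (g : B -> A),
       Sect g f -> Sect f g -> A <~> B
   The fibration behind [A] and [B] is mentioned only through [A] and
   [B] themselves, not once per projection of the record. *)
```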
It would be great if we could replace the `issig` tactics with `issig` lemmas, which quantify over records, but Coq doesn't give us any syntax to do so. Using `let ... in ...` doesn't help at all. Using `abstract`, which I think is closest to hiding something behind a name (well, an opaque one), doesn't seem to help in this case. If you want to play with the case I'm working with, it's at https://github.com/JasonGross/HoTT-categories/blob/lax-comma-partial/theories/Comma/LaxCommaCategory.v#L130. (If my git submodule setup, or anything else, is giving you trouble, let me know.)
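As a concrete toy illustration (plain Coq, not from the linked file) of what `abstract` does:

```coq
(* [abstract tac] runs [tac], then seals the resulting subproof as an
   auxiliary opaque constant; the in-progress proof term refers to
   that subproof only by name. *)
Goal forall n : nat, n + 0 = n.
Proof.
  intro n.
  abstract (induction n; simpl; try rewrite IHn; reflexivity).
Defined.
```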
I don't fully understand the internals of Coq, and all of my knowledge of what is fast and slow comes from the experiences I've had with slow proofs. These are the lessons I've drawn (I can try to elaborate on any particular one of them if you want):

- A rough measure of proof-tree size: do `Set Printing All.`, run the output through a `wc -w`, and the number outputted is n.
- If `admit` is slow, then you almost certainly need to back up; no matter what you do, you're probably not going to be faster than `admit`. (I hypothesize that the slowness is either due to the size of the proof tree so far, or the size of the type of the goal; my intuition suggests the latter is more probable than the former.)
- `abstract` can help, especially when `Defined` or `Qed` is slow; I think that it combines the "does this full proof pass the kernel checker?" check with some ad-hoc type-checking done during proof generation. I hear that `abstract` can help immensely with space/RAM issues, but I've never experienced this. Sometimes `abstract` makes things slower for me, though you should always compare the build + `Defined`/`Qed` time, and not just the time to build the proof, when looking at how `abstract` changes things.
- Applying lemmas (`f_equal` or `ap`) to projected components is even slower, and so it helps to destruct my record types before applying lemmas. On the other hand, if a lemma takes the sigma type whole, it's often faster to not destruct it. (Compare the `issect` and `isretr` in the `LaxCommaCategory.v` file I linked to above. The one which destructs the record is at least a few orders of magnitude faster.) The problem gets worse the more complicated the fibration (the `P` in the type `sigT P`) of the sigma type is. I suspect this is because destructing records gets coded as a single `match` without needing many type annotations, whereas destructing nested sigmas gets coded as nested `match`es with many type annotations, and the size of the proof tree gets very large very quickly.
- Factoring things out into lemmas helps: a folded lemma contributes only its name to the proof tree (even if its proof is just `idpath`), regardless of how big the lemma is when unfolded.
- The fact that `abstract` often doesn't help much suggests that the size of the type of the goal becomes important long before the size of the existing proof tree becomes important. (However, I tend to have experience with wide proof trees (where the type of each level is a very large term), rather than with deep ones; `abstract` is probably more useful for deep proof trees.)

Apparently we have a long way to go before proof assistants are really practical for mathematics. This is the sort of thing that no user should have to think about. However, since we have to work with what we've got, and I don't think we actually have a need for specified units and counits yet, I'm fine with adjointifying in `issig` for now. Maybe the non-adjointifying versions could be left commented out, with comments explaining to a reader of the code that we're only adjointifying for speed reasons.
I agree with you that it would be nice if no-one had to think about this. I'm not sure what constitutes "really practical", and am ok with using proof assistants for, e.g., category theory as-is (though of course I have a long wish-list, with better speed fairly high up). Anyway, I'm about to make the pull request; as a side-effect, there's no more Ltac black magic in the tactics, which are much, much shorter, in addition to being faster.
Yeah, a lot of the work that went into writing those tactics was due to the desire to avoid adjointifying...
The `issig` tactics are painfully slow for complicated record types involving three or more projections. I'm pretty sure the reason is that Coq is bad at nested sigma types, because they make the proof tree blow up. For example, by using `equiv_adjointify` and a lemma that proves `(x.1; (x.2.1; x.2.2)) = x`, I have a proof that goes through in about 2.5 seconds. (The lemma takes less than 0.1 seconds to prove.) If I don't factor out the lemma, it takes about 9 seconds to prove. If I use the `issig` tactic, it takes about 61 seconds to compile.

Is there a reason that we're not using `equiv_adjointify` in the `issig` tactics?

Either way, we should probably factor out the retraction(?) into separate lemmas about the eta properties of nested `sigT`s. I can go do this once I have an answer about using `equiv_adjointify`.
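The nested-sigma eta lemma alluded to above might look like the following sketch (hypothetical name; HoTT-library sigma notation `(x; y)`, `.1`, `.2`):

```coq
Lemma eta3_sigma_sketch {A : Type} {P : A -> Type}
      {Q : forall a, P a -> Type}
      (x : {a : A & {p : P a & Q a p}})
  : (x.1; (x.2.1; x.2.2)) = x.
Proof.
  (* One destruct of the whole nested sigma; the resulting proof term
     is a single nested [match] ending in [idpath]. *)
  destruct x as [a [p q]]; reflexivity.
Defined.
```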