Closed JasonGross closed 1 year ago
I was looking for this feature too!
I wonder if it would make more sense to embed something closer to OCamlProf rather than adapting the Ltac profiler, given that Ltac2 is closer to OCaml than to Ltac in some sense. On the other hand, it might be enough to add, roughly, a hook at the beginning and end of the interpretation of each top-level Ltac2 definition.
As a note for anyone (including my future self a few months from now :wink:) whose search for an Ltac2 profiler might end here: as long as we don't have one, wrapping potentially slow calls inside Control.time can give some insight into timing.
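For reference, a minimal sketch of the Control.time workaround. It assumes the standard Ltac2 library, where Control.time takes an optional label and a thunk; slow_step is a hypothetical stand-in for whatever tactic you suspect is slow, not something from this thread.

```coq
From Ltac2 Require Import Ltac2.

(* Hypothetical stand-in for a potentially slow Ltac2 tactic. *)
Ltac2 slow_step () := simpl; reflexivity.

Goal 2 + 2 = 4.
Proof.
  (* Control.time runs the thunk and prints its running time,
     prefixed with the given label. *)
  Control.time (Some "slow_step") slow_step.
Qed.
```

Since Control.time returns the result of the thunk, it can also wrap value-returning Ltac2 functions, but each call site has to be wrapped by hand, which is why this is only a stopgap for a real profiler.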
Presumably this could be implemented quite similarly to the Ltac1 profiler. @ppedrot Do you think it would work to add a do_profile wrapper to https://github.com/coq/coq/blob/7f2e4bfc0ec0e2e6ff24a91e9060933e06c31d46/plugins/ltac2/tac2interp.ml#L38-L45 from https://github.com/coq/coq/blob/7f2e4bfc0ec0e2e6ff24a91e9060933e06c31d46/plugins/ltac/profile_ltac.ml#L352 à la https://github.com/coq/coq/blob/7f2e4bfc0ec0e2e6ff24a91e9060933e06c31d46/plugins/ltac/tacinterp.ml#L1128-L1132?
Presumably the LtacProf infrastructure could be generalized over the notion of call stacks being used, and could just take a printer for the elements of the call stack?
@jfehrle Do you have time and interest in implementing this feature?
I'm open to working on it, but not right away. There are other things I want to complete first. I also wanted to get back to finishing the debugger (e.g. adding Ltac2 support), but the review process is so tedious that I have mixed feelings about that. The same consideration applies to this request. I wonder if we can get someone to commit to reviewing the change promptly if I agree to take it on.
Assuming nothing else comes up in my life, I'm happy to review this change promptly (and also be the assignee). I think we might want to get @ppedrot or someone else with vision for Ltac2 to agree to review the interface between the profiler and Ltac2, possibly.
A good test-case for this: Porting ~110 loc of Ltac to Ltac2 added 30s overhead in https://github.com/JasonGross/fiat-crypto/commit/2fa728620a50dd2b4448d23fb16e13c9451e006e (https://github.com/JasonGross/rewriter/compare/f24c094d04d725e0f9f0554b354921d921a10f2e...73177609fdf0d088c576e3074ea4dcda077b3841). (Build target make TIMED=1 SKIP_BEDROCK2=1 pre-standalone in fiat-crypto; fiat-crypto archived at softwareheritage, rewriter archived at softwareheritage.) As I said on Zulip, my current guess is that it's the overhead introduced by the four extra evars involved in replacing
lazymatch ty with
| base_interp ?T => T
| @base.interp base base_interp (@base.type.type_base base ?T) => T
| @type.interp (base.type base) (@base.interp base base_interp) (@Compilers.type.base (base.type base) (@base.type.type_base base ?T)) => T
with
(* work around COQBUG(https://github.com/coq/coq/issues/13962) *)
lazy_match! '($base_interp, $base, $ty) with
| (?base_interp, ?base, ?base_interp ?t) => Some t
| (?base_interp, ?base, @base.interp ?base ?base_interp (@base.type.type_base ?base ?t)) => Some t
| (?base_interp, ?base, @type.interp (base.type ?base) (@base.interp ?base ?base_interp) (@Compilers.type.base (base.type ?base) (@base.type.type_base ?base ?t))) => Some t
but I'm not sure how to go about debugging these issues systematically without a profiler.
Another use-case: in https://github.com/mit-plv/fiat-crypto/pull/1358#issuecomment-1267374005 we have Ltac2 reification taking almost 2 hours, and I don't have a good way to debug what's slow. The code can be built by checking out https://github.com/JasonGross/fiat-crypto/tree/xxx-ltac2-slow-reification-for-coq-coq-10111 and running make TIMED=1 SKIP_BEDROCK2=1 src/PushButtonSynthesis/SolinasReductionReificationCache.vo
Either the Ltac profiler should support Ltac2 (ideal case), or there should be separate profiling for Ltac2.