Observable combinatorial slowdown in old and new unification in the presence of evars

Description of the problem

This issue is the continuation of a zulip discussion. I don't know how to describe this problem in general so here's a minimal example that reproduces the slowdown I am seeing in real code.

Structure Op := { op_type : Type; op_op : op_type -> op_type -> op_type }.
Arguments op_op {_}.
Infix "#" := op_op (right associativity, at level 0).
Canonical Op_nat : Op := {|op_type := nat; op_op := Nat.add|}.
Parameter a b c d e f g h i j k : nat.

Goal True.
  eassert (forall x y,
              a # b # c # d # e # f # g # h # i # j # k # x
              =
              a # b # c # d # e # f # g # h # i # j # k # y
          ).
  intros.
  Timeout 5 apply eq_refl || idtac. (* fails quickly *)
Abort.

Goal True.
  eassert (forall x y,
              a # b # c # d # e # f # g # h # i # j # k # ?[evar] # x =
              a # b # c # d # e # f # g # h # i # j # k #  ?evar  # y
          ).
  intros.
  Timeout 5 apply eq_refl || idtac. (* times out *)

As you can see, even with two possible reduction steps (op_op Op_nat -> Nat.add -> fix add ...) unification is already unable to answer the query in reasonable time for 13 nested operations. If this number seems large, consider that the original problem arises in an iris development and dealing with embedded logic goals of 13 conjuncts feels like it should be well within the capabilities of a proof assistant. And it usually is, just not for this kind of unification query.

Extending the reduction chain makes the problem even more obvious:

Structure Op := { op_type : Type; op_op : op_type -> op_type -> op_type }.
Arguments op_op {_}.
Infix "#" := op_op (right associativity, at level 0).
Definition my_add a b := Nat.add a b.
Canonical Op_nat : Op := {|op_type := nat; op_op := my_add|}.
Parameter a b c d e f g h i j k : nat.

Goal True.
  eassert (forall x y,
              a # b # c # d # e # f # g # h # i # j # k # x
              =
              a # b # c # d # e # f # g # h # i # j # k # y
          ).
  intros.
  Timeout 5 apply eq_refl || idtac. (* fails quickly *)
Abort.

Goal True.
  eassert (forall x y,
              a # b # c # d # ?[evar] # x =
              a # b # c # d # ?evar  # y
          ).
  intros.
  Set Debug Tactic Unification.
  Timeout 5 apply eq_refl || idtac. (* times out *)

Now 6 terms are enough to hit the 5s limit.

As far as I can tell, this problem exists in both the old and the new unification engine. However, evarconv seems to be faster for this in general and needs more terms before it becomes unbearably slow. I am not very good a interpreting the output of Set Debug Unification so I can't say if there is an actual algorithmic difference somewhere. I doubt it, though, since the runtime doubles with every new operand.

I understand that unification is hard to optimize in general. However, the kind of syntax overloading using CSs or TCs is so common that it makes me wonder if there isn't a noticeable bit of performance to be gained from putting more smarts into the unification engines here. I know iris itself already optimized away a whole layer of structures to speed up unification. Maybe something can be done in Coq itself to avoid the extra cost.

Coq Version

8.13.2

coq / coq

Observable combinatorial slowdown in old and new unification in the presence of evars #14747

Description of the problem

Coq Version