Open Quuxplusone opened 12 years ago
Attached test_eric2.cpp
(1532 bytes, text/x-c++src): Compile with clang -std=c++11
It looks like we're not applying the recursive template instantiation depth limit to this case, and crashing due to stack overflow. g++ rejects this eventually (for >900 arguments), because its depth limit is reached.
Our performance on this case seems to be at least quadratic in the number of arguments, but I think that's unavoidable (since the total size of the instantiated templates grows quadratically in the number of arguments).
Attached test_roland2.cpp
(1468 bytes, text/x-c++src): This works fine though
Comment on attachment 8823
This works fine though
Sorry, uploaded version with less arguments, just add one or more lines of
arguments.
It works with almost twice as many arguments until before uttering
test_roland2.cpp:23:37: note: while substituting deduced template arguments
into function template 'back' [with T = int, U = <no value>]
typename detail::back_t<U...>::type back(T && t, U &&... u)
^
test_roland2.cpp:6:20: note: use -ftemplate-depth=N to increase recursive
template instantiation depth
typedef typename back_t<U...>::type type;
(In reply to comment #1)
> It looks like we're not applying the recursive template instantiation depth
> limit to this case, and crashing due to stack overflow. g++ rejects this
> eventually (for >900 arguments), because its depth limit is reached.
>
> Our performance on this case seems to be at least quadratic in the number of
> arguments, but I think that's unavoidable (since the total size of the
> instantiated templates grows quadratically in the number of arguments).
Shouldn't the second attachment be even worse then? It compiles fine even with
11 rows of arguments...
To be clear, we have three problems on the original code:
1) We don't appear to apply the instantiation depth limit in this case (we
still hit a stack overflow with -ftemplate-depth=1).
2) The stack usage per instantiation is sometimes high enough to blow out the
stack before we would have hit the limit anyway (depending on ulimits).
3) The compile-time (and, potentially, runtime) performance is quadratic in the
number of arguments (but this is fundamental to the way the code has been
written).
Your second attachment avoids the first problem by using class templates (for
which the limit is enforced), and mitigates the second problem (because the
stack usage per instantiation is lower for that case).
> 3) The compile-time (and, potentially, runtime)
> performance is quadratic in the number of arguments
> (but this is fundamental to the way the code has
> been written).
Fundamental to the way the code in clang is written, or to the way the code in
the repro case is written? And why quadratic? In may naive view, exactly N
instantiations of the back function should be created. Finding the return type
of each should be O(1), so it seems the overall compile-time complexity should
be linear. And why would any of this have anything to do with the runtime
performance of the generated code?
Confused and intrigued and dismayed.
> Fundamental to the way the code in clang is written, or to the way the
> code in the repro case is written? And why quadratic? In may naive view,
> exactly N instantiations of the back function should be created.
Right, but instantiating 'back' isn't constant time. The size of each
instantiation is linear in the number of parameters which the corresponding
'back' specialization takes. Hence the total size of the resulting
instantiations is quadratic.
> And why would any of this have anything to do with the runtime
> performance of the generated code?
If you build with -O0, the resulting code will have quadratic running time and
stack usage due to O(n^2) copies of your n arguments being made as we recurse
down the 'back' instantiations.
Attached test_richard2.cpp
(11256 bytes, text/x-c++src): Efficient approach
Attached test_richard3.cpp
(9878 bytes, text/x-c++src): Linear-time approach
No, this was just reduced from a more general solution. We've drifted off-topic for bugzilla (though feel free to contact me directly).
The issue with the depth limit not being applied is fixed in r159907. However, on my system (with an 8MiB stack limit), with a Debug build, I can only get up to a depth of 428 before I hit a stack overflow, so we're a little shy of our depth-512 default, and a lot shy of the standard's recommended minimum of 1024.
I'm guessing the "more general solution" you refer to above is a way to get the nth element from a parameter pack, which I have now implemented based on your ideas. This bug report has been very educational. Thanks, and thanks for your work on this bug. Keep it comin', Richard! :-)
_Bug 15551 has been marked as a duplicate of this bug._
test_eric2.cpp
(1532 bytes, text/x-c++src)test_roland2.cpp
(1468 bytes, text/x-c++src)test_richard2.cpp
(11256 bytes, text/x-c++src)test_richard3.cpp
(9878 bytes, text/x-c++src)