Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

stack usage per instantiation step is too high #13337

Open Quuxplusone opened 12 years ago

Quuxplusone commented 12 years ago
Bugzilla Link PR13263
Status NEW
Importance P normal
Reported by Roland (rbock@eudoxos.de)
Reported on 2012-07-03 13:11:45 -0700
Last modified on 2014-02-18 16:52:04 -0800
Version trunk
Hardware PC Linux
CC antoinep92@gmail.com, dgregor@apple.com, eniebler@boost.org, llvm-bugs@lists.llvm.org, richard-llvm@metafoo.co.uk
Fixed by commit(s)
Attachments test_eric2.cpp (1532 bytes, text/x-c++src)
test_roland2.cpp (1468 bytes, text/x-c++src)
test_richard2.cpp (11256 bytes, text/x-c++src)
test_richard3.cpp (9878 bytes, text/x-c++src)
Blocks
Blocked by
See also
Created attachment 8822
Compile with clang -std=c++11

While experimenting with some c++11 trick by Eric Niebler, I crashed clang with
the attached source code

$ clang -std=c++11 test_eric2.cpp -O3 -v
clang version 3.2 (trunk 155315)
Target: x86_64-unknown-linux-gnu
Thread model: posix
 "/usr/local/bin/clang-3.2" -cc1 -triple x86_64-unknown-linux-gnu -emit-obj -disable-free -disable-llvm-verifier -main-file-name test_eric2.cpp -mrelocation-model static -masm-verbose -mconstructor-aliases -munwind-tables -target-cpu x86-64 -momit-leaf-frame-pointer -v -resource-dir /usr/local/bin/../lib/clang/3.2 -fmodule-cache-path /var/tmp/clang-module-cache -internal-isystem /usr/lib/x86_64-linux-gnu/gcc/x86_64-linux-gnu/4.5/../../../../../include/c++/4.5 -internal-isystem /usr/lib/x86_64-linux-gnu/gcc/x86_64-linux-gnu/4.5/../../../../../include/c++/4.5/x86_64-linux-gnu -internal-isystem /usr/lib/x86_64-linux-gnu/gcc/x86_64-linux-gnu/4.5/../../../../../include/c++/4.5/backward -internal-isystem /usr/local/include -internal-isystem /usr/local/bin/../lib/clang/3.2/include -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -O3 -std=c++11 -fdeprecated-macro -fdebug-compilation-dir /home/rbock/temp/test -ferror-limit 19 -fmessage-length 159 -mstackrealign -fgnu-runtime -fobjc-runtime-has-arc -fobjc-runtime-has-weak -fobjc-fragile-abi -fcxx-exceptions -fexceptions -fdiagnostics-show-option -fcolor-diagnostics -o /tmp/test_eric2-ainULp.o -x c++ test_eric2.cpp
clang -cc1 version 3.2 based upon LLVM 3.2svn default target x86_64-unknown-
linux-gnu
ignoring nonexistent directory "/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/lib/x86_64-linux-gnu/gcc/x86_64-linux-gnu/4.5/../../../../../include/c++/4.5
 /usr/lib/x86_64-linux-gnu/gcc/x86_64-linux-gnu/4.5/../../../../../include/c++/4.5/x86_64-linux-gnu
 /usr/lib/x86_64-linux-gnu/gcc/x86_64-linux-gnu/4.5/../../../../../include/c++/4.5/backward
 /usr/local/include
 /usr/local/bin/../lib/clang/3.2/include
 /usr/include/x86_64-linux-gnu
 /usr/include
End of search list.
clang-3: error: unable to execute command: Segmentation fault
clang-3: error: clang frontend command failed due to signal (use -v to see
invocation)
clang-3: note: diagnostic msg: Please submit a bug report to  and include
command line arguments and all diagnostic information.
clang-3: note: diagnostic msg: Preprocessed source(s) and associated run
script(s) are located at:
clang-3: note: diagnostic msg: /tmp/test_eric2-2yLiNi.ii
clang-3: note: diagnostic msg: /tmp/test_eric2-2yLiNi.sh

The code compiles fine with a few parameters less (one row less, for instance).

The number of parameters does not seem to be the issue. I tried similar code
with many more parameters and experienced no crash. See also
http://boost.2283326.n4.nabble.com/offtopic-C-11-useful-trick-td4632327.html#a4632453
Quuxplusone commented 12 years ago

Attached test_eric2.cpp (1532 bytes, text/x-c++src): Compile with clang -std=c++11

Quuxplusone commented 12 years ago

It looks like we're not applying the recursive template instantiation depth limit to this case, and crashing due to stack overflow. g++ rejects this eventually (for >900 arguments), because its depth limit is reached.

Our performance on this case seems to be at least quadratic in the number of arguments, but I think that's unavoidable (since the total size of the instantiated templates grows quadratically in the number of arguments).

Quuxplusone commented 12 years ago

Attached test_roland2.cpp (1468 bytes, text/x-c++src): This works fine though

Quuxplusone commented 12 years ago
Comment on attachment 8823
This works fine though

Sorry, uploaded version with less arguments, just add one or more lines of
arguments.

It works with almost twice as many arguments until before uttering

test_roland2.cpp:23:37: note: while substituting deduced template arguments
into function template 'back' [with T = int, U = <no value>]
typename detail::back_t<U...>::type back(T && t, U &&... u)
                                    ^
test_roland2.cpp:6:20: note: use -ftemplate-depth=N to increase recursive
template instantiation depth
                typedef typename back_t<U...>::type type;
Quuxplusone commented 12 years ago
(In reply to comment #1)
> It looks like we're not applying the recursive template instantiation depth
> limit to this case, and crashing due to stack overflow. g++ rejects this
> eventually (for >900 arguments), because its depth limit is reached.
>
> Our performance on this case seems to be at least quadratic in the number of
> arguments, but I think that's unavoidable (since the total size of the
> instantiated templates grows quadratically in the number of arguments).

Shouldn't the second attachment be even worse then? It compiles fine even with
11 rows of arguments...
Quuxplusone commented 12 years ago
To be clear, we have three problems on the original code:

1) We don't appear to apply the instantiation depth limit in this case (we
still hit a stack overflow with -ftemplate-depth=1).
2) The stack usage per instantiation is sometimes high enough to blow out the
stack before we would have hit the limit anyway (depending on ulimits).
3) The compile-time (and, potentially, runtime) performance is quadratic in the
number of arguments (but this is fundamental to the way the code has been
written).

Your second attachment avoids the first problem by using class templates (for
which the limit is enforced), and mitigates the second problem (because the
stack usage per instantiation is lower for that case).
Quuxplusone commented 12 years ago
> 3) The compile-time (and, potentially, runtime)
> performance is quadratic in the number of arguments
> (but this is fundamental to the way the code has
> been written).

Fundamental to the way the code in clang is written, or to the way the code in
the repro case is written? And why quadratic? In may naive view, exactly N
instantiations of the back function should be created. Finding the return type
of each should be O(1), so it seems the overall compile-time complexity should
be linear. And why would any of this have anything to do with the runtime
performance of the generated code?

Confused and intrigued and dismayed.
Quuxplusone commented 12 years ago
> Fundamental to the way the code in clang is written, or to the way the
> code in the repro case is written? And why quadratic? In may naive view,
> exactly N instantiations of the back function should be created.

Right, but instantiating 'back' isn't constant time. The size of each
instantiation is linear in the number of parameters which the corresponding
'back' specialization takes. Hence the total size of the resulting
instantiations is quadratic.

> And why would any of this have anything to do with the runtime
> performance of the generated code?

If you build with -O0, the resulting code will have quadratic running time and
stack usage due to O(n^2) copies of your n arguments being made as we recurse
down the 'back' instantiations.
Quuxplusone commented 12 years ago

Attached test_richard2.cpp (11256 bytes, text/x-c++src): Efficient approach

Quuxplusone commented 12 years ago

Attached test_richard3.cpp (9878 bytes, text/x-c++src): Linear-time approach

Quuxplusone commented 12 years ago

No, this was just reduced from a more general solution. We've drifted off-topic for bugzilla (though feel free to contact me directly).

Quuxplusone commented 12 years ago

The issue with the depth limit not being applied is fixed in r159907. However, on my system (with an 8MiB stack limit), with a Debug build, I can only get up to a depth of 428 before I hit a stack overflow, so we're a little shy of our depth-512 default, and a lot shy of the standard's recommended minimum of 1024.

Quuxplusone commented 12 years ago

I'm guessing the "more general solution" you refer to above is a way to get the nth element from a parameter pack, which I have now implemented based on your ideas. This bug report has been very educational. Thanks, and thanks for your work on this bug. Keep it comin', Richard! :-)

Quuxplusone commented 10 years ago

_Bug 15551 has been marked as a duplicate of this bug._