Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

LLVM fails to inline/clone recursive function, leading to 3x slowdown compared to gcc #20153

Open Quuxplusone opened 10 years ago

Quuxplusone commented 10 years ago
Bugzilla Link PR20154
Status NEW
Importance P normal
Reported by Roman Divacky (rdivacky@freebsd.org)
Reported on 2014-06-28 05:53:35 -0700
Last modified on 2014-07-04 09:50:33 -0700
Version trunk
Hardware PC Linux
CC benny.kra@gmail.com, chandlerc@gmail.com, dimitry@andric.com, emaste@freebsd.org, hfinkel@anl.gov, llvm-bugs@lists.llvm.org, rafael@espindo.la
Fixed by commit(s)
Attachments sched_ule.i (543399 bytes, application/octet-stream)
Blocks
Blocked by
See also
Created attachment 12716
test case

The attached (big test case) when compiled with trunk clang doesnt have
cpu_search() function inlined (or cloned?) into its callers even when marked
always_inline.

This is because the function is (indirectly) recursive. When compiling the same
file with gcc49, the cpu_search function is inlined.

This file is implementation of scheduler from FreeBSD and as reported
(https://kib.kiev.ua/kib/pgsql_perf.pdf) inlining it results in massive
improvement in performance (25% -> 8%).
Quuxplusone commented 10 years ago

Attached sched_ule.i (543399 bytes, application/octet-stream): test case

Quuxplusone commented 10 years ago

GCC is directly inlining cpu_search into cpu_search_highest, there is no cloning happening here.

This seems to be a limitation of our inliner as it directly aborts when it encounters recursion. Not sure how GCC determines that inlining is safe when recursion is present.

Quuxplusone commented 10 years ago

Marking cpusearch{lowest,highest} as noinline makes llvm inline the cpu_search just fine. Can llvm be taught this trick?

Quuxplusone commented 10 years ago
(In reply to comment #2)
> Marking cpu_search_{lowest,highest} as noinline makes llvm inline the
> cpu_search just fine. Can llvm be taught this trick?

That doesn't reproduce for me. I'm relatively sure that we never inline
recursive calls. I'm still not entirely sure what magic GCC is applying here,
however one neat solution would be to clone and constant propagate the function
3x. Each clone would shrink significantly because most branches go away.
Quuxplusone commented 10 years ago

Fun fact: We are inlining the always_inline function at -O0, but not at -O2.

Quuxplusone commented 10 years ago

Got it now.

We inline cpusearch{highest, lowest, both} into cpu_search, making it directly recursive and blocking any further inlining.

GCC inlines cpu_search into cpusearch{highest, lowest, both}, getting nice constant propagation.

Probably no way to fix this because of the different strategies in inlining (top-down vs. bottom-up). So we're left with cloning, which isn't feasible without profiling info, which isn't available without pass manager work.