Open Quuxplusone opened 10 years ago
Attached sched_ule.i
(543399 bytes, application/octet-stream): test case
GCC is directly inlining cpu_search into cpu_search_highest; there is no cloning happening here.
This seems to be a limitation of our inliner as it directly aborts when it encounters recursion. Not sure how GCC determines that inlining is safe when recursion is present.
Marking cpu_search_{lowest,highest} as noinline makes LLVM inline cpu_search just fine. Can LLVM be taught this trick?
(In reply to comment #2)
> Marking cpu_search_{lowest,highest} as noinline makes llvm inline the
> cpu_search just fine. Can llvm be taught this trick?
That doesn't reproduce for me. I'm relatively sure that we never inline
recursive calls. I'm still not entirely sure what magic GCC is applying here,
however one neat solution would be to clone and constant propagate the function
3x. Each clone would shrink significantly because most branches go away.
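To illustrate why a constant-propagated clone would shrink, here is a minimal sketch. It is not the code from sched_ule.i: the names scan, scan_low, MATCH_LOW, MATCH_HIGH and struct result are hypothetical stand-ins. The generic function tests the match flags on every iteration; the hand-written clone shows roughly what is left once match is known to be MATCH_LOW.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flags, standing in for the real search modes. */
#define MATCH_LOW  1
#define MATCH_HIGH 2

struct result { int low; int high; };

/* Generic version: both flag tests are evaluated on every iteration. */
static void scan(const int *loads, size_t n, int match, struct result *r)
{
    assert(n > 0);
    r->low = r->high = loads[0];
    for (size_t i = 1; i < n; i++) {
        if ((match & MATCH_LOW) && loads[i] < r->low)
            r->low = loads[i];
        if ((match & MATCH_HIGH) && loads[i] > r->high)
            r->high = loads[i];
    }
}

/*
 * Hand-written equivalent of a clone specialised for match == MATCH_LOW:
 * the MATCH_HIGH branch is dead and the MATCH_LOW test folds to true.
 */
static int scan_low(const int *loads, size_t n)
{
    assert(n > 0);
    int low = loads[0];
    for (size_t i = 1; i < n; i++) {
        if (loads[i] < low)
            low = loads[i];
    }
    return low;
}
```

Calling scan(loads, n, MATCH_LOW, &r) and scan_low(loads, n) on the same array should give the same minimum; the clone just gets there without the flag tests.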
Fun fact: We are inlining the always_inline function at -O0, but not at -O2.
Got it now.
We inline cpu_search_{highest, lowest, both} into cpu_search, making it directly recursive and blocking any further inlining.
GCC inlines cpu_search into cpu_search_{highest, lowest, both}, getting nice constant propagation.
Probably no way to fix this because of the different strategies in inlining (top-down vs. bottom-up). So we're left with cloning, which isn't feasible without profiling info, which isn't available without pass manager work.
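The call structure described in this comment can be sketched roughly as follows. This is a simplified, hypothetical reduction, not the attached test case: struct group, search, search_lowest and search_highest are made-up names, and the "both" variant is left out. The generic worker is always_inline and takes a constant match argument; the wrappers pass the constant, and the worker recurses back through the wrappers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical search modes and tree node; not the real scheduler types. */
enum { SEARCH_LOWEST = 1, SEARCH_HIGHEST = 2 };

struct group {
    int load;                      /* used only when nchildren == 0 */
    int nchildren;
    const struct group *children;
};

static int search_lowest(const struct group *g);
static int search_highest(const struct group *g);

/* Generic worker: `match` is a constant at every call site below. */
static inline __attribute__((always_inline)) int
search(const struct group *g, const int match)
{
    assert(g != NULL);
    if (g->nchildren == 0)
        return g->load;

    int best = 0;
    for (int i = 0; i < g->nchildren; i++) {
        int v;
        /* The recursion goes back through the wrappers, not through search. */
        if (match == SEARCH_LOWEST)
            v = search_lowest(&g->children[i]);
        else
            v = search_highest(&g->children[i]);
        if (i == 0 || (match == SEARCH_LOWEST ? v < best : v > best))
            best = v;
    }
    return best;
}

/* Wrappers: inlining search here lets the `match` tests fold away. */
static int search_lowest(const struct group *g)
{
    return search(g, SEARCH_LOWEST);
}

static int search_highest(const struct group *g)
{
    return search(g, SEARCH_HIGHEST);
}
```

Inlining search into each wrapper leaves two small self-recursive functions with match folded to a constant. Inlining the wrappers into search first instead makes search call itself with match still a runtime value, which is the situation described above.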