FDO: inline builtins, if profitable

GoogleCodeExporter commented 8 years ago

This probably applies to all C functions, but I have builtins on my mind:

It seems like we should be able to inline calls to len() if the callsite is 
monomorphic. Today the call goes through builtin_len(), which calls 
PyObject_Size(), which looks up o->ob_type->tp_as_sequence->sq_length 
or else calls PyMapping_Size(), which looks up 
o->ob_type->tp_as_mapping->mp_length. Instead, we should do all of that at 
compile time and either cache the function pointer in the IR or inline the 
resulting function, if profitable.

The best example of this is "len(some_dict)", which goes through 
builtin_len() -> PyObject_Size() -> PyMapping_Size(), all just to look up the 
dict's ma_used field. We should be able to resolve this to a simple lookup 
of ma_used in the IR and omit all the intermediate steps.

It should be possible to compile all these functions with Clang, but we'd 
have to get constant propagation working well so that LLVM could figure 
out that all that work resolves to ma_used. That's the more general 
solution, but a stop-gap/hack would be to special-case the builtin types.
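
As a rough illustration of the special-casing idea, here is a minimal Python-level sketch of a guarded, type-specialized len(). It only models the shape of the optimization (resolve the length slot once, guard on the observed type, fall back to the generic path); it is not the patch or the IR that Unladen Swallow would emit. The names specialize_len and specialized_len are hypothetical, and the sketch assumes a modern Python 3 interpreter.

```python
def specialize_len(observed_type):
    """Return a len() replacement specialized for one observed type (hypothetical helper)."""
    # Resolve the length slot once, up front. This stands in for caching the
    # function pointer in the IR instead of re-deriving it on every call.
    cached_length = observed_type.__len__

    def specialized_len(obj):
        # Guard: the fast path is only valid while the callsite stays monomorphic.
        if type(obj) is observed_type:
            return cached_length(obj)
        # Guard failed: fall back to the generic builtin.
        return len(obj)

    return specialized_len


dict_len = specialize_len(dict)
print(dict_len({"a": 1, "b": 2}))  # takes the guarded fast path
print(dict_len([1, 2, 3]))         # guard fails, generic len() is used
```

The exact-type check (rather than isinstance) is deliberate: a dict subclass that overrides __len__ must not take the fast path, so it falls through to the generic call.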

Original issue reported on code.google.com by collinw on 25 Jul 2009 at 9:53

GoogleCodeExporter commented 8 years ago

Original comment by collinw on 26 Sep 2009 at 12:05

Added labels: Release-2009Q4

GoogleCodeExporter commented 8 years ago

Original comment by collinw on 6 Jan 2010 at 11:43

Removed labels: Release-2009Q4

GoogleCodeExporter commented 8 years ago

Attached is a preliminary patch that implements inlining for len() on strings 
and lists. The bad thing is that, in my tests, it shows a roughly 2% slowdown, 
and I can't figure out why.

Original comment by intelliy...@gmail.com on 22 Feb 2010 at 6:34

Attachments:

inlining_len_for_string_and_list.patch

GoogleCodeExporter commented 8 years ago

FWIW, uploaded to rietveld for easier viewing: 
http://codereview.appspot.com/218053

Original comment by thomaswout@gmail.com on 22 Feb 2010 at 6:51

GoogleCodeExporter commented 8 years ago

Actual code review: http://codereview.appspot.com/218056/show

Original comment by collinwi...@google.com on 22 Feb 2010 at 8:36

GoogleCodeExporter commented 8 years ago

r1102 added a manually-specialized version of len() for certain types.

### django ###
Min: 0.815731 -> 0.803618: 1.0151x faster
Avg: 0.818294 -> 0.806813: 1.0142x faster
Significant (t=11.238954)
Stddev: 0.00715 -> 0.00729: 1.0191x larger
Timeline: http://tinyurl.com/yl4v2n3

Results from an ultra-synthetic nanobenchmark to demonstrate maximum benefit:

### len ###
Min: 0.001329 -> 0.000866: 1.5347x faster
Avg: 0.001375 -> 0.000894: 1.5391x faster
Significant (t=183.295677)
Stddev: 0.00004 -> 0.00002: 1.5365x smaller
Timeline: http://tinyurl.com/yf3yee5
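
For readers who want to reproduce a measurement of this kind, the following is a minimal sketch of a len() nanobenchmark using the standard timeit module. It is not the benchmark that produced the numbers above; the helper name time_len_calls, the call count, and the repeat count are all assumptions chosen for illustration, and it targets a modern Python 3 interpreter.

```python
import timeit


def time_len_calls(container, calls=100_000, repeats=5):
    """Return the best-of-`repeats` time in seconds for `calls` len() calls."""
    # The statement is deliberately nothing but the len() call, so the
    # measurement is dominated by call dispatch rather than useful work.
    timer = timeit.Timer("len(c)", globals={"c": container})
    return min(timer.repeat(repeat=repeats, number=calls))


best = time_len_calls({"a": 1, "b": 2})
print(f"best of 5 runs: {best:.6f}s for 100000 len() calls")
```

Taking the minimum over several repeats mirrors the "Min" row reported above, since the minimum is the figure least affected by scheduling noise.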

The more sophisticated Clang-based top-down inliner jyasskin is developing will 
need to meet or exceed this level of performance.

Original comment by collinw on 23 Feb 2010 at 8:29

Added labels: Priority-High
Removed labels: Priority-Medium
