VanL / shedskin

Shed Skin is an experimental compiler that can translate pure, but implicitly statically typed, Python (2.4-2.6) programs into optimized C++

Allocating 'small' objects #165

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
In multiple places it's mentioned that allocating many small objects is slow 
in C++, for example when creating tuples over and over again.

However, the size of these objects is known at parse and compile time, so it 
should be possible to allocate small objects from an existing arena and avoid 
calling malloc altogether.

If it's known that a small tuple will be allocated inside a loop, we can 
create an arena of that size and use a custom allocator [0] to avoid malloc.
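A minimal sketch of what such a pre-sized arena plus placement new could look like (every name below is made up for illustration; none of this is existing shedskin output):

    #include <cstddef>
    #include <new>

    // hypothetical fixed-size arena: one block sized for the small objects a
    // loop is known to allocate, handed out bump-pointer style
    class FixedArena {
        char *buf;
        std::size_t size, used;
    public:
        FixedArena(std::size_t n) : buf(new char[n]), size(n), used(0) {}
        ~FixedArena() { delete[] buf; }
        void *allocate(std::size_t n) {
            if (used + n > size)
                return ::operator new(n); // fall back to the heap (a real version would track this)
            void *p = buf + used;
            used += n; // a real version would also round up for alignment here
            return p;
        }
        void reset() { used = 0; } // reuse the same block on the next iteration
    };

    struct pair_int { int a, b; pair_int(int a, int b) : a(a), b(b) {} };

    int demo(int iterations) {
        FixedArena arena(sizeof(pair_int));
        int total = 0;
        for (int i = 0; i < iterations; ++i) {
            arena.reset();
            // placement new: construct the temporary in the arena, no malloc per iteration
            pair_int *t = new (arena.allocate(sizeof(pair_int))) pair_int(i, i + 1);
            total += t->a + t->b;
            // pair_int is trivially destructible, so no explicit destructor call is needed
        }
        return total;
    }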

In the case of a function that allocates a small object which never escapes 
(escape analysis is necessary here), that function may still be called inside 
a loop. In this case, it might be useful to create a lazy arena that gets 
allocated the first time it's used but reuses the existing memory on 
subsequent calls, with all instantiations of that object replaced by the 
proper custom allocator.

The first case should be relatively trivial, assuming that sort of information 
is easily accessible in the compiler, while the latter case will require escape 
analysis and global, lazy, templated allocators.
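A hedged sketch of what such a global, lazy, templated allocator might look like (again, all names are illustrative and nothing like this exists in shedskin today):

    #include <cstddef>
    #include <new>

    // hypothetical lazy arena: one static slab per (function, type) pair,
    // created on the first call and reused on every call after that
    template <typename T, std::size_t N>
    class LazyArena {
        char *slab;
    public:
        LazyArena() : slab(0) {}
        T *get() {
            if (!slab) // allocated lazily, the first time through
                slab = static_cast<char *>(::operator new(N * sizeof(T)));
            return reinterpret_cast<T *>(slab);
        }
    };

    struct point { int x, y; point(int x, int y) : x(x), y(y) {} };

    int hot_function(int i) {
        // escape analysis has shown the point never leaves this function,
        // so its storage can live in a function-local static arena
        static LazyArena<point, 1> arena;
        point *p = new (arena.get()) point(i, i * 2); // placement new, no per-call malloc
        return p->x + p->y;
    }

Whether the slab should be per function, per type, or per call site is exactly the kind of detail the compiler would have to decide.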

This is just a proposal. I might find some time to look into this, but I have 
no idea how the parser/compiler works, so it will take quite a while to get up 
to speed on that. Hope it sounds good!

[0] http://en.wikipedia.org/wiki/Placement_syntax#Custom_allocators

Original issue reported on code.google.com by fahh...@gmail.com on 2 Apr 2012 at 2:23

GoogleCodeExporter commented 9 years ago
sure, arena-based allocation sounds interesting and useful! escape analysis as 
well of course.. ;-) 

for both techniques, one may wonder if there aren't already existing c++ 
libraries/code snippets/gcc options that do this for us though.. I think this 
would not only be useful to shedskin.. it would certainly help C++ to become 
more java-like - in a good way.

interestingly, if I had chosen java as a target language, we would already have 
such optimizations for free.. and less stability problems with the GC :P

one minor issue of concern with arenas is actually integration with the Boehm 
GC, but I guess if the arena itself is GC'ed, we're fine. although we have to 
be careful the GC is not scanning empty arenas.. ;-) so perhaps many small 
arenas would work better in the context of shedskin, dunno. but anyway, those 
are details.
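as a rough, untested sketch of that direction, the arena's backing store could come from the collector itself via GC_MALLOC, so pointers held by the small objects inside it stay visible to the scan (the GCArena type is made up; only the GC_MALLOC call is real Boehm API, and the header path may differ per install):

    #include <cstddef>
    #include <gc.h> // Boehm GC header; shedskin already links against the collector

    // rough sketch: an arena whose backing memory is GC-allocated, so any
    // pointers stored in the small objects inside it are still scanned
    struct GCArena {
        char *buf;
        std::size_t size, used;
        GCArena(std::size_t n)
            : buf(static_cast<char *>(GC_MALLOC(n))), size(n), used(0) {}
        void *allocate(std::size_t n) {
            if (used + n > size) return GC_MALLOC(n); // spill back to the normal GC heap
            void *p = buf + used;
            used += n;
            return p;
        }
        void reset() { used = 0; } // only safe once nothing inside is still live
    };

an empty but still-reachable arena would indeed be scanned for nothing, which is exactly the concern above, so keeping individual arenas small limits that cost.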

some random other ideas to avoid excessive memory allocations are to use string 
slice objects, to use copy-by-value for short strings and tuples (they're 
immutable, so if their contents fit in 64 bits, why not just copy around the 
contents..?), and perhaps to try to eliminate 'intermediate' objects 
throughout the code for something like this:

sum([a**2 for a in map(int,wow)])

we really don't want or need the intermediate lists that come from map and from 
the list comprehension; we just want the sum. of course escape analysis 
could help here, but even better would be to leave the intermediate lists out 
altogether, for example by using iterators when results are consumed right away 
(this could to some degree be done as a preprocessing step perhaps).
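just to illustrate the fused shape (hand-written C++, not what shedskin generates today, and assuming wow is a list of strings), the whole expression can collapse into one loop with no intermediate containers:

    #include <cstddef>
    #include <cstdlib>
    #include <string>
    #include <vector>

    // hypothetical fused translation of sum([a**2 for a in map(int, wow)]):
    // consume each element as it is produced instead of materializing the
    // list built by map() and the list built by the comprehension
    int fused_sum(const std::vector<std::string> &wow) {
        int total = 0;
        for (std::size_t i = 0; i < wow.size(); ++i) {
            int a = std::atoi(wow[i].c_str()); // the map(int, ...) step
            total += a * a;                    // a**2, summed immediately
        }
        return total;
    }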

so - ideas enough :-) but I'm afraid someone has to put in serious time for 
something to happen. I don't think I will personally look further into any of 
this in the near future, until there's at least a nice proof-of-concept 
patch or existing library/technique. if I really like the approach, I will 
probably be happy to integrate it into shedskin (especially if it can be done 
gradually). this is how extension module support was also born..

Original comment by mark.duf...@gmail.com on 3 Apr 2012 at 10:06
