RazarDessun / shedskin

An experimental (restricted-Python)-to-C++ compiler

Let me give the type inference system hints #94

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
This is a feature request, not a bug.

Let's say I have a class A whose constructor takes an object of class B.
B's constructor needs a C, and C's needs a D.  But A never uses the B; it just 
holds it for later.  Then when writing the __main__ function to help out the 
type inference system, I now need to import B, C, and D, build a D, then a C, 
then a B, and pass that to A's constructor.  That's a bunch of messy code and 
extra imports just so that we know what that data member in A is...and really 
A's just treating it as a void*, but presumably that causes problems elsewhere.
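
For concreteness, the workaround described above might look like the sketch below. The class bodies are hypothetical stand-ins (only the names A, B, C and D come from this report), and `build_type_model` is an illustrative helper name, not anything Shed Skin requires.

```python
# Hypothetical stand-ins for the classes described in this report.
class D(object):
    def __init__(self):
        self.value = 0


class C(object):
    def __init__(self, d):
        self.d = d


class B(object):
    def __init__(self, c):
        self.c = c


class A(object):
    def __init__(self, b):
        # A only stores b for later; it never calls anything on it.
        self.b = b


def build_type_model():
    # This chain exists only so type inference can see what flows into A:
    # a D is built to make a C, a C to make a B, and the B is handed to A.
    d = D()
    c = C(d)
    b = B(c)
    return A(b)


if __name__ == '__main__':
    a = build_type_model()
```

In a real project B, C and D would typically live in other modules, so each one also costs an import in the file that defines A.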

Picture this in A's constructor:

class A(object):
  def __init__(self, b):
    assert isinstance(b, B)
    self.b = b
    ...

I still need to import B, but it ends there.  If the type inference system will 
take my word for it, via the assert, we can save a lot of clutter.  It might 
even make shedskin run faster, or on larger files, since it can completely stop 
worrying about that section of the type graph.

Or if you don't want to overload assert, you could always allow some 
shedskin-specific annotation in a comment.

Original issue reported on code.google.com by uran...@gmail.com on 19 Aug 2010 at 5:01

GoogleCodeExporter commented 9 years ago
I'm really not a big fan of type declarations in any form (I actually started 
shedskin to avoid them.. :P), and judging from the examples/ dir, I think 
usually it's not that bad. though I'd be interested to see what you end up 
with..

for speed, I would much prefer to use a profiler or storing analysis results 
between compilation sessions to aid type inference (see the latest thread in 
the discussion group).

Original comment by mark.duf...@gmail.com on 19 Aug 2010 at 8:25

GoogleCodeExporter commented 9 years ago
I'm sure it's not bad in the examples directory, but examples are generally by 
their nature small.  This would help more when things are large.

It's not like I don't have to do extra type declarations--it's just that this 
way I could do small, targeted declarations where they'd do the most good, 
instead of constructing artificial main functions of 
type-correct-but-generally-garbage pseudo-code down at the bottom of the file.  
I construct code that, while wrong, provides enough signals that type inference 
can figure out what's what.  That's just a big, ugly, inefficient type 
declaration.

Anyway, this is just one possibility, though I'll bet it'd be much simpler to 
get in than that live type analysis you were posting about.  Granted, the live 
type analysis will probably be more fun to code.

But would live type analysis only help once a program was fully 
shedskin-compatible?  I'd like to have the type hints to help me in my porting. 
 Currently if I haven't quite covered things in my pseudo-main, sometimes it's 
hard to figure out what I've messed up.  This would make it easier to localize 
the problem case, like the optional type restrictions in Haskell.  Without 
them, you might look all day for the real problem, since the system can't tell 
where the true error is, only where it was when it hit a contradiction.

Original comment by uran...@gmail.com on 19 Aug 2010 at 8:42

GoogleCodeExporter commented 9 years ago

Original comment by mark.duf...@gmail.com on 21 Aug 2010 at 8:15

GoogleCodeExporter commented 9 years ago
well, the examples are over 10,000 lines in total (sloccount). I guess the 
problem mostly occurs when generating extension modules, where the logic of 
calling things occurs later, on the CPython side. a simple workaround could be 
to move enough logic to the extension module, for type inference not to need 
the ugly 'type models'. like loading a scene from disk and starting the actual 
raytracing.

I agree that looking at assert statements and such could avoid type models in 
some cases (though not all - what if a function also accepts a list-of-int 
argument? or an int? '1' is shorter than 'assert isinstance(arg, int)'), but 
again I'm not really interested in going down this path at this point. it just 
doesn't look like it's worth the effort and added complexity.

I'm not sure I understand your point about the live analysis, but of course a 
tool that can annotate python source code with types could be very useful also 
without shedskin ;)

Original comment by mark.duf...@gmail.com on 28 Aug 2010 at 10:37

GoogleCodeExporter commented 9 years ago
Currently, many of the programs I write and use shedskin on have several lines 
that are basically type declarations, but get compiled into C++ anyway. For 
instance, shedskin was having trouble understanding what a list comprehension 
outputs (it would put ERROR in the .ss.py), so I had to add a line before it:

positions = [(0,0)]
positions = [(0,0) for i in range(length)]

This would normally be fine with me, but it gets translated into the C++:

positions = (new list<tuple2<__ss_int, __ss_int> *>(1, (new tuple2<__ss_int, __ss_int>(2, 0, 0))));
positions = list_comp_0(length);

And that's on the inside of a for loop that could repeat millions of times, 
causing a horrendous memory leak (why doesn't it free() it?) and speed 
degradation.
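
A possible way to limit the cost, sketched below, is to hoist the seed assignment out of the hot loop so it runs once per call. This assumes the inference only needs to see one assignment of the right type somewhere in the enclosing function, which nothing in this thread confirms; `simulate`, `steps` and `total` are hypothetical names used only for illustration.

```python
def simulate(steps, length):
    # Type seed (assumption: seeing this once is enough for inference).
    # It is executed once per call instead of once per loop iteration.
    positions = [(0, 0)]
    total = 0
    for step in range(steps):
        # The real work: rebuild the list on every iteration.
        positions = [(step, 0) for i in range(length)]
        total += len(positions)
    return total


if __name__ == '__main__':
    print(simulate(3, 4))
```

The seed line is still compiled into the C++ output, but it no longer allocates a throwaway list on every pass through the loop.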

If I could somehow actually declare the type of positions I'd have a slightly 
faster output and shedskin wouldn't need any templates to figure out 
positions's type. I believe this is a sufficient example to prove type 
declaration can be useful in any code, mine in particular is only 47 lines.

Original comment by fahh...@gmail.com on 3 Oct 2010 at 6:06

GoogleCodeExporter commented 9 years ago
thanks for the feedback. you have probably run into a bug, because shedskin 
shouldn't need any such type declarations.. please consider opening an issue 
for your program(s), so I can have a look!

note that in the shedskin example programs, there are about 50 programs, at a 
total of over 10,000 lines (sloccount), that work without any form of such type 
hints, except for a few lines where it really is unavoidable for type inference 
to work (for example, when we build an extension module, we cannot do without a 
'fake main').

btw, free() is not necessary for shedskin generated code, because it uses the 
Boehm GC, which automatically frees memory when it becomes unreachable.

Original comment by mark.duf...@gmail.com on 3 Oct 2010 at 8:44

GoogleCodeExporter commented 9 years ago

Original comment by mark.duf...@gmail.com on 5 Oct 2010 at 2:06

GoogleCodeExporter commented 9 years ago
It may be that they all work, but a large problem I have is that if shedskin 
uses too much memory it starts swapping and my computer dies. I do all my Linux 
dev on a desktop with only 512MB RAM which is normally sufficient with Linux 
tools, but shedskin, unless I've done a lot of work to limit the templates 
available, easily goes up to 50% or more, at which point my computer locks up 
and only through plenty of Ctrl-C presses and a few minutes do I get out of 
Shedskin. Adding type hints would provide the ability to decrease the time 
taken inferring some things.

One simple example is this:

I have a module I've already shedskin'd with a function 
LJ_Minimum_Image_forces, and then I replaced the .py file with an 
inference-only stub that contains as few lines as I could manage:

class F_kls():
    def __init__(self, x=[0.0],y=[0.0],z=[0.0]):
        self.x=x
        self.y=y
        self.z=z

def LJ_Minimum_Image_forces(posx=[0.,0.],posy=[0.,0.],posz=[0.,0.],lcfg=2,r_lcfg=range(2),L=5.0):
    # calculate force at t
    Fx=Fy=Fz = [0. for i in r_lcfg]
    v_lattice = 0
    return (F_kls(Fx,Fy,Fz),v_lattice)

if __name__=='__main__':
    F,V = LJ_Minimum_Image_forces([0.,0.],[0.,0.],[0.,0.],2,range(2),5.0)
    Fx,Fy,Fz = F.x,F.y,F.z

Shedskin can handle this file as well as the original, but the moment I call 
this function in another file, Shedskin blows up.

If I was able to simply say LJ_Minimum_Image_forces takes 
(list(float),list(float),list(float),int,list(int),float) and returns 
(F_kls,float), I would be able to compile other modules that use this.
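
One way such a signature could be written down is with Python 3 function annotations, as in the sketch below. This is only an illustration of the notation: nothing in this thread indicates that Shed Skin reads annotations, and the `typing` names are standard-library Python 3, not a Shed Skin feature. The stub body returns `0.0` for `v_lattice` so that it matches the stated `float` return type, and it builds three separate lists rather than aliasing one.

```python
from typing import List, Tuple


class F_kls(object):
    def __init__(self, x: List[float], y: List[float], z: List[float]) -> None:
        self.x = x
        self.y = y
        self.z = z


def LJ_Minimum_Image_forces(
    posx: List[float],
    posy: List[float],
    posz: List[float],
    lcfg: int,
    r_lcfg: List[int],
    L: float,
) -> Tuple[F_kls, float]:
    # Stub body only: one zero force component per index in r_lcfg.
    Fx = [0.0 for _ in r_lcfg]
    Fy = [0.0 for _ in r_lcfg]
    Fz = [0.0 for _ in r_lcfg]
    v_lattice = 0.0
    return (F_kls(Fx, Fy, Fz), v_lattice)


if __name__ == '__main__':
    F, V = LJ_Minimum_Image_forces(
        [0.0, 0.0], [0.0, 0.0], [0.0, 0.0], 2, list(range(2)), 5.0
    )
    Fx, Fy, Fz = F.x, F.y, F.z
```

With the types stated at the definition, a caller in another module would not need to be analysed to learn what the function accepts or returns.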

In general, shedskin re-infers modules unnecessarily. If it could accept an 
annotated file (like the files it outputs), it could also save time by not 
re-processing files when shedskin'ing other files.

I have narrowed the problem down to that single call using #{ #} around my code 
but I can't tell how much it actually blows up because my hard drive lights up 
a constant red for as long as 10 minutes after the first Ctrl-C.

Original comment by fahh...@gmail.com on 16 Oct 2010 at 4:00

GoogleCodeExporter commented 9 years ago
thanks for the feedback. shedskin 0.6 should take _much_ less memory than 0.5 
in many cases.. do you still see the problem with 0.6..? if so, could you 
please send me a complete program that 0.6 has trouble with? thanks! :)

Original comment by mark.duf...@gmail.com on 18 Oct 2010 at 2:16

GoogleCodeExporter commented 9 years ago
issue 105 is a duplicate of this issue, with a suggestion to use some new 
python 3 pep dealing with type declarations.

Original comment by mark.duf...@gmail.com on 27 Oct 2010 at 1:24

GoogleCodeExporter commented 9 years ago
I think we can close this one.. we ended up compiling pylot with almost no 
type declarations (uranium), and since 0.6, shedskin takes only a few hundred 
MB of memory for even the largest examples (fahhem).. please reopen if you 
disagree. 

Original comment by mark.duf...@gmail.com on 27 Feb 2011 at 12:35