blag / pyp

Automatically exported from code.google.com/p/pyp

Pyp one-liner very slow and uses a lot of memory #21

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

1. Create a test file with:

import random

for j in xrange(50000):
    print ",".join(str(i) for i in [random.choice(xrange(1000)) for i in xrange(8)])

2. Test with

time (cat testmedium.txt |~/.local/bin/pyp "mm | p if n==0 else (p[:-2] + 
[(int(x)%12) for x in p[-2:]]) | mm" > /dev/null)

3. See how slow it is and how much memory is being used.

What is the expected output? What do you see instead?

I expect it to take under a second and use almost no RAM. Instead it takes 
about 90 seconds on my computer and uses about 730MB of RAM.  If you change 
50000 to 500000 it takes a huge amount of time.

For comparison I tested a perl equivalent 

time (cat testmedium.txt |perl -l -a -F',' -p -e'if ($. > 1) { $F[6] %=12; 
$F[7] %= 12;$_ = join(q{,}, @F[6,7]) }' > /dev/null)

real    0m0.196s
user    0m0.192s
sys 0m0.012s

What version of the product are you using? On what operating system?

pyp 2.11 on ubuntu 12.10

Please provide any additional information below.

Original issue reported on code.google.com by adalbert...@gmail.com on 5 May 2013 at 7:02

GoogleCodeExporter commented 9 years ago
The print line in step 1 was originally missing some closing brackets. It should 
have been:

  print ",".join(str(i) for i in [random.choice(xrange(1000)) for i in xrange(8)])

In relation to speed, two things. 

First, a simple Python program with the same functionality (sketched below) also 
takes less than a second, so this is not a Perl-versus-Python issue.
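
For reference, the plain-Python version I compared against looks roughly like this 
(reconstructed from the one-liner, so treat the details as approximate):

    import sys

    # keep the first line as-is; for every other line, replace the last two
    # comma-separated fields with their values modulo 12 (same idea as the
    # pyp one-liner above)
    for n, line in enumerate(sys.stdin):
        p = line.rstrip("\n").split(",")
        if n > 0:
            p = p[:-2] + [str(int(x) % 12) for x in p[-2:]]
        sys.stdout.write(",".join(p) + "\n")

Saved as, say, equivalent.py (name is just a placeholder) and run with 
time python equivalent.py < testmedium.txt > /dev/null, it finishes well under a second.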

Second, this simple line is also very slow.

./pyp_beta "mm | mm" < testmedium.txt > /dev/null

Original comment by adalbert...@gmail.com on 5 May 2013 at 8:26

GoogleCodeExporter commented 9 years ago
Thanks for the detailed tests.  We are looking into a C++-compiled version that 
will hopefully run faster.  Most of the applications we use pyp for are 
hundreds of lines, where it operates very fast on modern computers... hopefully 
we'll be able to get something a bit faster for larger data sets such as these.

Original comment by tobyro...@gmail.com on 22 May 2013 at 4:44

GoogleCodeExporter commented 9 years ago
Also, the 5000 line test takes 10 secs on my iMac... much slower than the Perl 
and Python calls you are seeing, but significantly faster than the 90 secs you 
are seeing on Ubuntu.

Original comment by tobyro...@gmail.com on 22 May 2013 at 5:11

GoogleCodeExporter commented 9 years ago
I tested pyp 2.12 on my newly 64 bit machine and the 50000 line test

pyp "mm | mm" < testmedium.txt> /dev/null  

takes 50 seconds.

time (cat testmedium.txt |pyp "mm | p if n==0 else (p[:-2] + [(int(x)%12) for x 
in p[-2:]]) | mm" > /dev/null)

takes 75 seconds and 1.3GB of memory!

 (There was no 5000 line test so I am not sure what comparison was being made with the iMac.)

Original comment by adalbert...@gmail.com on 12 Feb 2014 at 10:33

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
The Perl code doesn't do the same thing as the pyp one-liner.  The output is quite 
different, so I suspect there might be a copy/paste error somewhere.

However, it seems that what you are really testing is one thing, the split and 
re-join on commas, and it may be that the Perl code is doing an optimization 
for this.

If you do cat testmedium.txt | pyp "mm" then that alone takes 20 seconds on my 
machine vs 37 seconds for "mm | mm", so the question becomes why the split is 
so slow.  I created a new file separated by slashes instead of commas and that 
took the same time, so it is the split process itself.
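
As a rough sanity check (timings approximate), plain str.split plus str.join over 
50,000 short lines should only be a few hundredths of a second, so the cost has to 
be in what pyp does around the split:

    import timeit

    line = "1,2,3,4,5,6,7,8"
    # 50,000 raw split + join round trips -- this should finish in a few
    # hundredths of a second
    print(timeit.timeit(lambda: ",".join(line.split(",")), number=50000))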

Then I wondered whether the issue wasn't just the whole process, so how long does

    time cat testmedium.txt | pyp "p" >> /dev/null

take?  Hmm... very close to 20 seconds, so we're down to just reading in 50,000 
lines of 8 numbers and writing them out again.  I guess I could have a look at the 
basic process and try to understand that.

Note that using PyPy approximately doubles the speed.  I used an alias, 
pl="pypy /usr/bin/pyp", and 20 seconds becomes 9.  But why should such a basic 
operation be this slow?

Original comment by mjohnly...@gmail.com on 24 May 2014 at 12:43

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
So I dropped back to Python to check how long it took to read and write the 
entire file line by line:

    of = open("outfile", 'w')
    with open("testmedium.txt","r") as f:
        for l in f:
            of.write(l)
    of.close()

which takes about 0.02 seconds ... so pyp is 1000x slower for this base 
operation.

Thinking a bit further: pyp is taking about 200 microseconds per line in 
overhead, which drops to about 100 with PyPy.

I looked at the code and it's basically that it is doing a lot of copies, and a 
recursive approach has been chosen: first over the whole array, then one element at 
a time through each line of the array.  If you make a faster conversion for p 
you can cut it down to around a quarter, but you're still left with a fair 
amount of time.

Perhaps that's OK, and if you need faster output then pure Python or Perl is a 
better solution.  Unless someone wants to rewrite it to avoid all the copying, 
or to use iteration so that it stops building huge stacks (rough sketch below) :)
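
Something like this is what I have in mind, as a rough sketch only (it ignores 
pyp's macros and history features completely and just streams one line at a time 
instead of keeping copies of everything):

    import sys

    def transform(stream):
        # process and yield one line at a time so memory use stays flat,
        # instead of holding every intermediate copy of every line
        for n, raw in enumerate(stream):
            p = raw.rstrip("\n").split(",")
            if n > 0:
                p = p[:-2] + [str(int(x) % 12) for x in p[-2:]]
            yield ",".join(p)

    for out in transform(sys.stdin):
        sys.stdout.write(out + "\n")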

Original comment by mjohnly...@gmail.com on 24 May 2014 at 9:36

GoogleCodeExporter commented 9 years ago
Hi, thanks so much for working on this.  Looks like you have found some 
bottlenecks in your research.  The fundamental problem is that pyp needs to be 
compiled... there is no real reason it should be in Python except that it is 
easy to modify that way.  If you get a chance, please check this out:

https://github.com/alexbyrnes/pyp/blob/master/Makefile

It's a partially compiled pyp that runs 4x faster.  Please email 
alexbyrnes@gmail.com about this.  This is an experimental project and may be 
rolled into pyp because of its significant speed increase.

Original comment by tobyro...@gmail.com on 24 May 2014 at 10:51

GoogleCodeExporter commented 9 years ago
That's very interesting, Alex.

You might also take a look at a newish language called Nimrod.  Its syntax is 
Python/Pascal-like with strong typing (and some inference), but it compiles 
to C.

I converted some Python code that took 800 milliseconds on the sample data to 
Nimrod.  A naive port achieved 60 ms, and with structural modifications to remove 
unneeded copies (not possible in Python) I was able to get to 8.5 ms.  Other 
conversions typically got a 30-60x improvement, doing considerably better than 
Cython and Numba.

Original comment by mjohnly...@gmail.com on 25 May 2014 at 9:44

GoogleCodeExporter commented 9 years ago
Hello,

A quick profiling run shows the following bottlenecks:
- 48% of the time is spent in split and 48% in join;
- each PypStr object creation executes self.file, self.ext and self.dir even if 
they are never used.  These could be changed to Python properties (@property) so 
they are only computed when accessed (see patch below);
- another cost is the loading of PypStrCustom each time (with a try/except).  It 
would be more efficient to have a flag indicating there is no custom class instead 
of entering a try/except block for each creation (see patch below).
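
The @property idea, as a simplified sketch (the real PypStr does more than this; 
see the actual patches at the link below):

    import os

    class PypStr(str):
        # with properties, the path pieces are only computed when they are
        # actually accessed, not on every object creation
        @property
        def dir(self):
            return os.path.split(self)[0]

        @property
        def file(self):
            return os.path.split(self)[1]

        @property
        def ext(self):
            return os.path.splitext(self)[1].lstrip(".")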

Patches are here:
https://github.com/digitalfox/pyp/commits/master

With those two patches, processing the test file takes 19.1 sec (vs 54.3 sec with 
the original pyp).  They have no effect on memory consumption.

Thanks for such a great tool.

Original comment by sebastie...@gmail.com on 1 Jun 2014 at 5:52

GoogleCodeExporter commented 9 years ago
Wow! Thank you! Am I reading this correctly? 96% of processing is spent in 
split and join?  Is this mostly the pregenerated splits?  Maybe the thing to do 
is to have a flag to turn this off for speed users.  I'll check out your patches 
for the next release... tripling the speed is fantastic.  Also, what are you 
using to profile this?  Thanks, and keep up the good work!

t

Original comment by tobyro...@gmail.com on 4 Jun 2014 at 12:12

GoogleCodeExporter commented 9 years ago
96% in split and join is for pyp "m|m".

The hard work will be to lower memory consumption. I do not see any easy wins 
there. Python generators and yield could be used, but that's not easy without 
making the code more complex.

For profiling, I use two things:
- pyinstrument: quite simple, but it shows hotspots immediately in a simple, 
easy-to-read tree view. Pyinstrument is young but very promising. It is here: 
https://github.com/joerick/pyinstrument
- cProfile (the default Python profiler) for deeper analysis (ncalls, cumulative 
time, etc.).

One just needs to start pyp with "python -m cProfile pyp" or "python -m 
pyinstrument pyp" and voilà.

Original comment by sebastie...@gmail.com on 4 Jun 2014 at 9:11

GoogleCodeExporter commented 9 years ago
You might check this out... it appears to run much faster, around 100x.

t
https://code.google.com/p/pyp/issues/detail?id=29

Original comment by tobyro...@gmail.com on 16 Sep 2014 at 9:31