UDST / synthpop

Synthetic populations from census data
BSD 3-Clause "New" or "Revised" License

IPU Performance Upgrades #14

Closed jiffyclub closed 10 years ago

jiffyclub commented 10 years ago

After profiling the IPU code it became clear that Pandas was slowing things down. Indexing Series and doing arithmetic on Series are both quite slow, so you apparently don't want that happening in looping code. So I changed things around so that at the beginning of household_weights it converts everything to NumPy arrays, runs the algorithm, and then passes back a Series. After all the tuning my test case went from running in 4.1 seconds to 0.15 seconds. So the lesson here is to not use Pandas in iterative numerical algorithms; instead, drop back to NumPy.
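For anyone who wants the shape of that pattern without reading the diff, here is a minimal sketch. It is not the actual household_weights code: the function name `fit_weights_sketch`, its parameters, and the simplified reweighting loop are all assumptions made for illustration. The only point is where the conversion happens: once on the way in, once on the way out.

```python
import numpy as np
import pandas as pd


def fit_weights_sketch(incidence, targets, max_iterations=100, tolerance=1e-8):
    """Hypothetical, simplified reweighting loop (not synthpop's real code).

    incidence: DataFrame, one row per household, one column per constraint.
    targets:   Series of target totals, indexed by the constraint names.
    """
    # Convert once, up front, so the loop never touches Pandas indexing.
    inc = incidence.to_numpy(dtype=float)
    tgt = targets.loc[incidence.columns].to_numpy(dtype=float)
    weights = np.ones(inc.shape[0])

    for _ in range(max_iterations):
        for j in range(inc.shape[1]):
            column = inc[:, j]
            current = column @ weights
            if current > 0:
                # Rescale only the households that contribute to constraint j.
                weights[column > 0] *= tgt[j] / current
        # Stop once every weighted column total matches its target.
        if np.allclose(inc.T @ weights, tgt, rtol=tolerance):
            break

    # Hand the result back as a Series aligned with the input's index.
    return pd.Series(weights, index=incidence.index)
```

Callers still pass in and get back Pandas objects, so nothing outside the function has to change; only the inner loop works on plain arrays.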

As an illustration, here's a profile of household_weights before I started tuning:

screen shot 2014-09-04 at 5 26 48 pm

And here's the after picture:

screen shot 2014-09-04 at 5 26 53 pm

Pretty much all the stuff that disappeared is Pandas indexing, replaced with NumPy C fastness.

coveralls commented 10 years ago

Coverage Status

Coverage increased (+0.18%) when pulling e62901b94bbd0d39ba882a0164f137458980acf2 on speedup-ipu into 3355345b71239a61eb6a96366d0a47e45d091d46 on master.

waddell commented 10 years ago

Very impressive speedup here! I suspect there are similar bottlenecks to consider in UrbanSim...

On Thu, Sep 4, 2014 at 10:34 PM, Matt Davis notifications@github.com wrote:

After profiling the IPU code it became clear that Pandas was slowing things down. Indexing Series and doing arithmetic on Series are both quite slow, so you apparently don't want that happening in looping code. So I changed things around so that at the beginning of household_weights it converts everything to NumPy arrays, runs the algorithm, and then passes back a Series. After all the tuning my test case went from running in 4.1 seconds to 0.15 seconds. So the lesson here is to not use Pandas in iterative numerical algorithms; instead, drop back to NumPy.

As an illustration, here's a profile of household_weights before I started tuning:

[image: screen shot 2014-09-04 at 5 26 48 pm] https://cloud.githubusercontent.com/assets/920492/4160550/c75a6db6-34bd-11e4-9956-0c7897968eb8.png

And here's the after picture:

[image: screen shot 2014-09-04 at 5 26 53 pm] https://cloud.githubusercontent.com/assets/920492/4160556/d8853c4c-34bd-11e4-96a1-89465300e2e9.png

Pretty much all the stuff that disappeared is Pandas indexing, replaced with NumPy C fastness.

You can merge this Pull Request by running

git pull https://github.com/synthicity/synthpop speedup-ipu

Or view, comment on, or merge it at:

https://github.com/synthicity/synthpop/pull/14

Commit Summary

  • Use NumPy arrays internally in IPU
  • Add max iterations check in IPU.
  • Test max iterations check in IPF.
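A max iterations check of the kind named in the commit summary might look roughly like the sketch below. The helper name `iterate_until_converged`, its arguments, and the choice to raise a RuntimeError are assumptions for illustration; the commits themselves define what the IPU code actually does when the cap is hit.

```python
import numpy as np


def iterate_until_converged(step, state, max_iterations=1000, tolerance=1e-6):
    """Hypothetical helper: apply `step` until the update is small or the cap is hit."""
    for iteration in range(1, max_iterations + 1):
        new_state = step(state)
        # Converged when no element moved by more than the tolerance.
        if np.max(np.abs(new_state - state)) < tolerance:
            return new_state, iteration
        state = new_state
    # Assumed behavior: fail loudly instead of looping forever.
    raise RuntimeError(
        'Maximum number of iterations reached: {}'.format(max_iterations))
```

Failing loudly means a fit that never settles shows up as an error instead of a hung process, which is also what makes the check easy to test.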
