INET-Complexity / isle


PERF: Optimize metainsuranceorg's iterate #117

Closed rht closed 5 years ago

rht commented 5 years ago

I found from profiling that more than 90% of the run time is spent in metainsuranceorg.iterate(), and 81.7% of that is spent in self.process_newrisks_insurer(). Within self.process_newrisks_insurer(), 40.4% is spent in self.balanced_portfolio and 30.9% in creating InsuranceContract objects.

In self.balanced_portfolio, 33.9% is spent in std_pre = cash_reserved_by_categ.std(), 12.5% in mean = cash_reserved_by_categ_store.mean(), and 27.8% in std_post = cash_reserved_by_categ_store.std(). This is unexpectedly slow, since the loop that builds cash_reserved_by_categ itself takes only 6% of the time.
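
The thread does not say which profiler produced these numbers. As a rough sketch of how a per-function breakdown like this can be collected, the snippet below uses the standard-library `cProfile` and `pstats` modules on a toy workload. The functions `iterate` and `slow_step` are hypothetical stand-ins, not the isle code.

```python
import cProfile
import io
import pstats


def slow_step(n):
    # Hypothetical stand-in for an expensive inner call.
    return sum(i * i for i in range(n))


def iterate():
    # Hypothetical stand-in for the method being profiled.
    return [slow_step(2000) for _ in range(200)]


profiler = cProfile.Profile()
profiler.enable()
iterate()
profiler.disable()

# Sort by cumulative time so callers that dominate the run appear first.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
report = stream.getvalue()
print(report)
```

`cProfile` reports time per function. Per-statement percentages like the ones quoted above would need a line-level profiler instead.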

rht commented 5 years ago

I tested calculating the mean with plain sum(cash_reserved_by_categ_store) / len(...). That code is 4x faster. I also found this article saying that numpy is slow for small arrays (in this case, of size 4): https://medium.com/coding-with-clarity/speeding-up-python-and-numpy-c-ing-the-way-3b9658ed78f4
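
A minimal sketch of this comparison, assuming a 4-element array as described above. The values and the helper name `vanilla_mean` are made up for illustration, and the measured speedup will vary by machine and numpy version.

```python
import timeit

import numpy as np


def vanilla_mean(values):
    # Plain-Python mean: no numpy dispatch overhead.
    return sum(values) / len(values)


# Hypothetical cash reserved per risk category (4 categories).
values = [120.0, 80.0, 95.0, 105.0]
arr = np.array(values)

# Both approaches should agree on the result.
assert abs(vanilla_mean(values) - float(arr.mean())) < 1e-9

t_numpy = timeit.timeit(arr.mean, number=20_000)
t_vanilla = timeit.timeit(lambda: vanilla_mean(values), number=20_000)
print(f"numpy: {t_numpy:.4f}s  vanilla: {t_vanilla:.4f}s")
```

For arrays this small, the fixed per-call overhead of numpy tends to dominate the actual arithmetic, which is consistent with the speedup reported here.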

rht commented 5 years ago

With vanilla mean and std, the time goes down: 33.9% -> 27.9%, 12.5% -> 6.4%, and 27.8% -> 20.6%. The total can go down further by combining the mean and std computations in the 2nd and 3rd steps, for another 6.4% reduction. In total, that is a 25.7% reduction.
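
One way to combine the mean and std computation is to compute the mean once and reuse it for the variance. This is a sketch, not the code from the pull request, and `mean_and_std` is a hypothetical helper name. It uses the population standard deviation (dividing by n), which matches the default of numpy's `.std()`.

```python
import math


def mean_and_std(values):
    """Return (mean, population std) of a small sequence of numbers."""
    n = len(values)
    mean = sum(values) / n
    # Reuse the mean instead of recomputing it inside a separate std call.
    variance = sum((x - mean) ** 2 for x in values) / n
    return mean, math.sqrt(variance)


mean, std = mean_and_std([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(mean, std)  # 5.0 2.0
```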

rht commented 5 years ago

I was able to cut the 30.9% spent creating InsuranceContract down to 0.2% by caching self.contract_runtime_dist.rvs(). The total time spent in process_newrisks_insurer drops from ~5s to ~3s.
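
The comment does not show how the caching was done. One possible approach, sketched here under that assumption, is to draw samples in batches and hand them out one at a time. `CachedSampler` and `StubDist` are hypothetical names. `StubDist` only imitates the `rvs(size=...)` call of a frozen scipy distribution so the example runs without scipy.

```python
import random


class CachedSampler:
    """Serve single draws from a buffer that is refilled in batches."""

    def __init__(self, dist, batch_size=1000):
        self._dist = dist
        self._batch_size = batch_size
        self._buffer = []

    def draw(self):
        # Refill only when empty, so rvs() is called once per batch.
        if not self._buffer:
            self._buffer = list(self._dist.rvs(size=self._batch_size))
        return self._buffer.pop()


class StubDist:
    """Hypothetical stand-in exposing a scipy-like rvs(size=...) method."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self.calls = 0

    def rvs(self, size=1):
        self.calls += 1
        return [self._rng.randint(12, 24) for _ in range(size)]


dist = StubDist()
sampler = CachedSampler(dist, batch_size=100)
draws = [sampler.draw() for _ in range(250)]
print(dist.calls)  # 3 calls to rvs() instead of 250
```

Batching changes the order in which random numbers are consumed, so results for a given seed may differ from the unbatched version.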

jsabuco commented 5 years ago

Hi Rudy,

Thank you very much for the profiling!

Yes, we were more or less aware that metainsuranceorg.iterate() was the most computationally intensive part of the code.

I think that there is a lot of room for optimization there.

I have just merged your pull request.

Thanks again!

Best,

Juan.


Juan Sabuco, PhD Institute for New Economic Thinking (INET) Mathematical Institute Oxford Martin Fellow, Oxford Martin School University of Oxford

On Sun, 9 Dec 2018 at 01:47, rht notifications@github.com wrote:

I was able to shave off the 30.9% of InsuranceContract to 0.2% by caching self.contract_runtime_dist.rvs(). The total time spent in process_newrisks_insurer drops from ~5s to ~3s.
