Performance deterioration when calculating cells in loops

fumitoh / modelx

Use Python like a spreadsheet!

https://modelx.io

GNU Lesser General Public License v3.0

Performance deterioration when calculating cells in loops #12

Closed alebaran closed 4 years ago

alebaran commented 4 years ago

I'm trying to measure the speed of modelx with 500 variables and 40 time steps (see the code below). I get a 25-second execution time, which is terribly slow compared with Excel. The time is the same when I trigger a recalculation after the cells have already been created (the second part of the speed measurement below).

from modelx import *
import time
model = new_model()
space = model.new_space()

@defcells
def a(t):
    return 1

@defcells
def PnL_b(t, var):
    return a(0)

times = range(1,41+1)
variables = range(1,500+1)
start = time.time()
for t in times:
    for var in variables:
        space.PnL_b(t, var)

time1 = time.time()
print(time1 - start)

space.a[0] = 2
for t in times:
    for var in variables:
        space.PnL_b(t, var)
print(time.time() - time1)

fumitoh commented 4 years ago

The two tests should take about the same time. modelx does not create individual cells internally; there are only "Cells" objects. In the case above, there are two "Cells" objects, space.a and space.PnL_b, and each stores its arguments and values in dicts. space.a[0] = 2 clears all the values in space.PnL_b. It's the formula execution that takes the time. I want to improve the performance of modelx eventually, but I need to get some other work done before I get to it.
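To illustrate why the second loop costs as much as the first, here is a minimal plain-Python sketch of the behaviour described above: one object per formula, results kept in a dict keyed by arguments, and an assignment upstream clearing the dependent cache. The class SketchCells, its attributes, and set_value are hypothetical names invented for this illustration; this is not modelx's actual implementation.

```python
class SketchCells:
    """Hypothetical stand-in for a "Cells" object: one formula, one dict of results."""

    def __init__(self, formula):
        self.formula = formula
        self.data = {}        # maps argument tuples to computed values
        self.dependents = []  # other SketchCells whose formulas read this one

    def __call__(self, *args):
        # Compute the value only if it is not already cached.
        if args not in self.data:
            self.data[args] = self.formula(*args)
        return self.data[args]

    def set_value(self, args, value):
        # Overwrite one value and clear everything that may depend on it.
        self.data[args] = value
        for cells in self.dependents:
            cells.data.clear()


a = SketchCells(lambda t: 1)
pnl_b = SketchCells(lambda t, var: a(0))
a.dependents.append(pnl_b)  # assumption: the dependency is registered by hand here

for t in range(1, 4):
    for var in range(1, 6):
        pnl_b(t, var)

cached_before = len(pnl_b.data)  # entries cached by the loop
a.set_value((0,), 2)             # analogous to space.a[0] = 2
cached_after = len(pnl_b.data)   # cache of the dependent is now empty
recomputed = pnl_b(1, 1)         # formula runs again and sees the new value
print(cached_before, cached_after, recomputed)
```

Because the dependent cache is emptied, every call in the second loop has to execute the formula again, which is consistent with both timings being about equal.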

alebaran commented 4 years ago

Do you think it can be materially improved (e.g. 10 times) or only marginally? I probably won't be able to use it if the time stays on the order of 10-20 seconds.

fumitoh commented 4 years ago

The code below takes less than a second, yet it populates 20,000 cells. Change 40 to 400 and it still takes 1/3 of the runtime of your original code.

import modelx as mx
import time

m, s = mx.new_model(), mx.new_space()

@mx.defcells
def const(t):
    return 1

@mx.defcells
def var(x, y):
    if x == y == 1:
        return const(0)
    elif x > 1 and y == 1:
        return var(x-1, y)
    elif x == 1 and y > 1:
        return var(x, y-1)
    else:
        return (var(x-1, y) + var(x, y-1))/2

start = time.time()
print(var(500, 40))
print(time.time() - start)
print(len(var))

The difference seems to come from your code accessing PnL_b in nested for loops 20,000 times. There must be some heavy lifting happening when entering recursive formula calculations from outside. This may be a result of the fix for issue https://github.com/fumitoh/modelx/issues/7.

I will investigate this performance deterioration further, but it can probably be improved significantly.

alebaran commented 4 years ago

This really helps. Thank you very much.

alebaran commented 4 years ago

Could you please look further into the point that "There must be some heavy lifting happening when entering into recursive formula calculations from outside"? I can design a workaround, but it limits how the model can be used. Thank you!

fumitoh commented 4 years ago

For now, you can do it like this:

from modelx import *
import time
model = new_model()
space = model.new_space()

@defcells
def a(t):
    return 1

@defcells
def PnL_b(t, var):
    return a(0)

@defcells
def run():
    times = range(1,41+1)
    variables = range(1,500+1)
    for t in times:
        for var in variables:
            PnL_b(t, var)

    return True

start = time.time()
run()
print(time.time() - start)

The heavy lifting is probably because a new thread is created every time you call PnL_b(t, var) in the nested loop. It could probably be solved by implementing an event loop within a single thread, but that would take some time.
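As a rough illustration of the suspected cost, the plain-Python sketch below compares calling a trivial function directly with starting a fresh thread for every call. It does not use modelx, and the helper names (formula, call_in_new_thread, run_direct, run_threaded, timed) are hypothetical; it only assumes, as suggested above, that each outside call pays for one thread creation.

```python
import threading
import time


def formula(t, var):
    # Trivial stand-in for a cell formula.
    return 1


def call_in_new_thread(func, *args):
    # Start a new thread for a single call and wait for its result.
    result = []
    worker = threading.Thread(target=lambda: result.append(func(*args)))
    worker.start()
    worker.join()
    return result[0]


def run_direct(n):
    return [formula(i, i) for i in range(n)]


def run_threaded(n):
    return [call_in_new_thread(formula, i, i) for i in range(n)]


def timed(func, n):
    # Return the computed values together with the elapsed wall-clock time.
    start = time.perf_counter()
    values = func(n)
    return values, time.perf_counter() - start


n_calls = 2000
direct_values, direct_seconds = timed(run_direct, n_calls)
threaded_values, threaded_seconds = timed(run_threaded, n_calls)
print(f"direct:   {direct_seconds:.4f} s")
print(f"threaded: {threaded_seconds:.4f} s")
```

Both runs return the same values; only the per-call overhead differs. Wrapping the loops in a single cells such as run() above would, under this assumption, pay the entry cost once instead of once per call.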

fumitoh commented 4 years ago

Just pushed a commit https://github.com/fumitoh/modelx/commit/ce44e797606b8a6262850315731cf4bc9d8e2266 to fix this. Your original code runs roughly 10 times faster.

alexeybaran commented 4 years ago

When do you plan to create a release containing this fix? Thank you!

fumitoh commented 4 years ago

If you're using v0.0.23 now, then all you need to do is replace modelx/core/system.py with the version from commit https://github.com/fumitoh/modelx/commit/ce44e797606b8a6262850315731cf4bc9d8e2266

fumitoh commented 4 years ago

Just released v0.0.24, which fixes this issue. See https://modelx.readthedocs.io/en/latest/releases/relnotes_v0_0_24.html

alebaran commented 4 years ago

On a related subject: is there a way to profile modelx execution? I would like to see how long it takes to evaluate the various functions.