brainpy / BrainPy

Brain Dynamics Programming in Python
https://brainpy.readthedocs.io/

Performance difference between using "brainpy.math.for_loop" and "model.jit_step_run" #552

Closed CloudyDory closed 6 months ago

CloudyDory commented 9 months ago

In the documentation on monitoring every multiple steps, two methods are provided: one uses brainpy.math.for_loop and the other uses model.jit_step_run. I have profiled the running speed of the two given examples and found that model.jit_step_run consistently runs faster than brainpy.math.for_loop (at least on my platform, on both CPU and GPU).

I am a bit surprised by the result, since using model.jit_step_run requires writing an explicit Python for-loop, which I would expect to be slow. What might be the reason behind the performance difference?

Profile code:

import time
import numpy as np
import matplotlib.pyplot as plt

import brainpy as bp
import brainpy.math as bm

bm.set_platform('cpu')

#%%
class EINet(bp.DynSysGroup):
    def __init__(self):
        super().__init__()
        self.N = bp.dyn.LifRef(4000, V_rest=-60., V_th=-50., V_reset=-60., tau=20., tau_ref=5.,
                               V_initializer=bp.init.Normal(-55., 2.))
        self.delay = bp.VarDelay(self.N.spike, entries={'I': None})
        self.E = bp.dyn.ProjAlignPostMg1(comm=bp.dnn.EventJitFPHomoLinear(3200, 4000, prob=0.02, weight=0.6),
                                         syn=bp.dyn.Expon.desc(size=4000, tau=5.),
                                         out=bp.dyn.COBA.desc(E=0.),
                                         post=self.N)
        self.I = bp.dyn.ProjAlignPostMg1(comm=bp.dnn.EventJitFPHomoLinear(800, 4000, prob=0.02, weight=6.7),
                                         syn=bp.dyn.Expon.desc(size=4000, tau=10.),
                                         out=bp.dyn.COBA.desc(E=-80.),
                                         post=self.N)

    def update(self, input):
        spk = self.delay.at('I')
        self.E(spk[:3200])
        self.I(spk[3200:])
        self.delay(self.N(input))
        return self.N.spike.value

    def run(self, ids, inputs):  # the most important function!!!
        for i, inp in zip(ids, inputs):
            bp.share.save(i=i, t=bm.get_dt() * i)
            self.update(inp)
        return self.N.spike.value

#%% brainpy.math.for_loop
n_step_per_monitor = 10
indices1 = np.arange(10000).reshape(-1, n_step_per_monitor)
inputs1 = np.ones_like(indices1) * 20.0

model = EINet()

start_time = time.time()
spks1 = bm.for_loop(model.run, (indices1, inputs1), progress_bar=False)
end_time = time.time()
print('{:.2f} seconds'.format(end_time - start_time))

spks1 = bm.as_numpy(spks1)

plt.figure()
bp.visualize.raster_plot(indices1[:,0], spks1, show=True)

#%% model.jit_step_run
n_step_per_monitor = 10
indices2 = np.arange(10000)
inputs2 = np.ones_like(indices2) * 20.

model = EINet()

spks2 = []

start_time = time.time()
for i in indices2:
    model.jit_step_run(i, inputs2[i])

    if i % n_step_per_monitor == 0:
        spks2.append(model.N.spike.value)  # monitor spikes every n_step_per_monitor steps

end_time = time.time()
print('{:.2f} seconds'.format(end_time - start_time))

spks2 = bm.as_numpy(spks2)

plt.figure()
bp.visualize.raster_plot(indices2[::n_step_per_monitor], spks2, show=True)

Outputs:

1.96 seconds
1.01 seconds

Even if I reverse the order of the two methods, the results are almost the same, so the difference is not caused by the JIT compilation time during the first run.

chaoming0625 commented 9 months ago

I guess the difference lies in the compilation time of brainpy.math.for_loop. But I will run more experiments to see what is going on behind such a difference.
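
For reference, the two execution styles can be mimicked in plain JAX (which BrainPy builds on). The toy below is only an illustrative sketch, not BrainPy's actual internals: scan_run compiles the whole loop into one XLA program (roughly what bm.for_loop does via jax.lax.scan), while jit_step compiles a single step that a Python loop then calls repeatedly (roughly what jit_step_run does). Running each twice separates compilation cost from pure execution cost.

import time
import jax
import jax.numpy as jnp

def step(carry, _):
    # toy dynamics standing in for one network update
    return carry * 0.99 + 1.0, carry

@jax.jit
def scan_run(x):
    # whole loop compiled into a single XLA program (like bm.for_loop)
    _, ys = jax.lax.scan(step, x, xs=None, length=10000)
    return ys

# a single step compiled once, driven by a Python loop (like jit_step_run)
jit_step = jax.jit(lambda x: step(x, None)[0])

x = jnp.zeros(())
for trial in range(2):  # first trial includes compilation, second is cached
    t0 = time.time()
    scan_run(x).block_until_ready()
    t1 = time.time()
    y = x
    for _ in range(10000):
        y = jit_step(y)
    y.block_until_ready()
    t2 = time.time()
    print(f'trial {trial}: scan {t1 - t0:.3f} s, python loop {t2 - t1:.3f} s')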

CloudyDory commented 7 months ago

Actually, brainpy.math.for_loop becomes faster if we increase the number of simulation time steps from 10000 to 100000.
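
One way to check this scaling is to sweep the number of steps for both methods. The snippet below is a minimal sketch reusing the EINet class and timing pattern from the profile code above; the step counts are just examples, and per-step monitoring is omitted from the jit_step_run loop to keep the sketch short.

for n_steps in [10000, 100000]:
    # time bm.for_loop
    model = EINet()
    indices = np.arange(n_steps).reshape(-1, n_step_per_monitor)
    inputs = np.ones_like(indices) * 20.0
    t0 = time.time()
    spks = bm.for_loop(model.run, (indices, inputs), progress_bar=False)
    spks = bm.as_numpy(spks)  # force the computation to finish before the timer stops
    t1 = time.time()

    # time jit_step_run
    model = EINet()
    t2 = time.time()
    for i in range(n_steps):
        model.jit_step_run(i, 20.0)
    bm.as_numpy(model.N.spike.value)  # force the computation to finish
    t3 = time.time()

    print(f'{n_steps} steps: for_loop {t1 - t0:.2f} s, jit_step_run {t3 - t2:.2f} s')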