Update -> some promising results! @aufdenkampe @imscw95 @kewalak
I took the time to do some experimentation on the core "computation engine"; this work currently lives in a notebook on a separate branch but will be merged in. By "computation engine" I am referring to the part of the code that applies all calculations, regardless of which module is being used.
So far I tested three variants (see the sketch after this list):

- **V1 (existing):** `xr.apply_ufunc` applied directly.
- **V2:** `xr.apply_ufunc` inside a function that is JIT compiled with `@numba.jit(forceobj=True)`. `forceobj=True` dramatically reduces the efficacy of JIT.
- **V3:** `map()` to create an iterable that contains everything necessary to update a timestep, but as numpy arrays, NOT `xr.DataArray`s. Note that I protect against excess memory usage by instantiating a lazy `map`-type iterable, therefore all the input arrays are not pre-calculated. `nopython=True` mode can't handle "heterogeneous list" inputs.

**Results:**
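For concreteness, here is a minimal sketch of the three variants. The kernel `compute_cell`, the variable names `"a"`/`"b"`, and the function names are all hypothetical stand-ins, not the actual module code:

```python
import numba
import numpy as np
import xarray as xr


# Hypothetical per-cell kernel standing in for a module's update equations.
def compute_cell(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    return a * 0.5 + b


# V1 (existing): apply the kernel through xarray directly.
def v1_apply_ufunc(ds: xr.Dataset) -> xr.DataArray:
    return xr.apply_ufunc(compute_cell, ds["a"], ds["b"])


# V2: the same call wrapped in a JIT-compiled function. xarray objects are
# plain Python objects, so numba must run in object mode (forceobj=True),
# which largely defeats the purpose of JIT compilation.
@numba.jit(forceobj=True)
def v2_jit_apply_ufunc(ds):
    return xr.apply_ufunc(compute_cell, ds["a"], ds["b"])


# V3: build a lazy map() iterable holding everything needed for one
# timestep as plain numpy arrays. Because map() evaluates lazily, the
# input arrays are not all pre-calculated/materialized at once.
def v3_map_numpy(ds: xr.Dataset) -> np.ndarray:
    arrays = map(lambda name: ds[name].to_numpy(), ["a", "b"])
    return compute_cell(*arrays)
```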
As shown below, the new V3 approach is nearly 50% faster than the existing method (17 ms shaved off), while V2 did not speed things up at all. We can also see that the `iter_computation()` call within `increment_timestep` is responsible for 76% of the run time. Therefore our ~50% reduction in `iter_computation()` should account for roughly a 36% reduction in overall timestep run time (0.76 × ~0.48 ≈ 0.36).
Next steps: I am going to implement V3 in the main modules, check that all tests pass, and if so merge into the main branch.
@xaviernogueira, thank you for doing this performance profiling and playing with computational improvements to find this valuable performance boost!
Once we get closer to having the sub-modules completed, we can focus on improving the performance of the core `base.Model` code. Since this code is inherited, any benefits are passed down to all sub-modules. Some experimenting can/will be done to see if we can squeeze out `time_step` performance; model init performance is less relevant. We can use `snakeviz` with `cProfile` to assess any rate-limiting computational steps (minimal sketch below).
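As a minimal sketch of that workflow (`run_model` is a hypothetical placeholder for the actual model entry point):

```python
import cProfile
import pstats

from my_model import run_model  # hypothetical entry point

# Profile a full model run and dump the stats to disk.
with cProfile.Profile() as profiler:
    run_model()
profiler.dump_stats("timestep.prof")

# Print the ten most expensive calls by cumulative time...
pstats.Stats("timestep.prof").sort_stats("cumulative").print_stats(10)

# ...or explore the same file interactively in the browser:
#   $ pip install snakeviz
#   $ snakeviz timestep.prof
```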