ImperialCollegeLondon / virtual_ecosystem

This repository is the home for the codebase for the Virtual Ecosystem project.
https://virtual-ecosystem.readthedocs.io
BSD 3-Clause "New" or "Revised" License
9 stars 1 forks source link

Replace daily lists in HydrologyModel with array implementation #517

Open davidorme opened 3 months ago

davidorme commented 3 months ago

Is your feature request related to a problem? Please describe.

At the moment, the hydrology model tracks daily values for variables within a time step by appending daily arrays to a list and then using np.stack to combine them at the end to get statistics across the time step.

https://github.com/ImperialCollegeLondon/virtual_ecosystem/blob/68129075e9bd03f77d612eb5b6b2447d49a59aef/virtual_ecosystem/models/hydrology/hydrology_model.py#L390-L408

Since the number of days is known (and will always be known?), this can be done by declaring a days x grid cell numpy array and inserting daily rows. If the code below is a fair comparison of the two approaches then this is faster and cleaner. I have to admit I was expecting it to be more faster, but it is faster and the advantage greater as the number of cells increases.

In [1]: import timeit
   ...: import numpy as np
   ...: 
   ...: 
   ...: def stacking_arrays(n_cells=10, n_days=30):
   ...: 
   ...:     val = []
   ...:     for day in np.arange(n_days):
   ...:         val.append(np.arange(n_cells))
   ...: 
   ...:     return np.sum(np.stack(val, axis=1), axis=1)
   ...: 

In [2]: def matrix_insertion(n_cells=10, n_days=30):
   ...: 
   ...:     val = np.empty((n_days, n_cells))
   ...:     for day in np.arange(n_days):
   ...:         val[day, :] = np.arange(n_cells)
   ...: 
   ...:     return np.sum(val, axis=0)
   ...: 

In [3]: 
   ...: assert np.allclose(stacking_arrays(),  matrix_insertion())
   ...: 
   ...: number = 1000

In [4]: np.min(timeit.repeat("stacking_arrays()", globals=globals(), number=number))
Out[4]: np.float64(0.02308262512087822)

In [5]: np.min(timeit.repeat("matrix_insertion()", globals=globals(), number=number))
Out[5]: np.float64(0.0157989589497447)

In [6]: np.min(timeit.repeat("stacking_arrays(n_cells=1000)", globals=globals(), number=number))
Out[6]: np.float64(0.05126216681674123)

In [7]: np.min(timeit.repeat("matrix_insertion(n_cells=1000)", globals=globals(), number=number))
Out[7]: np.float64(0.03652237495407462)

In [8]: np.min(timeit.repeat("stacking_arrays(n_cells=5000)", globals=globals(), number=number))
Out[8]: np.float64(0.2828623750247061)

In [9]: np.min(timeit.repeat("matrix_insertion(n_cells=5000)", globals=globals(), number=number))
Out[9]: np.float64(0.13297262508422136)

Describe the solution you'd like

Switch to matrix insertion, I think.