Is your feature request related to a problem? Please describe.
At the moment, the hydrology model tracks daily values for variables within a time step by appending daily arrays to a list and then using np.stack to combine them at the end to get statistics across the time step.
Since the number of days is known (and will always be known?), this can be done by declaring a days x grid cell numpy array and inserting daily rows. If the code below is a fair comparison of the two approaches then this is faster and cleaner. I have to admit I was expecting it to be more faster, but it is faster and the advantage greater as the number of cells increases.
In [1]: import timeit
...: import numpy as np
...:
...:
...: def stacking_arrays(n_cells=10, n_days=30):
...:
...: val = []
...: for day in np.arange(n_days):
...: val.append(np.arange(n_cells))
...:
...: return np.sum(np.stack(val, axis=1), axis=1)
...:
In [2]: def matrix_insertion(n_cells=10, n_days=30):
...:
...: val = np.empty((n_days, n_cells))
...: for day in np.arange(n_days):
...: val[day, :] = np.arange(n_cells)
...:
...: return np.sum(val, axis=0)
...:
In [3]:
...: assert np.allclose(stacking_arrays(), matrix_insertion())
...:
...: number = 1000
In [4]: np.min(timeit.repeat("stacking_arrays()", globals=globals(), number=number))
Out[4]: np.float64(0.02308262512087822)
In [5]: np.min(timeit.repeat("matrix_insertion()", globals=globals(), number=number))
Out[5]: np.float64(0.0157989589497447)
In [6]: np.min(timeit.repeat("stacking_arrays(n_cells=1000)", globals=globals(), number=number))
Out[6]: np.float64(0.05126216681674123)
In [7]: np.min(timeit.repeat("matrix_insertion(n_cells=1000)", globals=globals(), number=number))
Out[7]: np.float64(0.03652237495407462)
In [8]: np.min(timeit.repeat("stacking_arrays(n_cells=5000)", globals=globals(), number=number))
Out[8]: np.float64(0.2828623750247061)
In [9]: np.min(timeit.repeat("matrix_insertion(n_cells=5000)", globals=globals(), number=number))
Out[9]: np.float64(0.13297262508422136)
Is your feature request related to a problem? Please describe.
At the moment, the hydrology model tracks daily values for variables within a time step by appending daily arrays to a list and then using
np.stack
to combine them at the end to get statistics across the time step.https://github.com/ImperialCollegeLondon/virtual_ecosystem/blob/68129075e9bd03f77d612eb5b6b2447d49a59aef/virtual_ecosystem/models/hydrology/hydrology_model.py#L390-L408
Since the number of days is known (and will always be known?), this can be done by declaring a days x grid cell numpy array and inserting daily rows. If the code below is a fair comparison of the two approaches then this is faster and cleaner. I have to admit I was expecting it to be more faster, but it is faster and the advantage greater as the number of cells increases.
Describe the solution you'd like
Switch to matrix insertion, I think.