cryotools / cosipy

Coupled snowpack and ice surface energy and mass balance model in Python
GNU General Public License v3.0

Computational Inefficiency - Subsurface Depth #69

Closed MarcusGastaldello closed 1 month ago

MarcusGastaldello commented 1 month ago

The code used by COSIPY to calculate the depth of subsurface layers is very inefficient, potentially having a heavy impact on model performance and runtime. I refer specifically to lines 920 - 935 of 'cosipy/cpkernel/grid.py'.

Code

The loop count grows quadratically, compounding as the user adopts an increasingly fine mesh with more layers and becoming extremely slow. For a subsurface mesh of n layers:

Loop iterations = n(n + 1) / 2

Graph
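A quick way to see where that count comes from (an illustrative reconstruction of the looping pattern, not the exact COSIPY code):

```python
# Count the height lookups made when depth is computed by looping over every
# node and, for each, summing the heights of all overlying nodes.
# (Illustrative reconstruction of the original pattern, not COSIPY's code.)
def total_height_lookups(n: int) -> int:
    calls = 0
    for idx in range(n):   # one depth calculation per node
        calls += 1         # half-height of the node itself
        calls += idx       # one lookup per overlying node
    return calls

print(total_height_lookups(300))  # 45150 == 300 * 301 // 2
```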

Ultimately, the single action of calling 'get_depth' in the 'cosipy/modules/penetratingRadiation.py' or 'cosipy/cpkernel/cosipy_core.py' modules can take more time than all other processes in a simulation timestep combined.

The depth array can be calculated much more efficiently using the NumPy cumsum method (see below):

Example: a subsurface grid of 300 layers, each 0.15 metres high:

Depth_Calculation

These time values may sound trivial, but they can become very significant over long, high-temporal-resolution simulations.
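As a standalone sketch of the cumulative-sum idea (the function name and plain-array interface here are illustrative, not COSIPY's actual API):

```python
import numpy as np

def layer_depths(heights: np.ndarray) -> np.ndarray:
    """Depth of each layer's midpoint below the surface, in a single O(n) pass."""
    depths = np.empty_like(heights)
    depths[0] = 0.5 * heights[0]
    # midpoint depth = total height of all overlying layers + half the layer's own height
    depths[1:] = np.cumsum(heights[:-1]) + 0.5 * heights[1:]
    return depths

# 300 layers of 0.15 m, as in the example above
h = np.full(300, 0.15)
z = layer_depths(h)
print(z[:3])  # first three midpoint depths: 0.075, 0.225, 0.375 m
```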

gampnico commented 1 month ago

You read my mind! I already have a fix in the pipeline for this issue which I haven't yet pushed:

def get_node_depth(self, idx: int):
    # Midpoint depth: half the node's own height...
    d = self.get_node_height(idx) / 2.0
    if idx != 0:
        for i in range(idx):
            # ...plus the full height of every overlying node.
            d = d + self.get_node_height(i)
    return d

This eliminates all the redundant indexing from the original code.

I'll profile your np.cumsum method as a replacement for get_depth, as your solution reduces the number of calls to get_node_height(). The get methods are njitted, and sometimes njitted pure python is faster than njitted numpy as the methods pass lists from an OrderedDict, not numpy arrays.

gampnico commented 1 month ago

I rewrote your solution in pure python and got faster times at lower layer counts when njitted, but the difference is minimal compared to the slowdown at larger layer counts. I prefer your numpy solution. Figure_1
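For reference, a single-pass pure-python version along these lines might look like this (illustrative only; the exact code that was profiled isn't shown here):

```python
def get_depth_pure(heights):
    # Single O(n) pass: keep a running total of the heights above the current
    # node; each node's midpoint depth is that total plus half its own height.
    depths = [0.0] * len(heights)
    total = 0.0
    for i, h in enumerate(heights):
        depths[i] = total + 0.5 * h
        total += h
    return depths

print(get_depth_pure([0.15, 0.15, 0.15]))
```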

I've opted for:

def get_depth(self):
    h = np.array(self.get_height())
    z = np.array(self.get_height())  # same-shape array; fully overwritten below
    z[0] = 0.5 * h[0]
    # midpoint depth = cumulative height of overlying layers + half the layer's own height
    z[1:] = np.cumsum(h[:-1]) + 0.5 * h[1:]
    return [z[idx] for idx in range(self.number_nodes)]

This also makes get_node_depth redundant. I'll push this to the release branch. Thank you!

MarcusGastaldello commented 1 month ago

Thanks - I'm not too familiar with how Numba works, but it's interesting to see that it can perform as well as, if not better than, NumPy in some cases.

I just found with my simulations (which call this function an additional 4 times for new features I added) that this change reduced the simulation time by a remarkable 80%. I am often running in excess of 300 layers, though, which may explain why I was more susceptible to this. But it would be interesting to know how much the standard COSIPY model's performance has changed with this fix.