**Closed** · sjordan29 closed this 6 months ago
So far I have taken the following steps:

- Updated `_calc_sum_coeff_to_diffusion_term()` to use `np.add.at`, a ufunc method, rather than looping over time. This sped up the entire process of model instantiation 3x for the 3-day simulation (over 3 minutes to ~1 minute).
- Tried `np.vectorize` to achieve the same result, but found that leveraging pandas actually worked best for this use case. This sped up the `read_ras()` part of the code by 4x (~24 seconds to ~6 seconds).

Proposed next step:
We could potentially move the sum-of-coefficients-to-diffusion calculation to the `LHS` calculation in the `linalg` module so that we are not looping through time twice.
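The `np.add.at` change described in the first step can be sketched as follows. This is a minimal illustration, not the project's actual code: the array names, shapes, and the edge-to-cell index mapping are assumptions made for the example.

```python
import numpy as np

# Hypothetical shapes: per-(timestep, edge) coefficients scattered into
# per-cell sums. Names are illustrative, not the project's actual API.
n_times, n_edges, n_cells = 4, 6, 3
rng = np.random.default_rng(0)
coeffs = rng.random((n_times, n_edges))
cell_index = rng.integers(0, n_cells, size=n_edges)  # edge -> cell mapping

# Original pattern: a Python-level loop over time
looped = np.zeros((n_times, n_cells))
for t in range(n_times):
    np.add.at(looped[t], cell_index, coeffs[t])

# Vectorized: index only the trailing axis so a single np.add.at call
# handles all timesteps at once (duplicate indices are accumulated)
vectorized = np.zeros((n_times, n_cells))
np.add.at(vectorized, (slice(None), cell_index), coeffs)

assert np.allclose(looped, vectorized)
```

Because `np.add.at` performs unbuffered, in-place accumulation, repeated indices in `cell_index` are summed correctly, which a plain `vectorized[:, cell_index] += coeffs` would not do.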
Found that integrating the `sum_of_coefficients_to_diffusion_term` function into the `LHS` matrix calculation was slower overall. It sped up model initialization considerably, but that time was more than added back over the course of the model simulation.
Key findings:

- `np.bincount()` was faster for this operation than `np.add.at()`.
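The `np.bincount()` alternative can be sketched like this; both calls compute the same grouped sum, but `np.bincount` is typically faster for integer-indexed accumulation. Array names and sizes are illustrative.

```python
import numpy as np

# Grouped sum of edge coefficients into cells; shapes are illustrative.
rng = np.random.default_rng(1)
n_edges, n_cells = 1000, 50
vals = rng.random(n_edges)
idx = rng.integers(0, n_cells, size=n_edges)

# Scatter-add with np.add.at
out_addat = np.zeros(n_cells)
np.add.at(out_addat, idx, vals)

# Same result via np.bincount; minlength guarantees the output covers
# every cell even if the highest indices receive no values
out_bincount = np.bincount(idx, weights=vals, minlength=n_cells)

assert np.allclose(out_addat, out_bincount)
```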
Next steps:

- Look into the `csr_matrix` function used to solve the matrix.

Calculating the sum of the coefficients to the diffusion term within the `LHS` class's `update_values` method turned out to be the best solution. The `LHS` solver tracks the row and column indices where values should be placed in the matrix, along with the value for each location. When we leverage the `csr_matrix`, it does the work of combining values that fall within the same matrix cell for us. This eliminates the need for the `sum_of_coefficients_to_diffusion_term` method entirely.
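The duplicate-summing behavior of `csr_matrix` described above can be demonstrated with a small sketch. SciPy documents that duplicate `(row, col)` entries are summed when a sparse matrix is built from COO-style triplets; the triplet values below are made up for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

# COO-style (row, col, data) triplets with repeated coordinates,
# mimicking multiple coefficient contributions to the same matrix cell.
rows = np.array([0, 0, 1, 2, 0])
cols = np.array([0, 0, 1, 2, 0])  # (0, 0) appears three times
data = np.array([1.0, 2.0, 5.0, 7.0, 3.0])

# csr_matrix sums duplicate entries on construction, so no explicit
# sum-of-coefficients pass is needed beforehand
lhs = csr_matrix((data, (rows, cols)), shape=(3, 3))
print(lhs[0, 0])  # duplicates at (0, 0) summed: 1.0 + 2.0 + 3.0 = 6.0
```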
Results:
This is calculated using the `_calc_sum_coeff_to_diffusion_term` function and `_sum_vals`, which currently loop over time. As the number of timesteps increases, this becomes extremely slow (>3 min to instantiate a model). We should investigate how to vectorize this calculation over the time dimension.
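One way to vectorize the grouped sum over the time dimension is a single `np.bincount` call with offset indices, so each timestep maps into its own block of the output. This is a sketch under assumed names and shapes, not the project's actual implementation.

```python
import numpy as np

# Illustrative shapes: per-(timestep, edge) coefficients summed into
# per-(timestep, cell) totals without a Python loop over time.
n_times, n_edges, n_cells = 4, 6, 3
rng = np.random.default_rng(2)
coeffs = rng.random((n_times, n_edges))
cell_index = rng.integers(0, n_cells, size=n_edges)

# Offset each timestep's cell indices into a distinct block, then do one
# flat bincount over all timesteps at once
flat_idx = (np.arange(n_times)[:, None] * n_cells + cell_index).ravel()
summed = np.bincount(flat_idx, weights=coeffs.ravel(),
                     minlength=n_times * n_cells).reshape(n_times, n_cells)

# Reference result: explicit loop over time
ref = np.zeros((n_times, n_cells))
for t in range(n_times):
    np.add.at(ref[t], cell_index, coeffs[t])

assert np.allclose(summed, ref)
```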