EcohydrologyTeam / ClearWater-riverine

A 2D water quality transporter model to calculate conservative advection and diffusion of constituents from an unstructured grid of flows
MIT License

Memory management improvements #61

Open sjordan29 opened 9 months ago

sjordan29 commented 9 months ago

I used memory_profiler for instantiation of Clearwater-riverine and running a few timesteps. The following were interesting findings:

[Image: cw-r-10s]

Next steps:

aufdenkampe commented 9 months ago

@sjordan29, this is great stuff. Thanks for finding and figuring out how to use memory_profiler!

I created this complementary issue that documents some of the approaches we discussed this week.

sjordan29 commented 9 months ago

A few key findings from the commit above:

  1. Unsurprisingly, variables with dimensions of ('time', 'nedge') take up the most memory, since the number of edges will always be greater than the number of faces (cells).
  2. coeff_to_diffusion uses the most memory of all the variables, even though it has the same dimensions ('time', 'nedge') as the others. This is because it is a float64, whereas the others are float32 (I think because they are read directly from the HDF file, and that must be the precision of the RAS output).
  3. Leveraging dask and lazy loading appears to trade memory for time. Selecting data with sel was faster on an xarray object held in memory than on a lazy-loaded one (when loading the values).
  4. I still need to explore lazy writing. It is not currently working as I had expected.
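
To make finding 2 concrete, here is a minimal NumPy sketch of why a float64 variable takes twice the memory of a float32 variable with the same ('time', 'nedge') shape. The array sizes and the `edge_velocity` name are hypothetical stand-ins, not values from the model. Only `coeff_to_diffusion` is a real variable name, and the array here is a dummy, not the model's data.

```python
import numpy as np

# Hypothetical run and mesh sizes, chosen only for illustration.
n_time, n_edge = 100, 5_000

# Stand-in for a variable read from the HDF file at single precision.
edge_velocity = np.ones((n_time, n_edge), dtype=np.float32)

# Stand-in for a derived variable held at double precision.
coeff_to_diffusion = np.ones((n_time, n_edge), dtype=np.float64)


def size_mb(arr):
    """Return the size of the array's data buffer in megabytes (1e6 bytes)."""
    return arr.nbytes / 1e6


print(f"float32 edge variable: {size_mb(edge_velocity):.1f} MB")
print(f"float64 edge variable: {size_mb(coeff_to_diffusion):.1f} MB")

# Downcasting halves the footprint at the cost of precision.
downcast = coeff_to_diffusion.astype(np.float32)
print(f"after astype(float32): {size_mb(downcast):.1f} MB")
```

Whether single precision is acceptable for the diffusion coefficient is an open question that this sketch does not answer. It only shows the size difference that the dtype alone accounts for.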