ec-jrc / pyPoseidon

Framework for Hydrodynamic simulations
https://pyposeidon.readthedocs.io/
European Union Public License 1.2

Reduce peak memory usage for meteo #138

Closed by pmav99 1 year ago

pmav99 commented 1 year ago

When processing ECMWF High Resolution meteo data, the peak memory usage is ~20 GB. After applying this patch, peak memory usage falls to <13 GB.

diff --git a/pyposeidon/schism.py b/pyposeidon/schism.py
index 0a491ae..1a9645b 100644
--- a/pyposeidon/schism.py
+++ b/pyposeidon/schism.py
@@ -292,7 +292,9 @@ class Schism:

         xx, yy = np.meshgrid(ar.longitude.data, ar.latitude.data)

-        zero = np.zeros(ar[p].data.shape)
+        import dask.array
+        zero = dask.array.zeros(ar[p].data.shape)

         date = kwargs.get("date", ar.time[0].data)

We need to test whether the netCDF file that gets produced can still be used by SCHISM.
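To illustrate why the swap in the patch can lower the peak, here is a minimal sketch (not pyposeidon code) comparing `np.zeros` with `dask.array.zeros`. The grid shape is a made-up assumption. The dask array only describes the work as a task graph and produces its values chunk by chunk when computed, instead of holding one full buffer for the whole array.

```python
import numpy as np
import dask.array as da

# Hypothetical (time, latitude, longitude) shape; not taken from the issue.
shape = (24, 200, 400)

eager = np.zeros(shape)  # NumPy array backed by a single in-memory buffer
lazy = da.zeros(shape)   # dask array: a task graph, evaluated on demand

# Both describe the same array: same shape, dtype and nominal size.
print(type(lazy).__name__, lazy.shape, lazy.dtype, lazy.nbytes == eager.nbytes)

# Values only materialise, chunk by chunk, when .compute() is called.
total = float(lazy.sum().compute())
print(total)
```

How much memory this saves in practice depends on how the array is consumed downstream, so the figures quoted above remain the only measured numbers.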

brey commented 1 year ago

This works. An additional way to reduce the memory peak is to split the files using

meteo_split_by="day"

pmav99 commented 1 year ago

Does it have any runtime performance implications?

brey commented 1 year ago

For SCHISM, not AFAIK. For pyposeidon, time-wise I get:

with slicing: 38min 35s wall time

without: 59min 7s wall time

That is on my iMac writing to a USB external disk!
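
For readers unfamiliar with the idea behind `meteo_split_by="day"`, the following is a rough sketch of splitting a time series into per-day pieces so that only one day's worth of data needs to be handled at a time. It is not how pyposeidon implements the option. `split_by_day` and the sample data are hypothetical, and only NumPy is used.

```python
import numpy as np


def split_by_day(times, values):
    """Group records along the first axis by calendar day (hypothetical helper)."""
    days = times.astype("datetime64[D]")  # truncate each timestamp to its date
    return {str(day): values[days == day] for day in np.unique(days)}


# Made-up sample: 6-hourly timestamps covering two days, on a tiny 2x3 grid.
start = np.datetime64("2021-01-01T00")
times = start + np.arange(0, 48, 6).astype("timedelta64[h]")
values = np.arange(times.size * 2 * 3, dtype=float).reshape(times.size, 2, 3)

parts = split_by_day(times, values)
for day, block in parts.items():
    # In a real workflow each block would be written to its own file.
    print(day, block.shape)
```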