Deltares / dfm_tools

A Python package for pre- and postprocessing D-FlowFM model input and output files
https://deltares.github.io/dfm_tools/
GNU General Public License v3.0

Profile xr.open_dataset() for large mapfiles and hisfiles #225

Open veenstrajelmer opened 1 year ago

veenstrajelmer commented 1 year ago

Opening large mapfiles takes quite some time, possibly because of the decoding of times etc. This could also be done only once, after merging of the mapfiles. However, whatever takes the time is cached, so opening the same file a second time is more than 10 times faster. When improving the first open, beware that the performance of the second opening does not degrade.

Some timings:
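A minimal sketch of how the first versus second open could be compared; the file name is a placeholder, not an actual test file:

```python
import time
import xarray as xr

file_nc = "model_output_0000_map.nc"  # hypothetical mapfile path

# time two consecutive opens to see the effect of caching
for attempt in (1, 2):
    t0 = time.perf_counter()
    ds = xr.open_dataset(file_nc, chunks={"time": 1})
    ds.close()
    print(f"open_dataset() attempt {attempt}: {time.perf_counter() - t0:.2f} s")
```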

veenstrajelmer commented 1 year ago

Setting decode_times=False does not make a difference, even though a profiler showed that time decoding took about half of the time spent in xr.open_dataset().
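A sketch (with a hypothetical file path) of how both variants could be profiled with cProfile to see where the time actually goes:

```python
import cProfile
import xarray as xr

file_nc = "model_output_0000_map.nc"  # hypothetical mapfile path

# profile both variants, sorted by cumulative time
cProfile.runctx("xr.open_dataset(file_nc, decode_times=True)", globals(), locals(), sort="cumtime")
cProfile.runctx("xr.open_dataset(file_nc, decode_times=False)", globals(), locals(), sort="cumtime")
```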

Next, try out different chunkings. Currently chunks={'time':1} is used, which is already much faster than providing no chunks argument, but it consumes a large amount of memory when reducing. Why is a time chunk of 1 faster to begin with? It is also the chunking in the file, but should that not be picked up automatically?
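A sketch for comparing open times under different chunking options; the file path and the mesh2d_s1 variable name are assumptions:

```python
import time
import xarray as xr

file_nc = "model_output_0000_map.nc"  # hypothetical mapfile path

for chunks in (None, {"time": 1}, "auto"):
    t0 = time.perf_counter()
    ds = xr.open_dataset(file_nc, chunks=chunks)
    print(f"chunks={chunks!r}: opened in {time.perf_counter() - t0:.2f} s")
    # the on-disk chunking of a variable can be inspected via its encoding, e.g.:
    # ds["mesh2d_s1"].encoding.get("chunksizes")
    ds.close()
```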

Besides the impact on open_dataset, does chunking affect other steps, such as merging and plotting? The memory footprint of reductions is clearly affected, but how about their performance?
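A sketch for timing a reduction under different chunkings; again the file path and variable name are assumptions:

```python
import time
import xarray as xr

file_nc = "model_output_0000_map.nc"  # hypothetical mapfile path
varname = "mesh2d_s1"                 # assumed water-level variable name

for chunks in ({"time": 1}, "auto"):
    ds = xr.open_dataset(file_nc, chunks=chunks)
    t0 = time.perf_counter()
    _ = ds[varname].max(dim="time").compute()  # triggers the dask computation
    print(f"chunks={chunks!r}: max over time computed in {time.perf_counter() - t0:.2f} s")
    ds.close()
```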

veenstrajelmer commented 1 year ago

There is also https://github.com/Deltares/dfm_tools/issues/583, which assesses the performance of a large hisfile.

veenstrajelmer commented 9 months ago

Maybe consider installing some optional dependencies to improve performance: https://github.com/pydata/xarray/issues/8035#issuecomment-1660516949