Closed JoerivanEngelen closed 7 months ago
Tested with both examples from #134 and #86 and they both work. However, I do notice odd behaviour. When using xu.open_mfdataset
the merging takes 0.3 seconds. When using xu.open_dataset
the merging takes >260 seconds:
import glob
import xugrid as xu
import datetime as dt
file_nc = r'p:\1230882-emodnet_hrsm\GTSMv5.0\SO_NHrivGTSM\computations\BD014_fix_mapformat4\output\gtsm_model_0*_map.nc'
file_nc_list = glob.glob(file_nc)
partitions = []
for iF, file_nc_one in enumerate(file_nc_list):
uds_part = xu.open_mfdataset(file_nc_one)
partitions.append(uds_part)
dtstart = dt.datetime.now()
uds = xu.merge_partitions(partitions)
print(f'merging took: {(dt.datetime.now()-dtstart).total_seconds():.2f} sec')
Using xarray 2023.11.0
I guess you can also try loading the datasets with just xr.open_dataset
, then turning them into UgridDatasets, then merging them.
That will give the same result, I'm fairly sure. The xugrid.open_mfdataset
just adds the data_vars kwarg then calls xr.open_mfdataset
.
From the source, parallel=False
by default in the mf_opendataset call and that triggers some dask usage or not -- so that shouldn't make a difference here. But maybe there's something more subtle going on as well.
Is there maybe a difference between the eagerness? If you call .compute()
in the timing, are the timings the same?
@Huite, thanks for your suggestions. Indeed, the combination of ds=xr.open_dataset()
and xu.UgridDataset(ds)
behaves the same. And your second suggestion, to call .compute()
significantly increases the performance indeed (0.8 sec). This also triggered me to remember the importance of the chunks argument (chunks={"time":1}
is the default in dfm_tools), this also worked as a charm (0.5 sec). So I guess we can conclude there was no additional problem introduced with this PR, and the PR also provides what it aims to do.
Great job!
Fix #134 and #86,
Add support for merging partitions with different grids in each partition, for example 1D grid in one partition and 2D grids in each. Quite some refactoring was required: