Deltares / dfm_tools

A Python package for pre- and postprocessing D-Flow FM model input and output files
https://deltares.github.io/dfm_tools/
GNU General Public License v3.0
70 stars 12 forks source link

add kwargs to wrapper functions for xarray (and possibly more) #865

Closed SCLaan closed 5 months ago

SCLaan commented 5 months ago

I tried to generate boundary conditions based on a set of downloaded netCDFs with CMEMS data. The data was downloaded in multiple batches and the exact domain has changed a little bit over time, either due to the input in the script or possibly adjustments to the buffer zone in dfm_tools. This caused an error: ValueError: cannot align objects with join='exact' where index/labels/sizes are not equal along these coordinates (dimensions): 'longitude' ('longitude',)

I solved this by changing the hard-coded statement in https://github.com/Deltares/dfm_tools/blob/5ca89150156ce0cddd9a8fca1dfba9481a569643/dfm_tools/interpolate_grid2bnd.py#L294 and now use join='inner'

This made me think it would be good to

In this case this would come down to kwargs being accepted in https://github.com/Deltares/dfm_tools/blob/5ca89150156ce0cddd9a8fca1dfba9481a569643/dfm_tools/modelbuilder.py#L21 and then in the underlying https://github.com/Deltares/dfm_tools/blob/5ca89150156ce0cddd9a8fca1dfba9481a569643/dfm_tools/interpolate_grid2bnd.py#L257

SCLaan commented 5 months ago

Just to document my changes. I had to made some additional changes to interpolate_grid2bnd.py, namely:

data_xr = xr.open_mfdataset(file_list_nc, chunks=chunks, join='inner',
                            decode_coords="all", preprocess=drop_duplicates_along_all_dims)

where I added the preprocessing function drop_duplicates_along_all_dims to remove errors for duplicate (fill) values:

def drop_duplicates_along_all_dims(obj, keep="first"):
    """
    Needed as pre-processing for xr.open_mfdataset() to ignore duplicate empty values in datasets.
    """

    deduplicated = obj
    for dim in obj.dims:
        indexes = {dim: ~deduplicated.get_index(dim).duplicated(keep=keep)}
        deduplicated = deduplicated.isel(indexes)
    return deduplicated
veenstrajelmer commented 5 months ago

I understand why you would want this, but join="exact" was added deliberately for robustness reasons to fix https://github.com/Deltares/dfm_tools/issues/574. I favour robustness over flexibility to avoid successfully creating a model that produces unexpected output or errors. Your suggested change could work in your case but it increases the code complexity significantly and I think there are two acceptable alternatives:

If you run the testcase that was added in the PR linked to that issue I expect it would fail after your changes. You can also add your changes to a branch and open a PR, the testbank will then automatically run. I am curious what comes out, but I will for now close this issue as won't do. Let's discuss if you think this should be re-considered.