Deltares / xugrid

Xarray and unstructured grids
https://deltares.github.io/xugrid/
MIT License
58 stars 8 forks source link

nan fillvalue attributes written by xarray #176

Open veenstrajelmer opened 9 months ago

veenstrajelmer commented 9 months ago

I noticed that xugrid writes nan fillvalue attributes for variables without fillvalue defined. After a quick assessment I see that this is xarray behaviour and it is also happening in their code. A nan fillvalue does not make sense to me and it causes issues in some cases, like this one. In dfm_tools there is a workaround in place that pops nan fillvalue attrs upon reading, but I believe we should avoid writing them. It might be related to xarray not decoding default fillvalues if no fillvalue attribute is present. Either way, let's discuss what is expected/desired behaviour here.

The code below reads a D-FlowFM output which has no nan fillvalue attributes. When writing it with .to_netcdf() there are nan fillvalue attributes. I have tried this also with xugrid testdata, but those datasets already contain nan fillvalue attributes upon reading:

import dfm_tools as dfmt
import xarray as xr
import numpy as np

file_nc = dfmt.data.fm_curvedbend_map(return_filepath=True)
file_out = "temp_fillvals_map.nc"
ds_org = xr.open_dataset(file_nc)
ds_org.to_netcdf(file_out)
ds_out_xr = xr.open_dataset(file_out)

print("nan fillvalue attrs in original dataset:")
ds = ds_org
count_xr = 0
for varn in ds.variables:
    if '_FillValue' in ds[varn].encoding:
        if np.isnan(ds[varn].encoding['_FillValue']):
            print(varn, ds[varn].encoding['_FillValue'])
            count_xr += 1

print()

print("nan fillvalue attrs in dataset written by xugrid/xarray:")
ds = ds_out_xr
count_xr = 0
for varn in ds.variables:
    if '_FillValue' in ds[varn].encoding:
        if np.isnan(ds[varn].encoding['_FillValue']):
            print(varn, ds[varn].encoding['_FillValue'])
            count_xr += 1
JoerivanEngelen commented 6 months ago

Just checking @veenstrajelmer, does https://github.com/pydata/xarray/pull/8713 fix this issue?

veenstrajelmer commented 6 months ago

I do not think so, this is about (lack of) precision of scale_factor attrs and is unrelated to {"_FillValue":np.nan} in the netcdf file attributes. I found the list of issues @Huite and myself collected a while ago. They are listed in this issue: https://github.com/Deltares/dfm_tools/issues/490. This one also lists another nan-fillvalue issue (https://github.com/Deltares/xugrid/issues/125), which we closed as "wont do". I guess it makes sense to do the same with this issue.

kmuehlbauer commented 6 months ago

@JoerivanEngelen @veenstrajelmer Just stepping by from https://github.com/pydata/xarray/pull/8713. Correct, that PR won't fix this particular issue. But I'm aiming for solving the default fillvalue issue over at xarray, too. This is moving in small steps, though.

veenstrajelmer commented 6 months ago

@kmuehlbauer thank you for clarifying this. We were considering to fix it in this package (prevent writing of _Fillvalue:np.nan attributes and add encoding of default fillvalues per datatype if no _FillValue is supplied, both in line with the cf-convention. Do you mean that this behaviour will in the future be taken care of by xarray? In that case we can indeed consider to temporarily put it in xugrid to anticipate future behaviour.