corteva / rioxarray

geospatial xarray extension powered by rasterio
https://corteva.github.io/rioxarray

Allow multiple nodata values to be passed to reproject #720

Open andypbarrett opened 6 months ago

andypbarrett commented 6 months ago

For datasets with several data variables of different datatypes, it would be helpful to set a nodata value for each data variable.

Current Behavior

The nodata keyword argument of rio.reproject accepts a single value. If nodata is None, a default value is chosen based on the data type.

Suggested Behavior

Allow nodata to accept either a scalar or a dict, where the dict has the form {'var1': nodata_value_var1, 'var2': nodata_value_var2}.

The for-loop in rioxarray.raster_dataset.reproject would then look up the nodata value for each data variable:

for var in self.vars:
    # <snip>
    # A dict supplies per-variable values; a missing key yields None,
    # which falls back to the dtype-based default.
    if isinstance(nodata, dict):
        nodata_val = nodata.get(var)
    else:
        nodata_val = nodata
    x_dim, y_dim = _get_spatial_dims(self._obj, var)
    resampled_dataset[var] = (
        self._obj[var]
        .rio.set_spatial_dims(x_dim=x_dim, y_dim=y_dim, inplace=True)
        .rio.reproject(
            dst_crs,
            resolution=resolution,
            shape=shape,
            transform=transform,
            resampling=resampling,
            nodata=nodata_val,
            **kwargs,
        )
    )
    # <snip>

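
The lookup itself can be tried out in isolation. This is a minimal sketch, not rioxarray code: resolve_nodata is a hypothetical helper name, and accepting any Mapping rather than only dict is an assumption made for illustration.

```python
from collections.abc import Mapping
from typing import Any, Hashable


def resolve_nodata(nodata: Any, var: Hashable) -> Any:
    """Return the nodata value to use for one data variable.

    Hypothetical helper (not part of rioxarray).
    - If `nodata` is a mapping, look up `var`; a missing key gives None,
      which would let reproject fall back to its dtype-based default.
    - Otherwise `nodata` is treated as a scalar shared by all variables.
    """
    if isinstance(nodata, Mapping):
        return nodata.get(var)
    return nodata


# One shared scalar, as the keyword works today
shared = resolve_nodata(-9999, "temperature")

# Per-variable values, as proposed
per_var = {"temperature": -9999.0, "land_mask": 255}
temperature_nodata = resolve_nodata(per_var, "temperature")
unlisted_nodata = resolve_nodata(per_var, "elevation")  # key absent, so None
```

Returning None for an unlisted variable keeps the existing default behaviour for any variable the caller does not mention.
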
snowman2 commented 6 months ago

This is the recommended approach for setting the nodata values: https://corteva.github.io/rioxarray/stable/getting_started/nodata_management.html

andypbarrett commented 6 months ago

Thank you @snowman2. I take your point. Ideally, data files will have their nodata values set correctly, but this is not always the case. While using one of the recommended methods is preferable, it adds an extra step to simple workflows.

Since reproject already accepts nodata as a keyword argument, it would be useful if that keyword also covered the case where a dataset has variables with different nodata values.
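
Until something like that exists, the same effect can be approximated with the current single-value keyword by reprojecting one variable at a time. The following is only a sketch under stated assumptions: reproject_each is a hypothetical function name, the dataset is assumed to be an xarray.Dataset whose variables all have spatial dimensions, and the caller is assumed to have run `import rioxarray` so that the .rio accessor is registered.

```python
from typing import Any, Dict, Hashable, Mapping


def reproject_each(
    dataset: Any,
    dst_crs: Any,
    nodata_by_var: Mapping[Hashable, Any],
    **kwargs: Any,
) -> Dict[Hashable, Any]:
    """Reproject every data variable with its own nodata value.

    Hypothetical workaround (not part of rioxarray). Assumes `dataset`
    behaves like an xarray.Dataset and that the `.rio` accessor is
    available on each of its data variables.
    """
    reprojected = {}
    for var in dataset.data_vars:
        reprojected[var] = dataset[var].rio.reproject(
            dst_crs,
            # None for unlisted variables keeps the dtype-based default
            nodata=nodata_by_var.get(var),
            **kwargs,
        )
    return reprojected
```

The function returns a plain dict of reprojected arrays; reassembling them into a dataset is left to the caller.
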

Happy to submit a PR.

snowman2 commented 6 months ago

A PR would be welcome!