Open atedstone opened 4 years ago
This is currently not supported, but not against rioxarray adding support for it. The reason it currently does not exist is that it adds a bit of complexity when you have more than 3 dimensions: https://gdal.org/drivers/raster/netcdf.html#dimension
The dimensions inside the NetCDF file use the following rules: (Z,Y,X). If there are more than 3 dimensions, the driver will merge them into bands. For example if you have an 4 dimension arrays of the type (P, T, Y, X). The driver will multiply the last 2 dimensions (P*T). The driver will display the bands in the following order. It will first increment T and then P. Metadata will be displayed on each band with its corresponding T and P values.
Until support is added, I recommend opening your file with xarray.open_dataset
. If this is something you would like to tackle, feel free to take a stab at it.
I think this would be simplified if rasterio added support for multidimensional arrays: https://gdal.org/drivers/raster/netcdf.html#multidimensional-api-support, however support for this hasn't been added yet and it doesn't appear to be on the roadmap.
Thanks for your informative replies. I'll think about next steps - I'd forgotten the nuances of GDAL in only supporting a subset of netCDF functionality.
This driver is intended only for importing remote sensing and geospatial datasets in form of raster images. If you want explore all data contained in NetCDF file you should use another tools. https://gdal.org/drivers/raster/netcdf.html
The background here is that I want to use rioxarray's clip() functionality on this and other netCDF files. But perhaps rioxarray is not the best way to achieve this.
The background here is that I want to use rioxarray's clip() functionality on this and other netCDF files. But perhaps rioxarray is not the best way to achieve this.
There is some good news and some bad news here.
clip()
and other functionality of rioxarray
should work regardless of how you open the file. You just have to import rioxarray
for the extension to be loaded.rioxarray
doesn't do much for you in those scenarios.Looks like your x and y coordinates are in some kind of projected coordinate system:
>>> xds.X10_153
<xarray.DataArray 'X10_153' (X10_153: 144)>
array([-760., -750., -740., -730., -720., -710., -700., -690., -680., -670.,
-660., -650., -640., -630., -620., -610., -600., -590., -580., -570.,
-560., -550., -540., -530., -520., -510., -500., -490., -480., -470.,
-460., -450., -440., -430., -420., -410., -400., -390., -380., -370.,
-360., -350., -340., -330., -320., -310., -300., -290., -280., -270.,
-260., -250., -240., -230., -220., -210., -200., -190., -180., -170.,
-160., -150., -140., -130., -120., -110., -100., -90., -80., -70.,
-60., -50., -40., -30., -20., -10., 0., 10., 20., 30.,
40., 50., 60., 70., 80., 90., 100., 110., 120., 130.,
140., 150., 160., 170., 180., 190., 200., 210., 220., 230.,
240., 250., 260., 270., 280., 290., 300., 310., 320., 330.,
340., 350., 360., 370., 380., 390., 400., 410., 420., 430.,
440., 450., 460., 470., 480., 490., 500., 510., 520., 530.,
540., 550., 560., 570., 580., 590., 600., 610., 620., 630.,
640., 650., 660., 670.], dtype=float32)
Coordinates:
* X10_153 (X10_153) float32 -760.0 -750.0 -740.0 -730.0 ... 650.0 660.0 670.0
Attributes:
units: km
long_name: x
point_spacing: even
axis: X
In this scenario, if you know what the CRS is of the projected data, you could use rio.clip
or rio.clip_box
if you assign the CRS to the dataset using rio.write_crs
first.
Excellent. After some more digging, I've also seen your Stack Exchange answer which shows the use of write_crs in more detail. I can see about adding a page to the rasterio documentation which makes it clear that existing xarray Datasets can have a CRS added post-hoc.
The model outputs that I've linked to in this issue do indeed have some peculiar properties, amongst which are insufficient attributes for most software to understand their georeferencing; units in kms; and X and Y names set based on clipping of the model domain in post-processing. So it's probably unreasonable to expect such files to ever be read in automatically by rioxarray.
I think this is related, but in this case these "4D arrays" are degenerate in z and t, they are really 2D grids with a time stamp. There's a different error for the remote file vs local:
## remote
import rioxarray
dsn = "NETCDF:/vsicurl/https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/202202/oisst-avhrr-v02r01.20220218.nc:sst"
rioxarray.open_rasterio(dsn)
KeyError: 'NETCDF_DIM_time_VALUES'
## local
import rioxarray
dsn = "NETCDF:oisst-avhrr-v02r01.20220218.nc:sst"
rioxarray.open_rasterio(dsn)
ValueError: coordinate zlev has dimensions ('zlev',), but these are not a subset of the DataArray dimensions ('time', 'y', 'x')
I'm certainly interested to have this case work in rioxarray and will explore it. Would it be acceptable to allow these? I've been down this path before (in commercial and open software) where the GDAL band metadata is used to re-store the dimensionality, but it's a mess and I don't think it's worth it given the xarray facilities and GDAL multidim that exist now.
I'm not interested in the multidimensional possibilities, I just want that obvious case where sequential bands or sequential files are lined up as "time", or perhaps "z". (If rioxarray won't work with them I'll probably make a copy as COGs and use that).
To show that GDAL sees these degenerate dims:
dataset = "/vsicurl/https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/202202/oisst-avhrr-v02r01.20220218.nc"
from osgeo import gdal
gdal.Open(dataset).GetMetadata("SUBDATASETS")
gdal.Open(dataset).GetMetadata("SUBDATASETS")["SUBDATASET_1_DESC"]
'[1x1x720x1440] sst (16-bit integer)'
rasterio works with them fine
from rasterio.windows import Window
import rasterio
dsn = "NETCDF:oisst-avhrr-v02r01.20220218.nc:sst"
ds = rasterio.open(dsn)
ds.read(window = Window(0, 0, 2, 5))
dsn = "NETCDF:/vsicurl/https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/v2.1/access/avhrr/202202/oisst-avhrr-v02r01.20220218.nc:sst"
ds = rasterio.open(dsn)
ds.read(window = Window(0, 0, 2, 5))
sure enough, so at line 1234 of rioxarray/rioxarray/_io.py I see
(1, 720, 1440)
dict_keys(['time', 'zlev', 'x', 'y'])
<xarray.IndexVariable 'zlev' (zlev: 1)>
array([0])
<xarray.IndexVariable 'time' (time: 1)>
array([16119])```
so, I think in this case it could unproblematically drop the zlev, but I can't see yet how to identify that it's the dim being left out. Will keep pursuing, I'm almost entirely unexperienced with python debugging but happy with progress so far 🙏
candidate fix here, at least for my issue: https://github.com/mdsumner/rioxarray/tree/fix-degen-dims
it works for the file locally, but there's still that error for the remote dsn which I'll continue with.
if reasonable I'll PR it - I think the linting is right, it will take me a while longer to get tests going. Thanks!
note that since GDAL 3.7 we can add this config to get the crs assumed under reasonable situations:
import rioxarray
import rasterio
dsn = "NETCDF:oisst-avhrr-v02r01.20220218.nc:sst"
with rasterio.Env(GDAL_NETCDF_ASSUME_LONGLAT="YES"):
x = rioxarray.open_rasterio(dsn)
x.spatial_ref
<xarray.DataArray 'spatial_ref' ()>
array(0)
Coordinates:
spatial_ref int64 0
Attributes:
crs_wkt: GEOGCS["WGS 84 (CRS84)",DATUM["WGS_1984",SP...
...
@atedstone is it possible to get your file listed above still? I'd like to explore
@mdsumner The file I linked to above is no longer available, but this one is broadly equivalent:
ftp://ftp.climato.be/fettweis/MARv3.14/Greenland/ERA5-15km/MARv3.14.0-15km-daily-ERA5-2023.nc
I think this is related, but in this case these "4D arrays" are degenerate in z and t, they are really 2D grids with a time stamp.
This specific use case should be simple to support as it removed the complexity mentioned here: https://github.com/corteva/rioxarray/issues/174#issuecomment-713666696
However, it would need to throw errors if the extra dimensions have a size greater than 1 until the reading capability is updated to support that.
I also encountered this issue. Since I have a very small dataset with 1D coordinates (but 4 dimensions), this simple reproducible example might help.
import rioxarray
import xarray as xr
file_nc = "cmems_so_2022-11-02.nc"
file_nc_reduced = file_nc.replace(".nc","_reduced.nc")
# create a dataset with the depth dimension reduced (only time/latitude/longitude remain)
ds = xr.open_dataset(file_nc)
ds_reduced = ds.max(dim="depth")
ds_reduced.to_netcdf(file_nc_reduced)
# this reduced dataset works
raster_red = rioxarray.open_rasterio(file_nc_reduced)
# the original dataset raises "ValueError: conflicting sizes for dimension 'time': length 50 on the data but length 1 on coordinate 'time'"
raster_full = rioxarray.open_rasterio(file_nc)
What might at least be useful is to improve the ValueError, since to me it was not clear that there was one dimension too much. It looked more like a mismatch in dimension orders. After trying transposing and swap_dims, I got the idea to reduce the depth dimension (with max, which is arbitrary in this case).
The ds
looks like this:
<xarray.Dataset>
Dimensions: (depth: 50, latitude: 11, longitude: 10, time: 1)
Coordinates:
* depth (depth) float32 0.494 1.541 2.646 ... 5.275e+03 5.728e+03
* latitude (latitude) float32 -1.583 -1.5 -1.417 ... -0.9167 -0.8333 -0.75
* longitude (longitude) float32 116.4 116.5 116.6 116.7 ... 117.0 117.1 117.2
* time (time) datetime64[ns] 2022-11-02
Data variables:
so (time, depth, latitude, longitude) float32 nan nan ... nan nan
The ds_reduced
looks like this:
<xarray.Dataset>
Dimensions: (latitude: 11, longitude: 10, time: 1)
Coordinates:
* latitude (latitude) float32 -1.583 -1.5 -1.417 ... -0.9167 -0.8333 -0.75
* longitude (longitude) float32 116.4 116.5 116.6 116.7 ... 117.0 117.1 117.2
* time (time) datetime64[ns] 2022-11-02
Data variables:
so (time, latitude, longitude) float32 nan nan 32.86 ... nan nan nan
Example dataset (extension changed to enable uploading): cmems_so_2022-11-02.nc.txt
Code Sample
Example file: ftp://ftp.climato.be/fettweis/MARv3.11/Greenland/ERA_1958-2019-10km/daily_10km/MARv3.11-20km-daily-ERA-10km-2016.nc
Problem description
open_rasterio
fails to load subdatasets (DataArrays?) from NetCDFs with more than 1 additional dimension to X, Y. Expressed alternatively, the maximum number of dimensions is 3 (e.g. TIME, y, x). This failure halts all further loading of the NetCDF file, in effect rendering the NetCDF unopenable byopen_rasterio
.Expected Output
With pure xarray:
Environment Information
Installation method
conda
Conda environment information (if you installed with conda):
Environment (
conda list
):Details about
conda
and system (conda info
):