fmaussion / salem

Add geolocalised subsetting, masking, and plotting operations to xarray
http://salem.readthedocs.io
Other
179 stars 43 forks source link

dataset Grid not understood #100

Open eotp opened 6 years ago

eotp commented 6 years ago

I am trying to apply the .subset function on a xarray.core.dataset.Dataset object. However, I get RuntimeError: dataset Grid not understood. I am able to reproduce the example from the documentation (here), however, when providing another data set I get the error.

Here is a snippet of my data: test_data.zip

Here my code:

import xarray as xr
import salem

ds = xr.open_dataset("./data/test_data.nc")
ds
<xarray.Dataset>
Dimensions:                 (nlat: 85, nlon: 63, time: 3)
Coordinates:
  * nlon                    (nlon) float32 -82.375 -82.125 -81.875 -81.625 ...
  * nlat                    (nlat) float32 -20.125 -19.875 -19.625 -19.375 ...
  * time                    (time) datetime64[ns] 2004-12-31 2005-01-31 ...
Data variables:
    precipitation           (time, nlat, nlon) float32 ...
    relativeError           (time, nlat, nlon) float32 ...
    gaugeRelativeWeighting  (time, nlat, nlon) int32 ...
Attributes:
    Grid.GridHeader:  BinMethod=ARITHMETIC_MEAN;\nRegistration=CENTER;\nLatit...
    FileHeader:       AlgorithmID=3B43;\nAlgorithmVersion=3B43_7.0;\nFileName...
    FileInfo:         DataFormatVersion=m;\nTKCodeBuildVersion=1;\nMetadataVe...
    GridHeader:       BinMethod=ARITHMETIC_MEAN;\nRegistration=CENTER;\nLatit...
    history:          2018-05-08 18:27:22 GMT Hyrax-1.13.4 https://disc2.gesd...
type(ds)
xarray.core.dataset.Dataset
ds_subset = ds.salem.subset(corners=((-80, -15.), (-75., -7.)), crs=salem.wgs84)
RuntimeError: dataset Grid not understood.

Any advice?

fmaussion commented 6 years ago

The name of your lon/lat coordinates wasn't in the list of accepted coords.

Can you try latest master and let me know if that works?

pip install git+https://github.com/fmaussion/salem.git
Irene-GM commented 5 years ago

Hi,

I installed salem with the current master branch as per today, but I am also facing the same Grid not understood error.

To give some context: I have a set of raster files in Lambert Conformal Conic (LCC) coordinate reference system. Since each file is large and has many bands, I am interested in cropping the extent to only my study area to reduce the size of the entire collection. I am really interested in using salem for this task, because it seems that it ships with a subsetting utility salem.subset() which allows to specify the coordinates of the bounding box (and all the transformations/warping that I tried with gdal failed). However, I also hit the same error as the user that opened this issue. I am able to open the file withxarray, and even I can open it with GDAL and plot it with matplotlib, so I was assuming its a well-formed file.

Here I am pasting the netcdf summary after opening it with xarray.

<xarray.Dataset>
Dimensions:            (bnds: 2, time: 745, x: 217, y: 234)
Coordinates:
  * y                  (y) float32 0.0 2500.0 5000.0 7500.0 10000.0 12500.0 ...
  * x                  (x) float32 0.0 2500.0 5000.0 7500.0 10000.0 12500.0 ...
  * time               (time) datetime64[ns] 2013-12-01 2013-12-01T00:30:00 ...
    lon                (y, x) float32 ...
    lat                (y, x) float32 ...
    height             float32 ...
Dimensions without coordinates: bnds
Data variables:
    Lambert_Conformal  |S1 ...
    date               (time) int32 ...
    hms                (time) int32 ...
    time_bnds          (time, bnds) float64 ...
    ugs                (time, y, x) float32 ...
Attributes:
    Conventions:       CF-1.4
    institute_id:      ***** 
    model_id:          ******
    experiment_id:     ******** 
    domain:            NETHERLANDS.NL
    frequency:         
    driving_model_id:  ****
    creation_date:     Tue Feb 13 06:51:45 2018
    title:             *****
    comment:           Created with gl/xtool

Please, is there anything I can do to fix this grid problem? Thanks!

fmaussion commented 5 years ago

Hi @Irene-GM , thanks for the report.

In order for salem to understand the grid of your file, it must be able to parse the projection information out of the file's metadata. Your file seems quite well defined, so it should be possible. Would you mind sharing a small temporal subset out of it with me?

Alternatively, you could define the projection yourself and and give it at runtime as explained in the docs: https://salem.readthedocs.io/en/latest/xarray_acc.html#custom-projection

Irene-GM commented 5 years ago

Hi @fmaussion , thanks for your quick response!

Unfortunately, i cannot share the file with you, since it is experimental data. I tried the trick you suggested in the link and it seems it is not working:

ds = xr.open_dataset(path_in_nc)
ds.attrs['pyproj_srs'] = '+proj=lcc +lat_1=51.967 +lat_0=51.967 +lon_0=4.9 +k_0=1 +x_0=0 +y_0=0 +ellps=WGS84 +units=m +no_defs'

# Since I have many bands, I am extracting only one
ugs = ds.ugs.isel(time=225) 

# Shows in matplotlib, FYI
ugs.salem.quick_map(varname="ugs")

Note that :

Lambert_Conformal#grid_mapping_name=lambert_conformal_conic
Lambert_Conformal#latitude_of_projection_origin=51.967
Lambert_Conformal#longitude_of_central_meridian=4.9
Lambert_Conformal#standard_parallel=52.5

So I keep getting the RuntimeError: dataset Grid not understood raised by the program salem/sio.py

Hope this helps, Thanks!

fmaussion commented 5 years ago

What happens if you do:


ds = xr.open_dataset(path_in_nc)

# Since I have many bands, I am extracting only one
ugs = ds.ugs.isel(time=225) 
ugs.attrs['pyproj_srs'] = '+proj=lcc +lat_1=51.967 +lat_0=51.967 +lon_0=4.9 +k_0=1 +x_0=0 +y_0=0 +ellps=WGS84 +units=m +no_defs'

# Shows in matplotlib, FYI
ugs.salem.quick_map(varname="ugs")

?

Irene-GM commented 5 years ago

Yes! It worked!

Now I only need to figure out why the raster is shifted with respect to the outline of Europe. I guess it has to do with the lat/lon parameters specified in the SRS, so I should find the correct coordinates of the bounding box in LCC-1SP (any suggestion is welcome, btw). Then I can try the subsetting.

Thanks a lot for finding this workaround for me, much appreciated :-)

fmaussion commented 5 years ago

For the attribute setting problem, see: https://salem.readthedocs.io/en/latest/xarray_acc.html#keeping-track-of-attributes Basically you are going to have to keep track of the attributes, a behavior I would like to change in a future version of salem (https://github.com/fmaussion/salem/issues/62)

Now regarding the shift, there are two things you could look at:

In order to check that salem understands your data well, there is a very simple check:

import numpy as np
salem_lon, salem_lat = ugs.salem.grid.ll_coordinates
# Check that they are the same
np.testing.assert_almost_equal(salem_lon, ugs.lon, atol=1e-4)
np.testing.assert_almost_equal(salem_lat, ugs.lat, atol=1e-4)

The lon/lat coordinates should be similar at the 3rd or 4th decimal

fmaussion commented 5 years ago

If you want me to have a look, please share your data (modified if you want). For example:

ds = xr.open_dataset(path_in_nc)
ds = ds[['lon', 'lat']]
ds.to_netcdf('simplified_data.nc')

would be enough for me and won't hold any sensitive data other than the coordinates...

Craftflow commented 5 years ago

Hello

I am also facing the Grid not understood error. I am using a netcdf file of weather forecasts. I subset my file to only precipitation observations and attempt to set the projection as suggested above. However my script is still failing with the error.

I would include the file but even after compression it is 160mb. I would subset and re-save the file but I am uncertain whether my subsetting may be causing the problem.

Additional note, each file I have is only for one hour of forecasts so the time dimension is purposefully absent.

Here is the xarray output:

` ds = xr.open_dataset(cdf_file, decode_coords=True)

ds
<xarray.DataArray 'TMP_P0_L1_GLC0' (ygrid_0: 1059, xgrid_0: 1799)>
[1905141 values with dtype=float32]
Coordinates:
gridlat_0 (ygrid_0, xgrid_0) float32 ...
gridlon_0 (ygrid_0, xgrid_0) float32 ...
Dimensions without coordinates: ygrid_0, xgrid_0
Attributes:
initial_time: 09/20/2014 (06:00)
forecast_time_units: hours
forecast_time: 1
level: 0.0
level_type: Ground or water surface
parameter_template_discipline_category_number: [0 0 0 0]
parameter_discipline_and_category: Meteorological products, ...
grid_type: Lambert Conformal can be ...
units: K long_name: Temperature
production_status: Operational products
center: US National Weather Servi...
pyproj_srs: +proj=lcc +lat_1=38.2 +la...
`

Here is the code that produced the error.

` ds_srs = '+proj=lcc +lat_1=38.2 +lat_0=38.2 +lon_0=-121 +k_0=1 +x_0=0 +y_0=0 +ellps=WGS84 +units=m +no_defs'

ds_sub = ds['TMP_P0_L1_GLC0']
ds_sub.attrs['pyproj_srs'] = ds_srs
ds_sub.salem.quick_map() `

I would appreciate any insight you may have. Thank you!

fmaussion commented 5 years ago

Hi,

one of the reasons salem can't read your data is that the dimension and coordinate names are not standard (gridlat_0, why a 0?), i.e. I would have to hardcode these names in salem so that it can understand it (or you'd have to rename them to more standard names using xarray beforehand). This would be OK, but the actual problem with your data is following:

Dimensions without coordinates: ygrid_0, xgrid_0

Salem works only with rectilinear grids in the projection space, i.e. it needs the northings and eastings of your grid. Currently you only have the 2D lons and lats of your grid -> with these you can plot things on a map with cartopy or resample them with xesmf, but you can't use salem.

I have written more info about these things scattered on github (https://github.com/pangeo-data/pangeo/issues/356#issuecomment-414617798 or https://github.com/geoxarray/geoxarray/issues/3) but I will write them down in the salem documentation for future reference

Craftflow commented 5 years ago

Thank you for the quick response!

I am also interested in why the grid lat and longs are coded in a non-standard fashion. I have made no changes to the dataset. In fact there appears to be a couple non-standard aspects to this file that are causing me problems. However, with the current government shutdown my source for this information is unavailable.

I am sorry for my lack of experience in working with GIS data, such that I neglected to provide adequate context and information. If my understanding of your response is correct then I do believe this dataset has the northings and eastings, i just failed to include them earlier as they were not understood by xarray and so not reported.

The below information comes from a simple ncdump command output.

' dimensions: ygrid_0 = 1059 ; xgrid_0 = 1799 ;

Corner Coordinates: #from the xgrid and ygrid dimensions. Upper Left ( 0.0, 0.0) Lower Left ( 0.0, 1059.0) Upper Right ( 1799.0, 0.0) Lower Right ( 1799.0, 1059.0) Center ( 899.5, 529.5)

Latitude Longitude of Grid Corners: gridlat_0:corners = 21.138f, 21.14067f, 47.84238f, 47.83844f gridlon_0:corners = -122.72f, -72.29019f, -60.91784f, -134.0961f

Geolocation: LINE_OFFSET=0 LINE_STEP=1 PIXEL_OFFSET=0 PIXEL_STEP=1 SRS=GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]] X_BAND=1 X_DATASET=NETCDF:"1426306000001.nc":gridlon_0 Y_BAND=1 Y_DATASET=NETCDF:"1426306000001.nc":gridlat_0 `

Additionally, I know that each gridpoint is exactly 3km by 3km. I also included the geolocation information as I am unsure whether it may be relevant to this discussion.

Given this information does it now make sense to try to utilize Salem?

hydrogeohc commented 5 years ago

I have the same issue. Could I ask you @fmaussion to check my data?

sbr-5260 commented 2 years ago

I am also having the "dataset grid not understood" error.

I have many time dimensions, and I am trying to create a composite map of the PBLH height, using this code. As far as I can see, the coordinates have remained the same from the original dataset, so I am not sure why I am getting this error.

twentyfour = ds.PBLH.groupby('time.hour').mean().isel(hour = 23)

<xarray.DataArray 'PBLH' (south_north: 5, west_east: 5)> dask.array<getitem, shape=(5, 5), dtype=float32, chunksize=(5, 5), chunktype=numpy.ndarray> Coordinates: lat (south_north, west_east) float32 -37.14 -37.14 ... -37.13 lon (south_north, west_east) float32 174.5 174.5 ... 174.5 174.5

fmaussion commented 2 years ago

@sbr-5260 how did you create ds? it's likely that the ds.PBLH.groupby('time.hour').mean().isel(hour = 23) operation is discarding the attributes that salem needs to construct the grid.

sbr-5260 commented 2 years ago

Hi fmaussion,

Here is my ds dataset:

Dimensions: (south_north: 168, west_east: 168, time: 109, bottom_top: 40, soil_layers: 4) Coordinates: lat (south_north, west_east) float32 -37.14 -37.14 ... -36.67 lon (south_north, west_east) float32 174.5 174.5 ... 175.1 175.1 * time (time) datetime64[ns] 2019-06-26T12:00:00 ... 2019-07-01 * west_east (west_east) float64 -2.595e+04 -2.565e+04 ... 2.415e+04 * south_north (south_north) float64 -2.145e+04 -2.115e+04 ... 2.865e+04 keep_attrs bool True Dimensions without coordinates: bottom_top, soil_layers Data variables: (12/145) LU_INDEX (time, south_north, west_east) float32 dask.array ZNU (time, bottom_top) float32 dask.array ZNW (time, bottom_top) float32 dask.array ZS (time, soil_layers) float32 dask.array DZS (time, soil_layers) float32 dask.array VAR_SSO (time, south_north, west_east) float32 dask.array ... ... pressure (time, bottom_top, south_north, west_east) float32 dask.array temperature (time, bottom_top, south_north, west_east) float32 dask.array wind_speed (time, bottom_top, south_north, west_east) float32 dask.array wind_speed_10m (time, south_north, west_east) float32 dask.array wind_dir_10m (time, south_north, west_east) float32 dask.array wind_dir (time, bottom_top, south_north, west_east) float32 dask.array Attributes: (12/87) TITLE: OUTPUT FROM WRF V4.2.1 MODEL START_DATE: 2019-06-25_12:00:00 SIMULATION_START_DATE: 2019-06-25_12:00:00 WEST-EAST_GRID_DIMENSION: 169 SOUTH-NORTH_GRID_DIMENSION: 169 BOTTOM-TOP_GRID_DIMENSION: 41 ... ... ISICE: 22 ISURBAN: 19 ISOILWATER: 14 HYBRID_OPT: 2 ETAC: 0.2 pyproj_srs: +proj=lcc +lat_0=-36.9370002746582 +lon_... Which of the attributes are needed for constructing the grid? Maybe I could add them back in after the operation. Or open to alternate methods if you have and suggestions! Thanks :)
fmaussion commented 2 years ago

@sbr-5260 how did you create ds?

fmaussion commented 2 years ago

@sbr-5260 OK, this might be useful: https://salem.readthedocs.io/en/v0.2.3/xarray_acc.html#keeping-track-of-attributes

sbr-5260 commented 2 years ago

I am using the salem.open_mf_wrf_dataset to get the dataset, and the dataset "ds" plots fine until my function-- looks like its the lost attributes that are causing it.

Thanks so much! I just need to figure out how to save them or add them back in then.

fmaussion commented 2 years ago

pyproj_srs is the attribute you want to keep and/or copy after operations.

sbr-5260 commented 2 years ago

Got it working! Thanks :)

singledoggy commented 2 years ago

Do you have a plan to understand CF projection data?It seems weird to set the projection in code other than files for me. Or did it already supported but I missed it?

fmaussion commented 2 years ago

When salem was developed there was no system to interpret CF conventions, and even less files using them ;-). For example WRF is still not using them AFAIK.

salem is on "maintainance mode" since a few years now and I don't really think there is much reason to develop it further given all the great tools around (metpy, cartopy, etc). But if someone wants to spend some time on it I am all for it! I am still using salem regularly and I think others do.

weirdoyang721 commented 1 year ago

My problem is same as that raised by Irene-GM. Here's my data created by salem.open_xr_datasat and pr = ds['variable'].isel(time=0).

<xarray.DataArray 'precip' (height: 1, rlat: 240, rlon: 262)>
array([[[6.673193e-05, 1.738828e-04, ..., 7.269982e-05, 6.908291e-05],
        [2.103231e-04, 2.199079e-04, ..., 7.857729e-05, 7.541250e-05],
        ...,
        [2.441412e-05, 2.839271e-05, ..., 3.255216e-06, 3.436061e-06],
        [2.396200e-05, 2.269609e-05, ..., 3.255216e-06, 3.345638e-06]]],
      dtype=float32)
Coordinates:
    time     datetime64[ns] 1979-01-01
  * height   (height) float64 0.0
  * rlat     (rlat) float64 -30.0 -29.75 -29.5 -29.25 ... 29.0 29.25 29.5 29.75
  * rlon     (rlon) float64 -32.75 -32.5 -32.25 -32.0 ... 31.75 32.0 32.25 32.5
    lat      (rlat, rlon) float64 -46.75 -46.92 -47.09 ... -47.42 -47.24 -47.07
    lon      (rlat, rlon) float64 -126.9 -127.1 -127.3 ... 52.84 53.03 53.23
Attributes:
    standard_name:  precipitation_flux
    long_name:      Total Precipitative Flux
    units:          kg m-2 s-1
    cell_methods:   time: mean (comment: 24-hr averaged values)
    grid_mapping:   rotated_latitude_longitude
    pyproj_srs:     +proj=longlat +datum=WGS84 +no_defs

Context: for the first running, the error "dataset Grid not undersantand" came out, then I add "rlon" and "rlat" to the valid_names list in salem/utils.py. After that, the code works fine. However, I may have done something wrong so that salem cannot understand the data grid. pr.salem.roi(shape=shp) gives an array full of NaN, and pr.salem.quick_map gives a figure likes this image I can see the the outlines of Antarctica (which is the domain of the shapefile), but the extent of the map is created from rlat and rlon.

I also tried this, the result shows that the two are inconsistent.

salem_lon, salem_lat = pr.salem.grid.ll_coordinates
np.testing.assert_almost_equal(salem_lon,pr.lon, atol=1e-4)
np.testing.assert_almost_equal(salem_lat, pr.lat, atol=1e-4)

Any suggestions would be appreciated, thanks!

fmaussion commented 1 year ago

@weirdoyang721 your dataset is probably in rotated pole projection. You need to tell salem how to interpret this. One example of implementation for the MetUM model can be found here: https://github.com/fmaussion/salem/blob/master/salem/sio.py#L1042

weirdoyang721 commented 1 year ago

Hi @fmaussion , yes, the dataset is in rotated pole projection. Here's the information of the rotated map:

In   [1]:ds['rotated_latitude_longitude']
Out[1]: 
<xarray.DataArray 'rotated_latitude_longitude' ()>
array(-2147483647)
Attributes:
    grid_mapping_name:          rotated_latitude_longitude
    grid_north_pole_latitude:   -180.0
    grid_north_pole_longitude:  -170.0
    north_pole_grid_longitude:  0.0

I tried to add a function def open_racmo_dataset followed that for MetUM and alter the srs as:

    srs = ('+ellps=WGS84 +proj=ob_tran +o_proj=latlon '
           '+o_lat_p={o_lat_p} +o_lon_p={o_lon_p} +lon_0={lon_0}')

and the params:

    params = {
        'o_lon_p': pole_longitude,
        'o_lat_p': pole_latitude,
        'lon_0': central_rotated_longitude,
    }

comes out pyproj_srs = '+ellps=WGS84 +proj=ob_tran +o_proj=latlon +o_lat_p=-180.0 +o_lon_p=-170.0 +lon_0=0.0' Run code:

ds = salem.open_racmo_dataset(r"file.nc")
ds_test=ds['precip'].isel(time=0).squeeze()
ds_teat.salem.quick_map()

error dataset grid not understans still prompted. I'm a layman to GIS, I'm not sure the manually added srs is correct. Is the error caused by the undefined projection?

In addition, if runds_test=ds['precip'].isel(time=0).resample(time='1Y').sum(), the attribute pyproj_srs could not be traced. However, if open the data with salem.open_dataset, the attribute is not lost in calculations.

Hope for your help! Thanks :)

fmaussion commented 1 year ago

In addition, if runds_test=ds['precip'].isel(time=0).resample(time='1Y').sum(), the attribute pyproj_srs could not be traced. However, if open the data with salem.open_dataset, the attribute is not lost in calculations

Yes, this is a known issue in salem. You need to use keep_attrs=True in xarray operations, until we implement a new data model for the CRS: https://salem.readthedocs.io/en/stable/xarray_acc.html#keeping-track-of-attributes

Regarding your other issue I'm afraid I can't help much. Maybe plotting with cartopy would be an option?

singledoggy commented 6 months ago

It seems that the ds = xr.open_mfdataset(preprocess=fun) and salem.open_mf_wrf_dataset(preprocess=fun) argument also causes the loss of attributes. I didn't realize this until recently. I'm commenting here to prevent others from encountering the same issue and not knowing how to troubleshoot it.