IBM / HNCDI-Explain-GeoDN

HNCDI Explain course notebooks for GeoDN
MIT License

error loading datacube, time dimension mismatched time coordinate #5

Open erikduijm opened 6 months ago

erikduijm commented 6 months ago

While working on part 1 of the fundamentals, under explore more, I tried to load the ERA5 precipitation data to compare with the CEH precipitation data.


All I do is query the ERA5 collection to get a datacube, save the datacube as a .nc file, and then read it back as in the practical tutorial.

```python
collection_id_era5 = "Global weather (ERA5)"

# Define the start and end time for the data query
start = "2007-01-01T11:00:00Z"
end = "2007-12-31T11:00:00Z"

# Define the bounding box for the data query
west = -0.48
south = 53.709
east = -0.22
north = 53.812

# Define the bands for the data query
bands = ["Total precipitation"]

data_cube = geodn_discovery.query(
    collection_id=collection_id_era5,
    bands=bands,
    temporal_extent={"start": start, "end": end},
    spatial_extent={"west": west, "south": south, "east": east, "north": north},
)

filename = "total_precip.nc"
geodn_discovery.save(data_cube, filename, force=True)
```

This works, but I then can't read the file:

```python
# the loading fails:
path = "data/" + filename
x_data_era5 = geodn_discovery.open_datacube(path)
x_data_era5
```

This gives the error:

`ValueError: conflicting sizes for dimension 'time': length 17474 on the data but length 8737 on coordinate 'time'`
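As a quick sanity check on those two numbers, here is a minimal sketch using only the standard library. It assumes the ERA5 data is hourly (an assumption, not something confirmed in this thread), and `hourly_steps` is a hypothetical helper written for this illustration, not part of `geodn_discovery`.

```python
from datetime import datetime


def hourly_steps(start: str, end: str) -> int:
    """Count hourly timestamps from start to end, both endpoints included."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    t0 = datetime.strptime(start, fmt)
    t1 = datetime.strptime(end, fmt)
    return int((t1 - t0).total_seconds() // 3600) + 1


# Same temporal extent as in the query (assumes hourly data).
expected = hourly_steps("2007-01-01T11:00:00Z", "2007-12-31T11:00:00Z")

# The two lengths reported in the ValueError.
data_len, coord_len = 17474, 8737

print(expected)              # 8737
print(data_len / coord_len)  # 2.0
```

Under that assumption, the coordinate length (8737) matches the requested range, and the data length is exactly twice that. This would be consistent with the data holding two values per timestamp, but that is only a guess from the numbers.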

katharinareusch commented 6 months ago

Hello @erikduijm. The problem is that a NetCDF file requires all variables that share the `time` dimension to use the same time index, so it is not possible to save two variables with different time extents in the same file.

What you can do instead is load both files into two separate geopandas dataframes and merge them afterwards, for example with these merge functions, which let you merge on time and location:

https://geopandas.org/en/stable/docs/user_guide/mergingdata.html

https://stackoverflow.com/questions/73442368/spatial-join-by-geometry-and-time-in-pythongeopandas
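To illustrate the idea of merging on time, here is a minimal, dependency-free sketch in plain Python rather than geopandas. The sample values are made up for illustration (they are not ERA5 or CEH data), and `inner_join_on_time` is a hypothetical helper, not a geopandas function.

```python
from datetime import datetime

# Hypothetical sample values keyed by timestamp; not real ERA5 or CEH data.
era5 = {
    datetime(2007, 7, 1, 0): 0.1,
    datetime(2007, 7, 1, 1): 0.0,
    datetime(2007, 7, 1, 2): 0.4,
}
ceh = {
    datetime(2007, 7, 1, 1): 0.2,
    datetime(2007, 7, 1, 2): 0.5,
    datetime(2007, 7, 1, 3): 0.0,
}


def inner_join_on_time(left, right):
    """Return (time, left_value, right_value) rows for timestamps in both inputs."""
    shared = sorted(left.keys() & right.keys())
    return [(t, left[t], right[t]) for t in shared]


merged = inner_join_on_time(era5, ceh)
for row in merged:
    print(row)
```

Only the timestamps present in both series survive the join, which is the same behaviour an inner merge on a time column would give in a dataframe library.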

erikduijm commented 6 months ago

When you say multiple variables, what do you mean? I am only collecting a single band, total precipitation.

katharinareusch commented 6 months ago

Apologies, I thought you were loading two files. I think the problem is that the time range you requested is longer than the time range in the file. You're asking for all of the 2007 data, from 1 January to 31 December: `start = "2007-01-01T11:00:00Z"`, `end = "2007-12-31T11:00:00Z"`.

But the data file doesn't have that many timestamps. Maybe try it for just July and see if that works.

Keep me posted :)

erikduijm commented 6 months ago

The problem persisted even when I queried just a single month.