google-deepmind / graphcast


Problem of building own dataset #90

Open chufall opened 1 month ago

chufall commented 1 month ago

Hi, I'm trying to build an xarray dataset from my own data, and the code is as follows:

```python
import numpy as np
import pandas as pd
import xarray

numpy_array = data

lons = np.linspace(-180, 180, 73)
lats = np.linspace(87.5, -87.5, 71)

# mydatetime=
# dims = ('batch', 'time', 'channel', 'longitude', 'latitude')
times = pd.to_timedelta([f'{i:2d}:00:00' for i in range(data.shape[1])])
datetimes = pd.date_range(start=f'{start_year}-01-01', end=f'{end_year}-12-31 23:00:00', freq='H')

coords = {
    'batch': range(data.shape[0]),
    'time': ('time', times),
    'channel': [0],
    'lon': ('lon', lons),
    'lat': ('lat', lats),
    'datetime': (('batch', 'time'), [datetimes])
}

data_vars = {'Temp': (('batch', 'time', 'channel', 'lat', 'lon'), numpy_array[:, :, 0:1, :, :])}
xr_dataset = xarray.Dataset(data_vars=data_vars, coords=coords)
```

But in the resulting dataset, the "datetime" coord (an xarray.DataArray) carries its own coords for both of its dims, "batch" and "time":

<xarray.DataArray 'datetime' (batch: 1, time: 17544)> Size: 140kB
array([['2020-01-01T00:00:00.000000000', '2020-01-01T01:00:00.000000000',
        '2020-01-01T02:00:00.000000000', ...,
        '2021-12-31T21:00:00.000000000', '2021-12-31T22:00:00.000000000',
        '2021-12-31T23:00:00.000000000']], dtype='datetime64[ns]')
Coordinates:
  * batch     (batch) int64 8B 0
  * time      (time) timedelta64[ns] 140kB 00:00:00 ... 730 days 23:00:00
    datetime  (batch, time) datetime64[ns] 140kB 2020-01-01 ... 2021-12-31T23...

However, if I load the example dataset "source-era5_date-2022-01-01_res-0.25_levels-37_steps-01.nc", its "datetime" coord has the "batch" and "time" dims but only the "time" coord:

<xarray.DataArray 'datetime' (batch: 1, time: 3)> Size: 24B
array([['2022-01-01T00:00:00.000000000', '2022-01-01T06:00:00.000000000',
        '2022-01-01T12:00:00.000000000']], dtype='datetime64[ns]')
Coordinates:
  * time      (time) timedelta64[ns] 24B 00:00:00 06:00:00 12:00:00
    datetime  (batch, time) datetime64[ns] 24B 2022-01-01 ... 2022-01-01T12:0...
Dimensions without coordinates: batch

So my question is: how do I remove the "batch" coord from "datetime" in my dataset?

Thanks a lot!

Sincerely, Qc

agbruno-git commented 1 month ago

Try using .squeeze() on your dataset; it should remove the dependence.

chufall commented 1 month ago

> Try using .squeeze() on your dataset; it should remove the dependence.

Thank you for your reply!

I tried running xr_dataset.coords["datetime"].squeeze(), but the result is unchanged: the "batch" coord is still attached to "datetime".

If I run xr_dataset.squeeze(), it removes every dimension of size 1, which is not what I want.

Qc
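
One possible way to get the layout of the example file, sketched here under assumptions rather than confirmed by the GraphCast maintainers: the "batch" coord exists only because `'batch'` was listed in `coords`, so dropping that coordinate variable with `drop_vars` should leave "batch" as a dimension without coordinates. The array sizes and the names `ds` and `ds_fixed` below are placeholders, not values from the thread.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Small stand-in for the real array, shaped (batch, time, channel, lat, lon).
# These sizes are illustrative only.
n_time, n_lat, n_lon = 4, 3, 5
data = np.zeros((1, n_time, 1, n_lat, n_lon), dtype=np.float32)

times = pd.to_timedelta(np.arange(n_time), unit="h")
datetimes = pd.Timestamp("2020-01-01") + times  # DatetimeIndex, hourly steps

coords = {
    "batch": ("batch", np.arange(1)),
    "time": ("time", times),
    "lat": ("lat", np.linspace(87.5, -87.5, n_lat)),
    "lon": ("lon", np.linspace(-180, 180, n_lon)),
    "datetime": (("batch", "time"), datetimes.values[np.newaxis, :]),
}
ds = xr.Dataset(
    {"Temp": (("batch", "time", "channel", "lat", "lon"), data)},
    coords=coords,
)

# Drop the "batch" coordinate variable; the "batch" dimension itself stays.
ds_fixed = ds.drop_vars("batch")

# For comparison: squeeze() removes the dimension but keeps "batch" as a
# scalar coord, which is why it still shows up on "datetime".
squeezed = ds["datetime"].squeeze()

print(ds_fixed["datetime"])
```

Alternatively, leaving `'batch'` out of the `coords` dict when building the dataset should give the same result, since xarray only creates a coordinate for a dimension when one is supplied.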