Open jbanomedina opened 4 months ago
Sorry about the delay. We have done a lot of work on NetCDF. Can you try again with the latest version? Also, if you plan to use data from the CDS, I suggest that you download them in grib, so you avoid some unnecessary conversion, and it will be faster.
Thank you very much for working on this, and for developing this amazing tool. I tried again with the latest version, and the previous problem is solved. I am now getting the error below using ERA5, but it does not seem critical: the resulting .zarr file looks fine, and I am able to open it in Python using the anemoi-datasets library. Could this error come from the fact that ERA5 is not a forecast and therefore does not contain the attribute "forecast_reference_time"?
anemoi-datasets create recipe-era5-test.yaml ${workdir}/data/era5/era5_${yearInit}-01-01.zarr
2024-10-14 13:56:02 INFO Task init((),{}) starting
2024-10-14 13:56:08 INFO Setting flatten_grid=True in config
2024-10-14 13:56:08 INFO Setting ensemble_dimension=2 in config
2024-10-14 13:56:08 INFO Setting flatten_grid=True in config
2024-10-14 13:56:08 INFO Setting ensemble_dimension=2 in config
2024-10-14 13:56:08 INFO {'start': datetime.datetime(2013, 1, 1, 0, 0), 'end': datetime.datetime(2013, 1, 1, 18, 0), 'frequency': '6h', 'group_by': 'monthly'}
2024-10-14 13:56:08 INFO Groups(dates=1)
2024-10-14 13:56:08 INFO FunctionAction: path=./era5_2013-01-01.nc param=['10u']
2024-10-14 13:56:11 INFO Minimal input for 'init' step (using only the first date) :
2024-10-14 13:56:11 INFO netcdf(['2013-01-01T00:00:00'])
2024-10-14 13:56:11 INFO Config loaded ok:
2024-10-14 13:56:11 INFO Found 4 datetimes.
2024-10-14 13:56:11 INFO Dates: Found 4 datetimes, in 1 groups:
2024-10-14 13:56:11 INFO Missing dates: 0
2024-10-14 13:57:22 INFO Found 1 variables : 10u.
2024-10-14 13:57:22 INFO Found 1 ensembles : 0.
2024-10-14 13:57:22 INFO gridpoints size: [1038240, 1038240]
2024-10-14 13:57:22 INFO resolution=None
2024-10-14 13:57:22 INFO total_shape = [4, 1, 1, 1038240]
2024-10-14 13:57:22 INFO chunks=(1, 1, 1, 1038240)
2024-10-14 13:57:22 INFO Creating Dataset './era5_2013-01-01.zarr', with total_shape=[4, 1, 1, 1038240], chunks=(1, 1, 1, 1038240) and dtype='float32'
2024-10-14 13:57:22 ERROR Error in retrieving metadata (cannot build data request info) for XArrayMetadata({'variable': '10u', 'time': '0000', 'date': '20130101', 'step': 0, 'valid_datetime': '2013-01-01T00:00:00'})
Traceback (most recent call last):
File "./envs/nwm-anemoi/lib/python3.12/site-packages/anemoi/datasets/create/input.py", line 111, in _data_request
date = field.datetime()["valid_time"]
^^^^^^^^^^^^^^^^
File "./envs/nwm-anemoi/lib/python3.12/site-packages/earthkit/data/core/fieldlist.py", line 512, in datetime
return self._metadata.datetime()
^^^^^^^^^^^^^^^^^^^^^^^^^
File "./envs/nwm-anemoi/lib/python3.12/site-packages/earthkit/data/core/metadata.py", line 312, in datetime
"base_time": self._base_datetime(),
^^^^^^^^^^^^^^^^^^^^^
File "./envs/nwm-anemoi/lib/python3.12/site-packages/anemoi/datasets/create/functions/sources/xarray/metadata.py", line 84, in _base_datetime
return self._field.forecast_reference_time
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "./envs/nwm-anemoi/lib/python3.12/site-packages/anemoi/datasets/create/functions/sources/xarray/field.py", line 106, in forecast_reference_time
return self.owner.forecast_reference_time
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'Variable' object has no attribute 'forecast_reference_time'
2024-10-14 13:57:22 WARNING Dataset name error: the dataset name 'era5_2013-01-01' does not follow naming convention. Does not match ^(\w+)-([\w-]+)-(\w+)-(\w+)-(\d\d\d\d)-(\d\d\d\d)-(\d+h)-v(\d+)-?([a-zA-Z0-9-]+)?$
2024-10-14 13:57:24 INFO Number of years 0 < 10, leaving out 20%. end=np.datetime64('2013-01-01T12:00:00')
2024-10-14 13:57:24 INFO Will compute statistics from 2013-01-01T00:00:00 to 2013-01-01T12:00:00
2024-10-14 13:57:24 INFO Task load((),{}) starting
2024-10-14 13:57:24 INFO {'end': '2013-01-01T18:00:00', 'frequency': '6h', 'group_by': 'monthly', 'start': '2013-01-01T00:00:00'}
2024-10-14 13:57:24 INFO Groups(dates=1)
2024-10-14 13:57:24 INFO FunctionAction: param=['10u'] path=./era5_2013-01-01.nc
Loading 3/4: 100%|████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 5.68it/s]
2024-10-14 13:57:28 INFO Name : /data
Type : zarr.core.Array
Data type : float32
Shape : (4, 1, 1, 1038240)
Chunk shape : (1, 1, 1, 1038240)
Order : C
Read-only : True
Compressor : Blosc(cname='lz4', clevel=5, shuffle=SHUFFLE, blocksize=0)
Store type : zarr.storage.DirectoryStore
No. bytes : 16611840 (15.8M)
No. bytes stored : 13288828 (12.7M)
Storage ratio : 1.3
Chunks initialized : 4/4
2024-10-14 13:57:28 INFO Task finalise((),{}) starting
2024-10-14 13:57:28 INFO Variables minimum maximum mean stdev has_nans
10u -21.56 22.45 -0.37 5.77 0.00
2024-10-14 13:57:28 INFO Wrote statistics in ./era5_2013-01-01.zarr
Computing size of ./era5_2013-01-01.zarr: 16it [00:00, 4772.02it/s]
2024-10-14 13:57:28 INFO Total size: 12.7 MiB
2024-10-14 13:57:28 INFO Total number of files: 62
2024-10-14 13:57:28 INFO Task patch((),{}) starting
2024-10-14 13:57:28 INFO ✅ Remove _create_yaml_config
2024-10-14 13:57:28 INFO Dataset changed by patch
2024-10-14 13:57:28 INFO Task init_additions((),{}) starting
2024-10-14 13:57:28 WARNING No delta found in kwargs, no addtions will be computed.
2024-10-14 13:57:28 INFO Task run_additions((),{}) starting
2024-10-14 13:57:28 WARNING No delta found in kwargs, no addtions will be computed.
2024-10-14 13:57:28 INFO Task finalise_additions((),{}) starting
2024-10-14 13:57:28 WARNING No delta found in kwargs, no addtions will be computed.
Computing size of ./era5_2013-01-01.zarr: 16it [00:00, 10111.32it/s]
2024-10-14 13:57:28 INFO Total size: 12.7 MiB
2024-10-14 13:57:28 INFO Total number of files: 62
2024-10-14 13:57:28 INFO Task cleanup((),{}) starting
2024-10-14 13:57:28 INFO Task verify((),{}) starting
2024-10-14 13:57:28 INFO Verifying dataset at ./era5_2013-01-01.zarr
2024-10-14 13:57:28 INFO ./era5_2013-01-01.zarr
2024-10-14 13:57:28 INFO Create completed in 1 minute 25 seconds
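Separately from the traceback, the "Dataset name error" warning in the log above can be checked offline. The snippet below is only a minimal sketch: the regular expression is copied verbatim from the warning message, but the helper name follows_naming_convention and the second example name are made up for illustration, and this is not necessarily how anemoi-datasets performs the check internally.

```python
import re

# Pattern copied verbatim from the WARNING line in the log above.
NAME_PATTERN = re.compile(
    r"^(\w+)-([\w-]+)-(\w+)-(\w+)-(\d\d\d\d)-(\d\d\d\d)-(\d+h)-v(\d+)-?([a-zA-Z0-9-]+)?$"
)


def follows_naming_convention(name: str) -> bool:
    """Hypothetical helper: True if `name` matches the pattern from the warning."""
    return NAME_PATTERN.match(name) is not None


# The name used in this issue does not match, hence the warning.
print(follows_naming_convention("era5_2013-01-01"))

# A made-up name that satisfies the pattern (illustrative only).
print(follows_naming_convention("aifs-od-an-oper-2013-2013-6h-v1"))
```

Since the run still completes after the warning, renaming the output is optional here; the check is only useful if the warning should go away.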
What happened?
My goal is to build a dataset from NetCDF files using the anemoi-datasets library. However, I get an error when using NetCDF files as the source. I have tried both version 0.4.0 (installed using pip) and the develop branch (installed by cloning the repository). I was able to successfully build a dataset from a grib file; however, for my project I have the data in NetCDF format.
What are the steps to reproduce the bug?
The code needed to reproduce this error is the following.
1) First, I download a sample NetCDF file from the CDS using a Python script.
2) Second, I point to this sample in the recipe.yaml file.
3) Type this in the command line:
Version
v0.4.0
Platform (OS and architecture)
Linux exp-18-17 4.18.0-513.24.1.el8_9.x86_64 #1 SMP Thu Apr 4 18:13:02 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Relevant log output
Accompanying data
No response
Organisation
No response