Closed NickGeneva closed 9 months ago
/blossom-ci
I would be interested to understand more of the motivation for this, in particular because it rolls back a conscious decision to avoid cfgrib and xarray across the data sources more broadly. See https://github.com/NVIDIA/earth2mip/pull/64.
Also, the main "lexicon" used in the code base was the ECMWF grib code table: https://codes.ecmwf.int/grib/param-db/. See https://github.com/NVIDIA/earth2mip/blob/a17fd31ae15b83a052c57c88eb30a153d2995415/earth2mip/initial_conditions/cds.py#L43. We didn't use this in the other data sources yet, but the numeric code is far less ambiguous than the short names. The conversion to/from our channel names is handled like this:
from earth2mip.initial_conditions import cds
code = cds.parse_channel('z500')
assert code.id == 129
assert code.level == 500
assert str(code) == 'z500'
This played into my choice to use the low level grib api in the cds.DataSource since it makes it trivial to extract the raw parameter ids directly from the grib data. The behavior of cfgrib in mapping param ID to name was less predictable, which is why I opted for eccodes.
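To make the round trip above concrete, here is a minimal sketch of how a channel name could map to an ECMWF parameter ID plus a level. This is not the actual cds.parse_channel implementation: the names PressureLevelCode and parse_pressure_channel are hypothetical, and only a handful of pressure-level variables are covered. The numeric IDs are the ones listed in the ECMWF parameter database.

```python
import re
from dataclasses import dataclass

# ECMWF parameter IDs for a few pressure-level variables.
# Only a small illustrative subset; the real table is much larger.
_NAME_TO_ID = {"z": 129, "t": 130, "u": 131, "v": 132, "q": 133}
_ID_TO_NAME = {param_id: name for name, param_id in _NAME_TO_ID.items()}


@dataclass(frozen=True)
class PressureLevelCode:
    """Hypothetical container: numeric parameter ID plus pressure level."""

    id: int
    level: int

    def __str__(self) -> str:
        # Convert back to the channel-name form, e.g. "z500".
        return f"{_ID_TO_NAME[self.id]}{self.level}"


def parse_pressure_channel(channel: str) -> PressureLevelCode:
    """Parse names like 'z500' into (parameter ID, level). Hypothetical helper."""
    match = re.fullmatch(r"([a-z]+)(\d+)", channel)
    if match is None or match.group(1) not in _NAME_TO_ID:
        raise ValueError(f"unrecognized channel name: {channel!r}")
    return PressureLevelCode(_NAME_TO_ID[match.group(1)], int(match.group(2)))


code = parse_pressure_channel("z500")
assert code.id == 129
assert code.level == 500
assert str(code) == "z500"
```

Because the level is parsed from the name rather than looked up, any level works without having to be listed in advance.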
Another disadvantage is that the new lexicon approach doesn't support arbitrary levels, only the ones in the defined "lexicon". I am also concerned that the new CDS data source is much slower, since it doesn't combine pressure levels. That was my main motivation for rewriting the cds.DataSource.
I do like the __call__ API, which includes channel_names.
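As an illustration of what "combining pressure levels" could mean, the sketch below groups requested channels by variable so that each variable is fetched once with all of its levels, rather than once per channel. This is one reading of the comment, not a description of what cds.DataSource actually does; group_levels_by_variable is a hypothetical helper, and no CDS request is made here.

```python
import re
from collections import defaultdict


def group_levels_by_variable(channels):
    """Group names like 'z500' by variable prefix. Hypothetical helper.

    Returns a dict mapping each variable to its sorted list of levels,
    which could then back a single request per variable.
    """
    groups = defaultdict(list)
    for channel in channels:
        match = re.fullmatch(r"([a-z]+)(\d+)", channel)
        if match is None:
            raise ValueError(f"not a pressure-level channel: {channel!r}")
        groups[match.group(1)].append(int(match.group(2)))
    return {name: sorted(levels) for name, levels in groups.items()}


requests = group_levels_by_variable(["z500", "t850", "z850", "t500"])
# Two grouped requests instead of four per-channel ones.
assert requests == {"z": [500, 850], "t": [500, 850]}
```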
In summary, I would like to see the following changes before replacing the existing initial conditions:
initial_conditions.cds.DataSource. Sorry, but a lot of these changes are things I specifically undid in https://github.com/nbren12/earth2mip/commit/4b33f64cba2c4edf5ed67fe1dea69acfff4c84e8.
cds.parse_channel and ECMWF parameter IDs instead of dictionaries of strings.
Also, this uses numpy docstrings... I thought we decided to do Google style.
Assuming xarray is important (maybe someone asked for this), we could make a helper function or method like this:
import xarray

def get_dataarray_from_data_source(datasource, time, channel_names) -> xarray.DataArray:
    return xarray.DataArray(datasource(time, channel_names), dims=["channel", "lat", "lon"], coords={"lat": datasource.grid.lat, ...})
Earth-2 MIP Pull Request
Description
Data sources will pipe from time, channel -> xarray DataArray, which will then be converted to tensor, metadata for pipelines.
Big refactor of initial conditions / data sources.
What I won't do:
Closes: https://github.com/NVIDIA/earth2mip/issues/127 Closes: https://github.com/NVIDIA/earth2mip/issues/131
Checklist
Dependencies