ecmwf / earthkit-data

A format-agnostic Python interface for geospatial data
Apache License 2.0
56 stars 13 forks source link

Implement gridspec #173

Open sandorkertesz opened 1 year ago

sandorkertesz commented 1 year ago

The main purpose of gridspec is to describe the grid structure of a field. Requirements:

Examples:

{"grid": "O1280"}
{"grid": [1, 1], "area": [60,-20,40,30]}

Interpolation

The idea is that in the yet-to-be-developed earthkit.regrid module the gridspec would specify both the input and output grids and we would able to perform an interpolation purely on a numpy array. E.g.:

import earthkit.data
import earthkit.regrid

ds = earthkit.data.from_source("file", "data_2x2.grib")

# interpolate the first field onto a 5x5 degree regular_ll grid
v = ds[0].values
gs = ds[0].metadata().gridspec
v_new, gs_new = earthkit.regrid.interpolate(v, gs, grid=[5, 5])

From the results we should be able to create a new Fieldlist like this:

ds_new = earthkit.data.FieldList.from_numpy(v_new, ds[0].metadata(), gs_new)

Now here ds[0].metadata() represents the the original "2x2" field while gs_new is the gridspec of the new "5x5" field. For users of ds_new all metadata queries should give results consistent with the new grid:

ds_new[0].metadata("numberOfDataPoints") # -> 2664
ds_new[0].metadata("latitudeOfFirstGridPointInDegrees") # -> 90
ds_new[0].shape # -> (37, 72)

The simplest solution to achieve this is to internally update the metadata object with the gridspec and create a new one. This would require the cloning of a new grib handle (since metadata stores a grib handle internally) from the gridspec. So here the task is to turn the gridspec (in the example it is as simple as {"grid: [5,5]}) into the relevant set of grib metadata keys and create the new handle with them. This is already performed in MIR, which creates the new grib handle from the generated metadata keys with the following call:

 auto* result =
    codes_grib_util_set_spec(h, &info.grid, &info.packing, flags, values.data(), values.size(), &err);

The question is if we want to duplicate the MIR code in Python or want to delegate this task straight to MIR (i.e. to earthkit.regrid). Please note that codes_grib_util_set_spec is not available in the ecCodes Python interface.

Work so far

The very first implementation in branch feature/gridspec works like this:

>>> import earthkit.data
>>> ds = earthkit.data.from_source("file", "tests/data/gridspec/t_75_-60_10_40_5x5.grib1")
>>> ds[0].metadata().gridspec
GridSpec({'type': 'regular_ll', 
   'grid': [0.083, 0.083], 
   'area': [41.188, 22.0, 34.521, 30.0], 
   'j_points_consecutive': 0, 
   'i_scans_negatively': 0, 
   'j_scans_positively': 1})
>>>

Generating a new GRIB handle from gridspec is implemented for regular_ll grids. An experimental "interpolation" notebook example showcasing this can be found here: https://github.com/ecmwf/earthkit-data/blob/feature/gridspec/docs/experimental/interpolate.ipynb

Excerpt from the notebook (here interpolate is a method written in scipy just to make the testing work):

# perform interpolation onto another grid using values array and gridspec
vals = ds[0].values
gs = ds[0].metadata().gridspec
vals_new, gs_new = interpolate(vals, gs, grid=[2,2], area=[70.0, -50.0, 20.0, 10.0])

# generate new meatadata from the resulting gridspec
md_new = ds[0].metadata().override(gridspec=gs_new)

# generate new fieldlist from the new values array and metadata
ds_new = FieldList.from_numpy(vals_new.flatten(), md_new)

TODO:

TESTING: Gridspec generation can easily be tested with a dictionary containing the relevant grib metadata values for a given field:

from earthkit.data.readers.grib.gridspec import make_gridspec

md = {'gridType': 'regular_ll', 'Ni': 97, 'Nj': 81,
   'iDirectionIncrementInDegrees': 0.083, 
   'jDirectionIncrementInDegrees': 0.083, 
   'latitudeOfFirstGridPointInDegrees': 34.521,
   'latitudeOfLastGridPointInDegrees': 41.188, 
   'longitudeOfFirstGridPointInDegrees': 22.0, 
   'longitudeOfLastGridPointInDegrees': 30.0, 
   'iScansNegatively': 0, 
   'jScansPositively': 1, 
   'jPointsAreConsecutive': 0}

gs = make_gridspec(md)
print(gs)

output:

GridSpec({'type': 'regular_ll', 'grid': [0.083, 0.083], 'area': [41.188, 22.0, 34.521, 30.0], '
 j_points_consecutive': 0, 'i_scans_negatively': 0, 'j_scans_positively': 1})
sandorkertesz commented 1 year ago

See notebook example: https://github.com/ecmwf/earthkit-data/blob/feature/gridspec/docs/experimental/interpolate.ipynb

sandorkertesz commented 1 year ago

Added JSON schema. All gridspecs (input+output) are validated against it usingjsonschema, which is now a dependency. https://github.com/ecmwf/earthkit-data/blob/feature/gridspec/earthkit/data/data/gridspec_schema.json