ecmwf / earthkit-data

A format-agnostic Python interface for geospatial data

Apache License 2.0

Metadata interface #267

Open corentincarton opened 11 months ago

corentincarton commented 11 months ago

Is your feature request related to a problem? Please describe.

When I extract specific values from the earthkit metadata, I get a list of tuples:

>>> source.metadata("param", "units")
[('2t', 'K'), ('2t', 'K'), ('2t', 'K'), ('2t', 'K'), ('2t', 'K'), ('2t', 'K'), ('2t', 'K'), ('2t', 'K'), ('2t', 'K'), ('2t', 'K'), ('2t', 'K'), ('2t', 'K'), ('2t', 'K'), ('2t', 'K'), ('2t', 'K'), ('2t', 'K'), ('2t', 'K'), ('2t', 'K'), ('2t', 'K'), ('2t', 'K'), ('2t', 'K'), ('2t', 'K'), ('2t', 'K'), ('2t', 'K'), ('2t', 'K'), ('2t', 'K'), ('2t', 'K'), ('2t', 'K'), ('2t', 'K'), ('2t', 'K')]

Describe the solution you'd like

Wouldn't it be more natural to return a dictionary with "param" and "units" as keys and a list of values for each key?
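To make the requested shape concrete, here is a minimal sketch in plain Python that reshapes the current list-of-tuples output into a dictionary of lists. The helper name `tuples_to_dict` is hypothetical and is not part of earthkit-data; the input simply mimics the output shown above.

```python
def tuples_to_dict(keys, rows):
    """Turn per-field tuples into {key: [value, ...]}.

    Hypothetical helper for illustration only; not an earthkit-data function.
    """
    columns = {key: [] for key in keys}
    for row in rows:
        # Each row holds one value per requested key, in the same order.
        for key, value in zip(keys, row):
            columns[key].append(value)
    return columns


# Mimics the shape of source.metadata("param", "units") for three fields.
rows = [("2t", "K"), ("2t", "K"), ("2t", "K")]
result = tuples_to_dict(("param", "units"), rows)
print(result)
```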

Describe alternatives you've considered

No response

Additional context

No response

Organisation

No response

sandorkertesz commented 11 months ago

Hi @corentincarton, I am not sure it would be more natural in this case; that is highly subjective. But it could certainly be a useful alternative output format. Maybe we could trigger it with an as_dict kwarg?

source.metadata("param", "units", as_dict=True)

However, metadata() already has an astype kwarg to control the data type for the keys, so maybe as_dict is not the right name. Please see the documentation here: https://earthkit-data.readthedocs.io/en/latest/_api/data/readers/grib/codes/index.html#data.readers.grib.codes.GribField.metadata

@tlmquintino What do you think?

sandorkertesz commented 3 weeks ago

Another aspect to consider:

On a fieldlist, the dictionary metadata output could be returned in two ways:

as a list of dictionaries (one per field):

[{"param": "t", "units": "K"},
 {"param": "2t", "units": "K"},
...]

or as a dictionary containing lists (one list per key):

{"param": ["t", "2t", ...],
 "units": ["K", "K", ...]}

Of course, this is also true for the current list-based output, so we need to find an extra keyword to control this behaviour too.
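The two dictionary layouts carry the same information and can be converted into each other. The sketch below illustrates this in plain Python; both function names are hypothetical and nothing here is earthkit-data API. It assumes every field provides the same keys.

```python
def per_field_to_per_key(records):
    """[{key: value}, ...] (one dict per field) -> {key: [values]}."""
    keys = list(records[0]) if records else []
    return {key: [rec[key] for rec in records] for key in keys}


def per_key_to_per_field(columns):
    """{key: [values]} (one list per key) -> [{key: value}, ...]."""
    keys = list(columns)
    # zip(*lists) walks the lists in parallel, giving one tuple per field.
    return [dict(zip(keys, values)) for values in zip(*columns.values())]


per_field = [{"param": "t", "units": "K"}, {"param": "2t", "units": "K"}]
per_key = per_field_to_per_key(per_field)
print(per_key)
print(per_key_to_per_field(per_key))
```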

iainrussell commented 3 weeks ago

If it's any help, Metview has a similar notion in its grib_get function. It uses the kwarg grouping, which can be set to "field" or "key". See the link for an example usage.
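As a rough illustration of how such a keyword could apply to the list-based output, here is a sketch in plain Python. The function `arrange` and its signature are hypothetical; only the keyword name grouping and its two values are borrowed from the comment above, and this is neither Metview nor earthkit-data code.

```python
def arrange(rows, grouping="field"):
    """Arrange per-field metadata rows by field or by key.

    Hypothetical helper: 'rows' holds one tuple of values per field.
    """
    if grouping == "field":
        # One entry per field, each holding the values of all keys.
        return [tuple(row) for row in rows]
    if grouping == "key":
        # One entry per key, each holding the values of all fields.
        return [list(column) for column in zip(*rows)]
    raise ValueError(f"grouping must be 'field' or 'key', got {grouping!r}")


rows = [("t", "K"), ("2t", "K")]
by_field = arrange(rows, grouping="field")
by_key = arrange(rows, grouping="key")
print(by_field)
print(by_key)
```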