Open DirkEilander opened 1 month ago
Suggested yaml:
my_nice_source:
  data_type: DataFrame # not editable by variants
  ...
  variant_keys:
    - metadata.provider
    - metadata.crs
    - driver.filesystem
  variants:
    - uri: s3://bucket/key1/key2.json # required for all variants.
      metadata:
        crs: 4326
        provider: organisation1
      driver:
        filesystem: s3
    - uri: /mnt/p/cooldata.json
      metadata:
        crs: 90002
        provider: organization2
      driver:
        filesystem: local
      default_variant: True
Here, variant_keys are the keys that uniquely define a variant; each of them should be present in every variant definition. Dots in variant_keys denote nested fields. Other fields, such as uri, can overwrite the source definition. If no variant is requested, the default variant is used, which is flagged by the default_variant key. All variants should be of the same data type, hence data_type cannot be overwritten, but all other fields can.
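To make the rules above concrete, here is a minimal sketch of how a parsed source entry could be checked. The helpers get_nested and validate_variants are hypothetical names and not part of the current code base; rejecting more than one default variant is my assumption, not something we decided.

```python
from typing import Any


def get_nested(mapping: dict, dotted_key: str) -> Any:
    """Look up a dotted key such as 'metadata.crs' in nested dicts."""
    value: Any = mapping
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            raise KeyError(dotted_key)
        value = value[part]
    return value


def validate_variants(source: dict) -> None:
    """Check one source entry against the rules proposed in this issue."""
    keys = source["variant_keys"]
    seen = set()
    n_default = 0
    for variant in source["variants"]:
        if "data_type" in variant:
            raise ValueError("data_type cannot be overwritten by a variant")
        if "uri" not in variant:
            raise ValueError("every variant needs a uri")
        # Raises KeyError if one of the variant keys is missing in this variant.
        ident = tuple(get_nested(variant, key) for key in keys)
        if ident in seen:
            raise ValueError(f"variant keys do not uniquely define a variant: {ident}")
        seen.add(ident)
        n_default += bool(variant.get("default_variant", False))
    if n_default > 1:
        # Assumption: at most one variant may be flagged as default.
        raise ValueError("more than one default variant")


# Usage with the example source from this issue, as it would look after yaml parsing.
source = {
    "data_type": "DataFrame",
    "variant_keys": ["metadata.provider", "metadata.crs", "driver.filesystem"],
    "variants": [
        {
            "uri": "s3://bucket/key1/key2.json",
            "metadata": {"crs": 4326, "provider": "organisation1"},
            "driver": {"filesystem": "s3"},
        },
        {
            "uri": "/mnt/p/cooldata.json",
            "metadata": {"crs": 90002, "provider": "organization2"},
            "driver": {"filesystem": "local"},
            "default_variant": True,
        },
    ],
}
validate_variants(source)  # passes silently for a valid entry
```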
Also discussed: DataCatalog._sources should become a dictionary of lists holding all variants (instead of the current nested dict), from which we find the requested variant by filtering. To request a specific variant, a dictionary with the source name plus the variant keys and their associated values is passed to the data_like argument of DataCatalog.get_rasterdataset (and similar methods), see below. If no unique variant is found, an error is raised.
da = data_catalog.get_rasterdataset(
    data_like={"source": "my_nice_source", "metadata.crs": 4326},
    ...
)
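The filtering step could look roughly like the sketch below. It assumes the sources are stored as a plain dict mapping a source name to a list of variant dicts; find_variant and _get are hypothetical helpers, and falling back to the default variant when only the source name is given is my reading of the proposal above.

```python
_MISSING = object()  # sentinel for "key not present in this variant"


def _get(mapping, dotted_key):
    """Return the value at a dotted key, or _MISSING if any part is absent."""
    value = mapping
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            return _MISSING
        value = value[part]
    return value


def find_variant(sources: dict, data_like: dict) -> dict:
    """Return the single variant of a source that matches the request."""
    query = dict(data_like)  # copy, so the caller's dict is not modified
    name = query.pop("source")
    variants = sources[name]
    if query:
        # Keep the variants for which every requested key has the requested value.
        matches = [
            v for v in variants
            if all(_get(v, key) == value for key, value in query.items())
        ]
    else:
        # No variant keys requested: fall back to the flagged default variant.
        matches = [v for v in variants if v.get("default_variant", False)]
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one variant of {name!r} for {query}, found {len(matches)}"
        )
    return matches[0]


# Usage with a dictionary of lists, as proposed for DataCatalog._sources.
sources = {
    "my_nice_source": [
        {
            "uri": "s3://bucket/key1/key2.json",
            "metadata": {"crs": 4326, "provider": "organisation1"},
            "driver": {"filesystem": "s3"},
        },
        {
            "uri": "/mnt/p/cooldata.json",
            "metadata": {"crs": 90002, "provider": "organization2"},
            "driver": {"filesystem": "local"},
            "default_variant": True,
        },
    ]
}
requested = find_variant(sources, {"source": "my_nice_source", "metadata.crs": 4326})
default = find_variant(sources, {"source": "my_nice_source"})
```

A request does not have to list all variant_keys here; any subset works as long as it narrows the list down to exactly one variant.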
In addition to the yaml format above, which specifies variant_keys that are already existing keys of the data source, it should also be possible to define new keys. These can already be added to metadata in the current setup, but we could also create a specific variant field in DataSource. I suggest that keys specified in the variant field don't need a section prefix, to keep data requests like the one above short.
my_nice_source:
  variant_keys:
    - name
  variants:
    - uri: s3://bucket/key1/key2.json
      variant:
        name: key2
    - uri: /mnt/p/cooldata.json
      variant:
        name: cooldata
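A possible way to resolve keys without a section prefix is sketched below: look in the variant field first and only then treat the key as a dotted path into the rest of the definition. The function variant_value is a hypothetical helper, and the lookup order is an assumption that we would still need to agree on.

```python
def variant_value(variant_def: dict, key: str):
    """Resolve a variant key, checking the 'variant' field before dotted paths."""
    custom = variant_def.get("variant", {})
    if key in custom:
        # Keys from the 'variant' field are addressed without a section prefix.
        return custom[key]
    value = variant_def
    for part in key.split("."):
        if not isinstance(value, dict) or part not in value:
            raise KeyError(key)
        value = value[part]
    return value


# Usage with the two variants from the yaml above, after parsing.
variants = [
    {"uri": "s3://bucket/key1/key2.json", "variant": {"name": "key2"}},
    {"uri": "/mnt/p/cooldata.json", "variant": {"name": "cooldata"}},
]
selected = [v for v in variants if variant_value(v, "name") == "cooldata"]
```

With this lookup order, a request such as {"source": "my_nice_source", "name": "cooldata"} stays short, while prefixed keys like metadata.crs keep working.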
@hboisgon We would also like to get your feedback on this issue. With this new variant concept I think we have a single but flexible way to define multiple variants of the same source (before we had variant, alias and placeholder). For the cmip6 model archive it would require a longer catalog yaml file, but with more flexibility to accommodate small differences in format between files.
Kind of request
Changing existing functionality
Enhancement Description
Use case
We should discuss whether we want to merge these concepts to have a simpler interface for users.
Additional Context
No response