Deltares / hydromt

HydroMT: Automated and reproducible model building and analysis
https://deltares.github.io/hydromt/
MIT License

Include data catalogs in other data catalogs #477

Open dalmijn opened 1 year ago

dalmijn commented 1 year ago

Kind of request

Adding new functionality

Enhancement Description

The ability to include data catalogs (YAML files), e.g. via an include statement, in another data catalog (e.g. final.yml). The desired result is that the datasets from all included data catalogs are available in HydroMT.

For example, take a data catalog (data_catalog1.yml):

my_data:
  - meta: meta

Then include it in another data catalog (e.g. final.yml):

include:
  - data_catalog1.yml
  - data_catalog2.yml

era5_daily:
  - stuff: stuff
  - more stuff: more stuff

And then just read this catalog via DataCatalog('./path_to/final.yml').

This would mean, though, that include could no longer be used as a dataset name. How exactly to implement this is of course up for debate, but I think it would be nice to have.
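To make the proposed semantics concrete, here is a minimal Python sketch of how an include key could be resolved. It is not HydroMT code: resolve_includes is a hypothetical helper, catalogs are plain dicts (with PyYAML, load could be something like lambda p: yaml.safe_load(open(p))), include paths are assumed to be used as given rather than resolved relative to the including file, and the rule that the including catalog overrides its includes is an assumption.

```python
from typing import Any, Callable, Dict, Sequence

Catalog = Dict[str, Any]


def resolve_includes(
    name: str,
    load: Callable[[str], Catalog],
    _stack: Sequence[str] = (),
) -> Catalog:
    """Hypothetical helper (not part of HydroMT): return the datasets of
    catalog `name` merged with those of every catalog it includes.

    Included catalogs are merged first, in the listed order, so datasets
    defined in the including catalog win on name clashes (an assumption).
    """
    if name in _stack:
        raise ValueError("circular include: " + " -> ".join([*_stack, name]))
    raw = dict(load(name))  # shallow copy so the loaded dict is not mutated
    includes = raw.pop("include", [])  # reserved key, not a dataset
    merged: Catalog = {}
    for child in includes:
        merged.update(resolve_includes(child, load, [*_stack, name]))
    merged.update(raw)  # datasets defined here override included ones
    return merged


# In-memory stand-ins for the parsed YAML files from the example above.
catalogs = {
    "data_catalog1.yml": {"my_data": {"meta": "meta"}},
    "data_catalog2.yml": {"other_data": {"meta": "meta"}},
    "final.yml": {
        "include": ["data_catalog1.yml", "data_catalog2.yml"],
        "era5_daily": {"stuff": "stuff"},
    },
}
merged = resolve_includes("final.yml", catalogs.__getitem__)
print(sorted(merged))  # ['era5_daily', 'my_data', 'other_data']
```

Tracking the include chain in _stack is one simple way to turn an accidental include cycle into a clear error instead of infinite recursion.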

Use case

Cases where data catalog YAML files become very large, or where many separate data catalog YAML files would otherwise have to be listed in data_libs.

Additional Context

No response

hboisgon commented 11 months ago

I'm not sure this functionality is really needed. In the hydromt build/update configuration file you can already list the data catalogs you want to use under global, instead of passing them on the command line:

global:
  data_libs:
    - data_catalog1.yml
    - data_catalog2.yml
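One way to see how the two approaches relate: an include tree can be flattened into an ordered list of files with the same shape as the data_libs list above. The sketch below assumes catalogs are already parsed into dicts. flatten_includes is a hypothetical helper, not a HydroMT function, and the ordering (included files first, each file once) is an assumption chosen so that a last-one-wins merge gives the including file precedence.

```python
from typing import Any, Callable, Dict, List, Sequence


def flatten_includes(
    name: str,
    load: Callable[[str], Dict[str, Any]],
    _stack: Sequence[str] = (),
) -> List[str]:
    """Hypothetical helper: expand the include tree of catalog `name` into a
    flat, ordered list of catalog files, i.e. the shape of a data_libs list.

    Included files come before the file that includes them, and each file
    is kept only at its first position in the list.
    """
    if name in _stack:
        raise ValueError("circular include: " + " -> ".join([*_stack, name]))
    order: List[str] = []
    for child in load(name).get("include", []):
        for path in flatten_includes(child, load, [*_stack, name]):
            if path not in order:
                order.append(path)
    order.append(name)  # the including file comes last
    return order


# In-memory stand-ins for parsed YAML files (hypothetical contents).
catalogs = {
    "data_catalog1.yml": {"my_data": {}},
    "data_catalog2.yml": {"other_data": {}},
    "final.yml": {
        "include": ["data_catalog1.yml", "data_catalog2.yml"],
        "era5_daily": {},
    },
}
print(flatten_includes("final.yml", catalogs.__getitem__))
# ['data_catalog1.yml', 'data_catalog2.yml', 'final.yml']
```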
DirkEilander commented 10 months ago

I agree this has low priority. However, it should also be really straightforward to implement, and I can imagine it helps to organize your data catalogs (if you have many). I think we should include it, but given the priority I don't think it will happen this year. I've now added it to Q1, but we will discuss during the Q1 planning whether that is actually feasible.

savente93 commented 10 months ago

Noting this for the discussion when it becomes relevant: this will need a way to deal with conflicting information, especially if aliases are kept. I.e., if catalog A says X -> Y and catalog B says X -> Z, what should the correct result be?

DirkEilander commented 10 months ago

> noting this for the discussion when it becomes relevant: this will need a way to deal with conflicting information, especially if aliases will still be there. i.e. if Cat A says X -> Y and Cat B says X-> Z what should the correct result be?

I think we already handle this with a warning: a source in the last catalog overwrites the same-named source from earlier catalogs. This can already occur with the functionality Hélène showed above.
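The last-one-wins rule described here can be illustrated with plain dicts. This is a sketch, not HydroMT's actual implementation: merge_catalogs is a hypothetical helper, and the warning text and the choice to warn only when the two entries differ are assumptions. It resolves the X -> Y versus X -> Z alias example from the earlier comment.

```python
import warnings
from typing import Any, Dict, Iterable


def merge_catalogs(catalogs: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: merge parsed catalogs in order.

    A source in a later catalog replaces a same-named source from an earlier
    one; a UserWarning is emitted when the two entries differ.
    """
    merged: Dict[str, Any] = {}
    for cat in catalogs:
        for name, source in cat.items():
            if name in merged and merged[name] != source:
                warnings.warn(f"overwriting data source '{name}'", UserWarning)
            merged[name] = source  # last catalog wins
    return merged


# Catalog A aliases X to Y, catalog B aliases X to Z (hypothetical entries).
cat_a = {"X": {"alias": "Y"}}
cat_b = {"X": {"alias": "Z"}}
print(merge_catalogs([cat_a, cat_b]))  # {'X': {'alias': 'Z'}}, plus a warning
```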