NASA-IMPACT / veda-data-airflow

Airflow implementation of ingest pipeline for VEDA STAC data

Support collection multiple config #189

Closed slesaad closed 4 months ago

slesaad commented 4 months ago

Why?

For some collections in GHG with data hosted in DAACs, the data files come from different prefixes. See [this issue](https://github.com/US-GHG-Center/ghgc-architecture/issues/292#issue-2396111574).

For scheduling collections like these, we should support having multiple configs in the same file while supporting the existing pattern of only having a single config in a file.

How?

When reading scheduled discoveries,

  1. allow the file to have either one config or a list of configs
  2. use the filename to name the scheduled discovery dag
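
The two steps above could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `load_discovery_configs` and the numeric suffix used to keep DAG ids unique for multi-config files are assumptions made for the example.

```python
import json
from pathlib import Path


def load_discovery_configs(path):
    """Return a list of (dag_id, config) pairs for one schedule file.

    Hypothetical helper: the name and the suffix scheme are illustrative only.
    """
    path = Path(path)
    loaded = json.loads(path.read_text())

    # Step 1: accept either a single config object or a list of configs.
    configs = loaded if isinstance(loaded, list) else [loaded]

    # Step 2: derive the DAG id from the filename (without the extension).
    stem = path.stem
    pairs = []
    for index, config in enumerate(configs):
        # Assumption: append an index so each config in a list gets its own DAG id.
        dag_id = stem if len(configs) == 1 else f"{stem}-{index}"
        pairs.append((dag_id, config))
    return pairs
```

With this shape, a file holding one config keeps producing a single DAG named after the file, so existing single-config files need no changes.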

Tested?

Deployed to ghgc-smce-dev environment via veda-deploy with the following file:

lpjeosim-wetlandch4-monthgrid-v2.json

```json
[
  {
    "collection": "lpjeosim-wetlandch4-monthgrid-v2",
    "bucket": "lp-prod-protected",
    "prefix": "LPJ_EOSIM_L2_MCH4E.001/",
    "filename_regex": ".*LPJ_EOSIM_L2_MCH4E_.*.tif$",
    "assets": {
      "ensemble-mean-ch4-wetlands-emissions": {
        "title": "(Monthly) Wetland Methane Emissions, Ensemble Mean LPJ-EOSIM Model v2",
        "description": "Methane emissions from wetlands in units of grams of methane per meter squared per month. Ensemble of multiple climate forcing data sources input to LPJ-EOSIM model.",
        "regex": ".*LPJ_EOSIM_L2_MCH4E_ensemble_mean_001.*.tif$"
      },
      "era5-ch4-wetlands-emissions": {
        "title": "(Monthly) Wetland Methane Emissions, ERA5 LPJ-EOSIM Model v2",
        "description": "Methane emissions from wetlands in units of grams of methane per meter squared per month. ECMWF Re-Analysis (ERA5) as input to LPJ-EOSIM model.",
        "regex": ".*LPJ_EOSIM_L2_MCH4E_ERA5_001.*.tif$"
      },
      "merra2-ch4-wetlands-emissions": {
        "title": "(Monthly) Wetland Methane Emissions, MERRA-2 LPJ-EOSIM Model v2",
        "description": "Methane emissions from wetlands in units of grams of methane per meter squared per month. Modern-Era Retrospective analysis for Research and Applications Version 2 (MERRA-2) data as input to LPJ-EOSIM model.",
        "regex": ".*LPJ_EOSIM_L2_MCH4E_MERRA2_001.*.tif$"
      }
    },
    "id_regex": ".*_(.*).tif$",
    "id_template": "lpjeosim-wetlandch4-monthgrid-v2-{}",
    "datetime_range": "month",
    "schedule": "18 17 * * *"
  },
  {
    "collection": "lpjeosim-wetlandch4-monthgrid-v2",
    "bucket": "lp-prod-protected",
    "prefix": "LPJ_EOSIM_L2_MCH4E_LL.001/",
    "filename_regex": ".*LPJ_EOSIM_L2_MCH4E_LL.*.tif$",
    "assets": {
      "ensemble-mean-ch4-wetlands-emissions": {
        "title": "(Monthly) Wetland Methane Emissions, Ensemble Mean LPJ-EOSIM Model v2",
        "description": "Methane emissions from wetlands in units of grams of methane per meter squared per month. Ensemble of multiple climate forcing data sources input to LPJ-EOSIM model.",
        "regex": ".*LPJ_EOSIM_L2_MCH4E_LL_ensemble_mean_001.*.tif$"
      },
      "era5-ch4-wetlands-emissions": {
        "title": "(Monthly) Wetland Methane Emissions, ERA5 LPJ-EOSIM Model v2",
        "description": "Methane emissions from wetlands in units of grams of methane per meter squared per month. ECMWF Re-Analysis (ERA5) as input to LPJ-EOSIM model.",
        "regex": ".*LPJ_EOSIM_L2_MCH4E_LL_ERA5_001.*.tif$"
      },
      "merra2-ch4-wetlands-emissions": {
        "title": "(Monthly) Wetland Methane Emissions, MERRA-2 LPJ-EOSIM Model v2",
        "description": "Methane emissions from wetlands in units of grams of methane per meter squared per month. Modern-Era Retrospective analysis for Research and Applications Version 2 (MERRA-2) data as input to LPJ-EOSIM model.",
        "regex": ".*LPJ_EOSIM_L2_MCH4E_LL_MERRA2_001.*.tif$"
      }
    },
    "id_regex": ".*_(.*).tif$",
    "id_template": "lpjeosim-wetlandch4-monthgrid-v2-{}",
    "datetime_range": "month",
    "schedule": "18 17 * * *"
  }
]
```

This created two scheduled discovery DAGs, and they worked as expected.

Existing scheduled dags with only one config also worked as expected.