NASA-IMPACT / veda-data


Add GEDI L4B dataset to the API #74

Closed abarciauskas-bgse closed 11 months ago

abarciauskas-bgse commented 2 years ago

For each dataset, we will follow these steps:

Identify the dataset and what the processing needs are

  1. Identify dataset and where it will be accessed from.

We should use the CMR discovery task and can discover files to publish from https://cmr.earthdata.nasa.gov/search/concepts/C2244602422-ORNL_CLOUD.html. However, we should not copy the files to our S3 bucket; instead, we should add assets that link to the ORNL DAAC bucket, for example: s3://ornl-cumulus-prod-protected/gedi/GEDI_L4B_Gridded_Biomass/data/GEDI04_B_MW019MW138_02_002_05_R01000M_MU.tif
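The link-in-place approach can be sketched as a STAC item whose data asset `href` points directly at the ORNL DAAC object rather than a copy in a VEDA-managed bucket. This is a minimal illustration with plain dicts, not the actual publish tooling; the collection id, bbox, and datetime below are placeholder assumptions:

```python
import json

# Hypothetical STAC item: the asset links to the ORNL DAAC bucket in place,
# so no file is copied into our own S3 bucket.
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "GEDI04_B_MW019MW138_02_002_05_R01000M_MU",
    "collection": "gedi-l4b-gridded-biomass",  # assumed collection id
    "geometry": None,
    "properties": {"datetime": "2023-01-01T00:00:00Z"},  # placeholder value
    "assets": {
        "data": {
            "href": (
                "s3://ornl-cumulus-prod-protected/gedi/"
                "GEDI_L4B_Gridded_Biomass/data/"
                "GEDI04_B_MW019MW138_02_002_05_R01000M_MU.tif"
            ),
            "type": "image/tiff; application=geotiff; profile=cloud-optimized",
            "roles": ["data"],
        }
    },
    "links": [],
}

print(json.dumps(item, indent=2))
```

The only load-bearing detail here is the asset `href`; everything else would come from the CMR discovery output and the collection design below.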

Design the metadata and publish to the Dev API

  1. Review conventions for generating STAC collection and item metadata:

    • Collections: https://github.com/NASA-IMPACT/delta-backend/issues/29 and STAC version 1.0 specification for collections
    • Items: https://github.com/NASA-IMPACT/delta-backend/issues/28 and STAC version 1.0 specification for items
    • NOTE: The delta-backend instructions are specific to datasets for the climate dashboard. Not all datasets will be part of the dashboard's visual layers, so you can likely ignore the instructions specific to the "dashboard" extension, the "item_assets" field in the collection, and the "cog_default" asset type in the item
  2. After reviewing the STAC documentation for collections and items, and the existing scripts for generating collection metadata (generally with SQL) and item metadata, generate or reuse scripts for your collection and a few items, and publish them to the testing API. Documentation and examples for generating a pipeline or otherwise documenting your dataset workflow are in https://github.com/NASA-IMPACT/cloud-optimized-data-pipelines. We would like to maintain the scripts folks use to publish datasets in that repo so we can easily re-run a dataset's ingest-and-publish workflow if necessary.

  3. If necessary, request access and credentials to the dev database, then ingest and publish to the Dev API. Submit a PR with the manual or CDK scripts used to run the workflow, and include links to the published datasets in the Dev API
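Before ingesting to the Dev API, a lightweight sanity check on the generated item metadata can catch missing required fields early. This is a sketch: the field names follow the STAC 1.0 core item spec, but the function itself is illustrative and not part of any VEDA tooling.

```python
def check_stac_item(item):
    """Return a list of required STAC 1.0 item fields missing from `item`."""
    required = ["type", "stac_version", "id", "geometry",
                "properties", "assets", "links"]
    missing = [field for field in required if field not in item]
    # A non-null geometry also requires a bbox per the STAC core spec.
    if item.get("geometry") is not None and "bbox" not in item:
        missing.append("bbox")
    # properties.datetime is required (it may be null only when both
    # start_datetime and end_datetime are present).
    props = item.get("properties") or {}
    if "datetime" not in props and not {"start_datetime", "end_datetime"} <= set(props):
        missing.append("properties.datetime")
    return missing
```

For example, `check_stac_item({"type": "Feature"})` reports every remaining required field, while a fully populated item returns an empty list.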

Publish to the Staging API

Once the PR is approved, we can merge it and publish those datasets to the Staging API

j08lue commented 11 months ago

Stale