NASA-IMPACT / dashboard-api-starter

API for the Earthdata Dashboard
MIT License

Datasets returned from list datasets should _not_ be managed by dashboard-api #2

Open abarciauskas-bgse opened 3 years ago

abarciauskas-bgse commented 3 years ago

Problem: If we want the covid-api to be decoupled from its applications, such as the covid-dashboard, we should offload the management of static datasets to somewhere outside the API.

Datasets right now could potentially be sourced from:

The main reason to use GitHub is that it can manage versioning of datasets, which matters because dataset changes could affect the functionality of the dashboard and of other APIs and services. Versioning is also useful when you need to test a change to a dataset before deploying it to "production".

The challenges with GitHub are that changes to datasets require redeploying the API, and that the API has to be forked if datasets are managed in this repo.

At this time, it doesn't seem possible to offload everything to a metadata API because the datasets endpoint does more than just return a list of datasets. It has a specific schema both for the way datasets are listed (e.g. using "_all", "global" and specific site keys) and for the datasets themselves (including information about how to visualize or provide a time series).
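To make the listing schema concrete, here is a rough sketch of one plausible reading of it: a mapping from "_all", "global", or a site key to the datasets available there. Only the "_all" and "global" keys come from the description above; the site key `tk`, the dataset ids, and the `list_datasets` helper are made up for illustration and are not taken from the actual API.

```python
from typing import Dict, List, Optional

# Hypothetical listing. Only the "_all" and "global" keys come from the issue;
# the site key "tk" and the dataset ids are invented for illustration.
DATASETS_BY_KEY: Dict[str, List[dict]] = {
    "_all": [{"id": "no2"}, {"id": "nightlights"}],
    "global": [{"id": "no2"}],
    "tk": [{"id": "no2"}, {"id": "nightlights"}],
}


def list_datasets(listing: Dict[str, List[dict]], site: Optional[str] = None) -> List[dict]:
    """Return the datasets for one site key, or everything under "_all"."""
    key = "_all" if site is None else site
    if key not in listing:
        raise KeyError(f"unknown site key: {key}")
    return listing[key]
```

A plain metadata API that only returns a flat list of datasets would not carry this per-key grouping, which is part of why the endpoint is hard to offload wholesale.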

One proposed solution is to use a separate GitHub repo for the static datasets used in any specific instance of the API.

The workflow would be as follows:

  1. Create a GitHub repo to version your static datasets.
  2. Create an S3 location to house your static datasets.
  3. Configure the covid API to read datasets from this S3 location (use caching so every call to /datasets doesn't require a call to S3).
  4. Whenever a dataset change is merged to the "main" production branch of your datasets repo, new dataset versions are pushed to S3.
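
Step 3 could look roughly like the sketch below: a small time-based cache in front of whatever reads the datasets document, so that most calls to /datasets are served from memory. The class name `CachedDatasets`, the helper `make_s3_fetcher`, the 5-minute TTL, and the assumption that the datasets live in a single JSON object are all illustrative choices, not part of the existing API. The S3 read uses boto3's `get_object`, imported lazily so the cache itself has no AWS dependency.

```python
import json
import time
from typing import Any, Callable


class CachedDatasets:
    """Hold the parsed datasets document in memory for ttl_seconds."""

    def __init__(
        self,
        fetch: Callable[[], bytes],
        ttl_seconds: float = 300.0,  # assumed default, tune per deployment
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Any = None
        self._loaded = False
        self._expires_at = 0.0

    def get(self) -> Any:
        """Return the cached document, refetching only after the TTL expires."""
        now = self._clock()
        if not self._loaded or now >= self._expires_at:
            self._value = json.loads(self._fetch())
            self._loaded = True
            self._expires_at = now + self._ttl
        return self._value


def make_s3_fetcher(bucket: str, key: str) -> Callable[[], bytes]:
    """Build a fetcher that reads one object from S3 (requires boto3)."""
    import boto3  # lazy import: only needed when S3 is actually used

    client = boto3.client("s3")

    def fetch() -> bytes:
        return client.get_object(Bucket=bucket, Key=key)["Body"].read()

    return fetch
```

In the API this might be wired up once at startup, for example `datasets = CachedDatasets(make_s3_fetcher("my-datasets-bucket", "datasets.json"))` with a hypothetical bucket and key, and the /datasets handler would call `datasets.get()`. The trade-off of a TTL is that a newly merged dataset version can take up to one TTL to show up.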

@drewbo @olafveerman @leothomas WDYT ⬆️

leothomas commented 3 years ago

At first glance it seems like a great idea. It would also allow us to remove the lambda that calculates the date domain for each dataset from the covid-api itself, which would resolve some headaches with deploying dev versions of a dataset for validation purposes, before making the dataset fully available.