
veda-data-airflow

This repo houses function code and deployment code for producing cloud-optimized data products and STAC metadata for interfaces such as https://github.com/NASA-IMPACT/delta-ui.

Project layout

Fetching Submodules

First time setting up the repo: git submodule update --init --recursive

Afterwards: git submodule update --recursive --remote

Requirements

Docker

See get-docker

Terraform

See terraform-getting-started

AWS CLI

See getting-started-install
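Before deploying, you can sanity-check that the required tooling is installed and on your PATH with the tools' standard version commands:

# Confirm the required tools are available
docker --version
terraform -version
aws --version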

Deployment

This project uses Terraform modules to deploy Apache Airflow and related AWS resources using Amazon's managed Airflow service (MWAA).

Make sure that environment variables are set

.env.example contains the environment variables needed for deployment. Copy this file and update its contents with actual values. The deploy script will source and use this file when it is provided on the command line:

# Copy .env.example to a new file
$ cp .env.example .env
# Fill in values for the environment variables

# Init terraform modules
$ bash ./scripts/deploy.sh .env <<< init

# Deploy
$ bash ./scripts/deploy.sh .env <<< deploy

Note: Be careful not to check in .env (or whatever you called your env file) when committing work.
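One simple safeguard, if .env (or your chosen filename) is not already listed in the repository's .gitignore, is to add it:

# Keep the local env file out of version control
echo ".env" >> .gitignore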

Currently, the client ID and domain of an existing Cognito user pool programmatic client must be supplied in configuration as VEDA_CLIENT_ID and VEDA_COGNITO_DOMAIN (the veda-auth project can be used to deploy a Cognito user pool and client). To dispense auth tokens via the workflows API Swagger docs, an administrator must add the ingest API lambda URL to the allowed callbacks of the Cognito client.
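For illustration, the corresponding entries in your env file would look something like this (placeholder values; check .env.example for the exact names and value formats expected):

# Cognito programmatic client settings (placeholder values)
VEDA_CLIENT_ID=replace-with-cognito-app-client-id
VEDA_COGNITO_DOMAIN=replace-with-cognito-user-pool-domain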

Gitflow Model

VEDA pipeline gitflow

License

This project is licensed under Apache 2; see the LICENSE file for more details.