
Airflow Proof of Concept Environment

This repository holds all the code for the Airflow Proof of Concept environment, including a sample data pipeline and a log cleaner. It contains all the structures and auxiliary functionality needed to move data between two databases, currently working only with Oracle and Postgres.
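To give an idea of how such a transfer works, here is a minimal sketch using the Oracle and Postgres provider hooks; the connection IDs, query, and table name below are placeholders for illustration, not the ones actually used by this PoC:

from airflow.providers.oracle.hooks.oracle import OracleHook
from airflow.providers.postgres.hooks.postgres import PostgresHook

def transfer_table(source_conn_id, target_conn_id, source_sql, target_table):
    # Read rows from the Oracle source and bulk-insert them into the Postgres target.
    rows = OracleHook(oracle_conn_id=source_conn_id).get_records(source_sql)
    PostgresHook(postgres_conn_id=target_conn_id).insert_rows(table=target_table, rows=rows)

# Placeholder arguments only; the real pipeline defines its own connections and tables.
transfer_table("oracle_source_dev", "postgres_target_dev", "SELECT * FROM source_table", "target_table")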

It runs on Docker containers, using the docker-compose plugin to orchestrate three containers, all used for Airflow: the webserver, the scheduler, and the metadata database.

The metadata database runs on a persistent volume, so we won't lose data if the container is stopped.

Running the application

Once you have cloned this repository, the first step to start the Airflow environment is to set an environment variable that tells the application which environment it is running in (dev, test, or prod). The DAGs use that value to pick the correct source and target database connections for the environment: if Airflow is running in the test environment it should move data between the test databases, and in prod it should use the prod databases. There is also a variable for the default configuration folder that should not be changed:

ENV AIRFLOW_ENV=dev
ENV AIRFLOW_CONFIG_PATH=/opt/airflow
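For illustration, a minimal sketch of how a DAG could build the matching connection IDs from that variable, assuming a hypothetical naming convention of one connection per environment (the actual connection IDs are defined in this repository's configuration):

import os

from airflow.hooks.base import BaseHook

# AIRFLOW_ENV comes from the Dockerfile; fall back to dev if it is not set.
airflow_env = os.getenv("AIRFLOW_ENV", "dev")

# Hypothetical connection IDs such as oracle_source_dev / postgres_target_dev.
source_conn_id = f"oracle_source_{airflow_env}"
target_conn_id = f"postgres_target_{airflow_env}"

# Resolve the connection details registered in Airflow for this environment.
source_conn = BaseHook.get_connection(source_conn_id)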

There is also a folder mapping under the volumes section of the docker-compose file:

volumes:
  - ./config:/opt/airflow/config
  - ./dags:/opt/airflow/dags
  - ./database_metadata:/opt/airflow/database_metadata
  - ./logs:/opt/airflow/logs
  - ./plugins:/opt/airflow/plugins
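Because ./config is mounted into the container at /opt/airflow/config, the DAGs can read environment-specific settings from that folder. A small sketch, assuming a hypothetical JSON file per pipeline (the actual layout of the config folder may differ):

import json
import os

# Path inside the container where the host ./config folder is mounted.
CONFIG_DIR = "/opt/airflow/config"

def load_pipeline_config(pipeline_name):
    # Load a hypothetical <pipeline_name>.json file from the mapped config folder.
    path = os.path.join(CONFIG_DIR, f"{pipeline_name}.json")
    with open(path) as config_file:
        return json.load(config_file)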

We then need to build the image from the Dockerfile by running the airflow-init service from the docker-compose.yml file:

docker-compose up airflow-init

And when the build ends we can start the Airflow services:

docker-compose up

To use the Airflow web interface, go to http://localhost:8080/ in your browser and use "airflow" as both the username and the password. You should see two paused DAGs without any previous executions.

Inside Airflow container

Once our environment is up and running, we can get inside the containers if we need to run any Airflow commands. To log in to the scheduler container, type the following command, replacing "airflow_airflow-scheduler_1" with your container name (you can list the running containers with docker ps):

docker exec -it airflow_airflow-scheduler_1 bash

Once inside the Airflow container, we can run commands to interact with the DAGs. For example, we can re-initialize our metadata database in order to capture changes in our DAGs:

airflow db init

We could list the DAGs:

airflow dags list

And we could even delete a DAG and all of its metadata:

airflow dags delete YOUR_DAG_NAME

Some considerations