This repository contains the code for the webinar demo shown in: Best practices for writing ETL and ELT pipelines.
Watch the webinar here for free!
This repository is configured to spin up 6 Docker containers when you run `astro dev start` (see Install the Astro CLI). These include the core Airflow components as well as the local Postgres database and MinIO instance that the demo DAGs connect to.
To connect Airflow to both the Postgres database and MinIO, create a `.env` file in the root directory of the project with the exact contents of the `.env.example` file. Note that you need to restart the Airflow instance with `astro dev restart` after creating the `.env` file for the changes to take effect.

All the DAGs run without any further setup or tools needed!
This repository contains:

- `dag-factory_dags`: A folder containing the code necessary to generate 3 DAGs with the `dag-factory` package (a minimal loader sketch follows this list).
  - `config_file.yml`: Config file creating the 3 DAGs.
  - `generate_dags.py`: The code to generate DAGs from the config file.
- `helper`: This folder contains two DAGs meant to help you explore and develop (a sketch of the first one follows this list).
  - `query_tables`: A DAG that queries the tables in the Postgres database to return the number of records for each table.
  - `drop_tables_postgres`: A DAG that drops all the tables in the Postgres database.
- `modularized_task_groups`: This folder contains a DAG with a modularized task group, stored in `include/custom_task_group/etl_task_group.py` (a simplified sketch follows this list).
- `pattern_dags`: Contains several DAGs showing different ETL and ELT patterns. They all use the Open Meteo API as a source system and load data to Postgres. All supporting SQL code is stored in the `include` folder.
- `include/dag_factory`: Contains the SQL code for the 3 DAG factory tasks.
- `include/sql`: Contains the SQL code for all other tasks. The SQL code is repetitive for demo purposes, meaning you can manipulate the code for just one DAG to explore the DAGs without affecting other DAGs. In a real-world scenario you would likely modularize the SQL code further and avoid repetition.
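For orientation, here is a minimal sketch of what a `dag-factory` loader file such as `generate_dags.py` typically looks like. The config path is an assumption; check the actual file in `dag-factory_dags` for the exact code.

```python
# Minimal dag-factory loader sketch; the real generate_dags.py may differ.
import os

import dagfactory

# Assumed: the YAML config sits next to this file.
config_file = os.path.join(os.path.dirname(__file__), "config_file.yml")

dag_factory = dagfactory.DagFactory(config_file)

# Clean up previously generated DAGs, then register the DAGs defined in the
# YAML config in this module's globals so the Airflow scheduler can find them.
dag_factory.clean_dags(globals())
dag_factory.generate_dags(globals())
```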
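Similarly, a record-counting helper in the spirit of `query_tables` could be sketched as below. The connection id `postgres_default`, the DAG id, and the schema filter are illustrative assumptions; this repository wires up its actual connection via the `.env` file.

```python
# Hypothetical sketch of a record-counting helper DAG similar to query_tables.
import pendulum
from airflow.decorators import dag, task
from airflow.providers.postgres.hooks.postgres import PostgresHook


@dag(start_date=pendulum.datetime(2024, 1, 1), schedule=None, catchup=False)
def query_tables_sketch():
    @task
    def count_records():
        # "postgres_default" is an assumed connection id.
        hook = PostgresHook(postgres_conn_id="postgres_default")
        tables = hook.get_records(
            "SELECT tablename FROM pg_tables WHERE schemaname = 'public';"
        )
        for (table,) in tables:
            count = hook.get_first(f'SELECT COUNT(*) FROM "{table}";')[0]
            print(f"{table}: {count} records")

    count_records()


query_tables_sketch()
```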
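Finally, here is a hypothetical, simplified version of a reusable ETL task group in the spirit of `include/custom_task_group/etl_task_group.py`, using the Open Meteo API as the source. The function names, coordinates, and print-based load step are assumptions for illustration, not the repository's actual code.

```python
# Hypothetical sketch of a reusable ETL task group; names and logic are assumptions.
import requests
from airflow.decorators import task, task_group


@task_group
def etl_task_group():
    """Extract current weather from the Open Meteo API, transform it, load it."""

    @task
    def extract() -> dict:
        # Example coordinates; the demo DAGs may query different locations.
        response = requests.get(
            "https://api.open-meteo.com/v1/forecast",
            params={"latitude": 52.52, "longitude": 13.41, "current_weather": "true"},
            timeout=30,
        )
        response.raise_for_status()
        return response.json()["current_weather"]

    @task
    def transform(weather: dict) -> dict:
        return {"temperature_c": weather["temperature"], "windspeed": weather["windspeed"]}

    @task
    def load(record: dict) -> None:
        # The real DAGs load to Postgres; printing keeps the sketch self-contained.
        print(f"Would load: {record}")

    load(transform(extract()))
```

A DAG can then import the group and simply call `etl_task_group()` inside its `@dag`-decorated function.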
To run this project:

1. Copy the `.env.example` file to a new file called `.env`. If you want to use a custom XCom backend with MinIO, uncomment the last 4 lines in the `.env` file.
2. Run `astro dev start` to start the Airflow instance. The webserver with the Airflow UI will be available at `localhost:8080`. Log in with the credentials `admin`:`admin`.
3. Run the `query_tables` DAG to check the number of records in the tables.

If you'd like to directly interact with the Postgres database, you can use the following commands to connect to the database:
```bash
docker ps
```

This command will list all the running containers. Look for the container with the image `postgres:15.4-alpine`. Copy the container ID (in the format `30cfd7660be9`) and run the following command:

```bash
docker exec -it <container_id> psql -U postgres
```

You are now in a `psql` session connected to the Postgres database. You can list the tables with the command `\dt` and query the tables with `SELECT * FROM <table_name>;`.