dagster-io / hooli-data-eng-pipelines

Example Dagster Cloud code for the Hooli Data Engineering organization.

Hooli, Inc. Data Engineering

This repository includes Dagster code developed by the fictional data engineering team at Hooli. Another realistic Dagster code base is Dagster Labs' own data engineering repository, which is publicly available at https://github.com/dagster-io/dagster-open-platform

Getting Started

You can clone and run this example locally:

git clone https://github.com/dagster-io/hooli-data-eng-pipelines
cd hooli-data-eng-pipelines
pip install uv
make dependencies
make deps
make manifest
dagster dev

Code Structure

The team at Hooli uses multiple projects so that different teams can take ownership of their own data products. You will find these projects in separate folders, e.g. hooli_data_eng and hooli-demo-assets. The team also uses Dagster to manage a large dbt project, which is colocated in this repo in the dbt_project folder.

Each of these projects is deployed to Dagster+, giving a single pane of glass across teams, with role-based access control (RBAC) enforcing who can launch runs of which assets. The deployment is managed by the GitHub Actions workflows in .github/workflows and targets a Dagster+ Hybrid Kubernetes setup. The dagster_cloud.yaml file configures the projects.

To see this in action, check out this video.

Dev Note: to run multiple projects locally, use the workspaces.yml file (for example, dagster dev -w workspaces.yml). By default, dagster dev runs the hooli_data_eng project, which accounts for the majority of the examples in this repo.

Main Features

The majority of features are implemented in the hooli_data_eng project and include:

Specifically, the project showcases a hypothetical use case where raw data is ingested from an API, transformed through dbt, and then used by marketing and ML teams. A few assets are worth highlighting:

Dev notes on running the Sling example

Deployment Architecture

This repository uses the Dagster+ (formerly Dagster Cloud) Hybrid architecture, with GitHub Actions providing CI/CD.

Additional Dev Notes in the Repo Wiki