This repository contains hands-on lab modules that cover provisioning the foundational infrastructure and security on GCP for Cloud Composer based data analytics projects. It does not cover authoring complex Apache Airflow DAGs or Airflow functionality in depth, because the focus is on creating a stable and secure environment for authoring pipelines.
The security features covered include-
The setup is verified with -
The security setup is by no means fully comprehensive (for example, it is not air-gapped), but it serves as a step-by-step quickstart guide.
The goal is to demystify securing Cloud Composer based pipelines on GCP and to simplify the journey of the Data Analytics Architect/Engineer persona by explaining the intricacies of a foundational secure setup, removing blockers so they become productive sooner in their core competency (analytics). For the GCP Customer Engineer, the hands-on labs cover provisioning in Argolis.
The repository contains modules that are deliberately detailed, with sequential steps (rather than being fully scripted and automated), to provide an understanding of what is involved. The hands-on lab modules will be complemented with Terraform scripts for automation.
This module is available in a separate repo. The setup is recommended for kicking the tires and for simple demos, but it is not a secure setup. Run through this module if you are new to Cloud Composer 2 and its provisioning, or new to authoring and deploying DAGs and triggering DAG execution in an event-driven fashion.
If you need to start with a secure Cloud Composer 2 setup, jump to section 2 below. The DAGs in section 2 are the same ones used across the public and private Cloud Composer provisioning modules.
The DAGs are deliberately, embarrassingly basic to keep the focus on environment provisioning->securing->testing.
# | Sub-module |
---|---|
1 | Git repo cloning |
2 | Hello World DAG |
3 | GCS Event Driven Orchestration of the Hello World DAG |
4 | Pub/Sub Event Driven Orchestration of the Hello World DAG |
5 | Minimum viable (ETL) DAG with GCS, Cloud Dataflow and BigQuery |
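
To make sub-modules 3 and 4 more concrete, the following is a minimal sketch of how an event-driven trigger could assemble a DAG run request for the Airflow 2 stable REST API (`POST /api/v1/dags/{dag_id}/dagRuns`). The function name, the example web server URL and the DAG ID are assumptions for illustration and are not taken from the lab modules. The sketch only builds the URL and JSON body; a real Cloud Function would also need to send the request with Google-issued credentials, which is omitted here.

```python
import json
from urllib.parse import quote


def build_dag_run_request(web_server_url, dag_id, gcs_event):
    """Return (url, body) for triggering a DAG run from a GCS event.

    web_server_url and dag_id are placeholders supplied by the caller.
    gcs_event is the event dict a GCS-triggered function receives; only
    its 'bucket' and 'name' fields are used.
    """
    base = web_server_url.rstrip("/")
    url = f"{base}/api/v1/dags/{quote(dag_id, safe='')}/dagRuns"
    # Pass the object location to the DAG through the run's conf payload.
    conf = {"bucket": gcs_event["bucket"], "name": gcs_event["name"]}
    body = json.dumps({"conf": conf})
    return url, body


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    url, body = build_dag_run_request(
        "https://example-airflow.invalid",
        "hello_world_dag",
        {"bucket": "example-bucket", "name": "input/file.txt"},
    )
    print(url)
    print(body)
```

Keeping request construction separate from authentication and transport makes the trigger logic easy to test locally before deploying it.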
# | Sub-module |
---|---|
1 | Create a (service) project for Data Analytics |
2 | Enable requisite Google APIs |
3 | Update organizational policies |
4 | Create a User Managed Service Account for Data Analytics |
5 | Grant general IAM permissions |
6 | Cloud Composer specific IAM permissions |
7 | Cloud Functions specific IAM permissions |
8 | Cloud Dataflow specific IAM permissions |
9 | Cloud Storage specific IAM permissions |
10 | BigQuery specific IAM permissions |
11 | Permissions specific to Cloud Composer 2 infrastructure |
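
As a rough illustration of the IAM sub-modules above, the sketch below generates `gcloud projects add-iam-policy-binding` commands for a user-managed service account. The project ID, the service account name and the role list are assumptions chosen for illustration; the lab modules define the actual roles to grant. The script only prints the commands and does not execute them.

```python
import shlex

# Assumed example roles; the lab modules specify the real set.
ASSUMED_ROLES = [
    "roles/composer.worker",
    "roles/dataflow.worker",
    "roles/bigquery.dataEditor",
    "roles/storage.objectAdmin",
]


def iam_binding_commands(project_id, sa_name, roles):
    """Build one gcloud command (as an argument list) per role."""
    member = f"serviceAccount:{sa_name}@{project_id}.iam.gserviceaccount.com"
    return [
        [
            "gcloud", "projects", "add-iam-policy-binding", project_id,
            f"--member={member}",
            f"--role={role}",
        ]
        for role in roles
    ]


if __name__ == "__main__":
    # Hypothetical project and service account names.
    for cmd in iam_binding_commands("example-da-project", "example-da-sa", ASSUMED_ROLES):
        print(shlex.join(cmd))
```

Printing the commands first gives you a chance to review each binding before running it.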
This module covers the below security features/components/layers.
Jump to lab module
# | Sub-module |
---|---|
1 | Create a (host) project for the shared VPC |
2 | Enable requisite Google APIs |
3 | Apply organizational policies in the host project |
4 | Grant operator/admin permissions in the host project |
5 | Enable shared VPC in the host project |
6 | Associate the service (data analytics) project with the "Shared VPC" project |
7 | Create a VPC in the host project |
8 | Create subnets for secure Cloud Composer 2 & for a shared VPC Access Connector |
9 | Create firewall rules |
10 | Configure DNS for *.pkg.dev |
11 | Grant IAM permissions in host project for service project's service accounts |
12 | Create the Serverless VPC Access Connector |
13 | Configure networking to allow downloads of external packages |
14 | Deploy and test a "Hello World" DAG |
15 | Deploy and test a Cloud Function to call the "Hello World" DAG when triggered by a Cloud Storage event |
16 | Deploy and test a Cloud Function to call the "Hello World" DAG when triggered by a Cloud Pub/Sub event |
17 | Deploy and test a minimum viable ETL DAG (GCS->Cloud Dataflow for ETL->BigQuery) |
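
Before creating the subnets in sub-module 8, it can help to sanity-check the planned CIDR ranges. The sketch below assumes two hypothetical ranges and checks two things: that the Serverless VPC Access connector subnet is a /28 (the size the connector requires) and that it does not overlap the Composer subnet. The function name and the example ranges are illustrative and are not taken from the lab modules.

```python
import ipaddress


def check_subnets(composer_cidr, connector_cidr):
    """Return a list of problems found with the planned subnet ranges."""
    composer = ipaddress.ip_network(composer_cidr)
    connector = ipaddress.ip_network(connector_cidr)
    problems = []
    # A Serverless VPC Access connector subnet must be a /28.
    if connector.prefixlen != 28:
        problems.append(
            f"connector subnet {connector} is /{connector.prefixlen}, expected /28"
        )
    # Subnets in the same VPC must not overlap.
    if composer.overlaps(connector):
        problems.append(f"{composer} overlaps {connector}")
    return problems


if __name__ == "__main__":
    # Hypothetical ranges, for illustration only.
    issues = check_subnets("10.0.0.0/24", "10.0.1.0/28")
    print(issues if issues else "subnet plan looks consistent")
```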
This module adds incremental security with VPC Service Controls to the setup from module 2.
Jump to lab module
# | Sub-module |
---|---|
1 | Enable requisite Google APIs |
2 | Grant IAM permissions to operate with Access Context Manager in host project |
3 | Create Access Context Manager policy in host project |
4 | Create access levels in the host project for the VPC perimeter |
5 | Create DNS entries in the host project for googleapis.com, gcr.io and composer.cloud.google.com |
6 | Configure incremental firewall rules in the host project |
7 | Create the VPC perimeter |
8 | Retest Cloud Composer DAGs |
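
For the DNS entries in sub-module 5, a common approach with VPC Service Controls is to resolve Google API domains to the restricted VIP range `199.36.153.4/30` (`restricted.googleapis.com`). Whether the lab uses exactly this range is an assumption here, as are the record layout and the function names. The sketch below only expands the range into the addresses an A record would carry.

```python
import ipaddress

# Documented range for restricted.googleapis.com; assumed to be the
# target used for the private DNS zones in this setup.
RESTRICTED_VIP_RANGE = "199.36.153.4/30"


def restricted_vip_addresses():
    """List every address in the restricted VIP range."""
    return [str(ip) for ip in ipaddress.ip_network(RESTRICTED_VIP_RANGE)]


def a_record(dns_name):
    """Describe an A record for dns_name pointing at the restricted VIPs."""
    return {"name": dns_name, "type": "A", "rrdatas": restricted_vip_addresses()}


if __name__ == "__main__":
    print(a_record("restricted.googleapis.com."))
```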
This module incrementally adds the Private Service Connect (PSC) security feature to the setup from module 3.
Jump to lab module
# | Sub-module |
---|---|
1 | Create a Cloud Composer cluster with PSC configured at provision time |
2 | Retest Cloud Composer DAGs |
3 | Delete resources created for the lab |
This is a community effort. Contributions are welcome.
# | Contributor | Contribution | About |
---|---|---|---|
1 | Anagha Khanolkar | Author | Data Analytics Specialist Engineer, NATT, Google Cloud |
2 | Jay O' Leary | Tester | Data Analytics Specialist Engineer, SRTT, Google Cloud |
Special thanks to Joseph Zhou for Argolis-related consults, Eddie Villalba for GKE-related consults, Christopher Abraham for networking, and Arun Santhanagopalan and Jason Bisson for security in general.