Description
Refactored (mostly rewritten) the Terraform code for environment provisioning and improved the documentation. Under the `terraform` directory there is now:
- The `envs` directory, containing a basic Terraform configuration for each environment (dev/test/prod). This uses Terraform modules that are defined in...
- The `modules` directory, containing three Terraform modules:
  - `cloudfunction` - for deploying a Pub/Sub-triggered Cloud Function
  - `vertex_deployment` - infra deployment for a Vertex Pipelines environment, including buckets, service accounts, IAM, Vertex Metadata store, Pub/Sub topic and Cloud Function (using the `cloudfunction` module above)
  - `scheduled_pipelines` - for deploying Cloud Scheduler jobs
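
For reviewers unfamiliar with the layout, an environment configuration that calls these modules might look roughly like the sketch below. This is illustrative only: the file location (`terraform/envs/dev/main.tf`), the relative `source` paths, and every input and output name (`project_id`, `region`, `pubsub_topic_name`) are assumptions, not the modules' actual interface. Check each module's variables and outputs for the real names.

```hcl
# Hypothetical terraform/envs/dev/main.tf.
# All variable, input, and output names below are assumptions for illustration.

variable "project_id" {
  type        = string
  description = "GCP project for this environment (assumed input)"
}

variable "region" {
  type        = string
  description = "GCP region for this environment (assumed input)"
}

# Core infrastructure for a Vertex Pipelines environment.
module "vertex_deployment" {
  source = "../../modules/vertex_deployment" # assumes an envs/<env>/ layout

  project_id = var.project_id
  region     = var.region
}

# Cloud Scheduler jobs for scheduled pipeline runs.
module "scheduled_pipelines" {
  source = "../../modules/scheduled_pipelines"

  project_id = var.project_id
  region     = var.region

  # Assumed output name: the Pub/Sub topic created by vertex_deployment.
  pubsub_topic_name = module.vertex_deployment.pubsub_topic_name
}
```

Keeping each environment to a thin set of module calls like this means dev, test and prod differ only in their input values.
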
The CI/CD pipelines and Makefile have also been tweaked to support this. The `transfer_dataset.sh` script has been removed, as the README now covers this piece of setup explicitly without the need for the script.
The dates in the SQL queries and the TFDV schema also had to be changed for the E2E tests to pass, because the Chicago taxi dataset has changed!
Closes #13 as well
How has this been tested?
- [x] Infrastructure deployed successfully, and the Cloud Scheduler/Cloud Function/Vertex Pipeline integration ran successfully in the dev environment.
- [x] Infra deployed successfully for test/prod
- [x] CI/CD pipelines tested
- Passing automated tests, including the E2E test - see PR here showing successful tests
Checklist
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have successfully run the E2E tests
- [ ] I have added tests that prove my fix is effective or that my feature works
- [x] New and existing unit tests pass locally with my changes
- [x] I have updated any relevant documentation to reflect my changes