ccao-data / data-architecture

Codebase for CCAO data infrastructure construction and management
https://ccao-data.github.io/data-architecture/

Refactor Python model dependencies to use global package repository #461

Closed: jeancochrane closed this 1 month ago

jeancochrane commented 1 month ago

In the course of discussion on #453, we decided to simplify the design of the Python model dependency deployment system:

> Rather than a per-model, per-environment zip of packages, we could create a global directory of packages by version (probably in `s3://ccao-athena-dependencies-us-east-1`). [...] In this setup, people would specify the specific package version they want for a model in that model's schema file. We'd then have a CI job collect and dedupe those versions into a single list. We can then `pip install --no-deps` each of the packages, zip the results, and name the file by the package name and version. Finally, we'd `aws s3 sync` the CI package dir with the bucket.
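The collect-and-dedupe step described in the quote could look roughly like the Python sketch below. It is a minimal illustration, not the CI job's actual code. It assumes each model's schema file has already been parsed into a list of exact `name==version` pins (the real schema key is not shown here). It also assumes archives are named `<name>==<version>.zip`, and that package names are normalized per PEP 503 so that `Pandas` and `pandas` dedupe to one entry. The function names `normalize_name`, `parse_pin`, `collect_pins`, and `build_plan` are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical input shape: dbt model name -> exact pins ("name==version")
# as read from that model's schema file.
ModelPins = Dict[str, List[str]]


def normalize_name(name: str) -> str:
    """Normalize a package name (PEP 503) so 'Pandas' and 'pandas' dedupe."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_pin(pin: str) -> Tuple[str, str]:
    """Split an exact 'name==version' pin; reject ranges like 'numpy>=1.0'."""
    name, sep, version = pin.partition("==")
    name, version = name.strip(), version.strip()
    if not sep or not name or not version:
        raise ValueError(f"expected an exact 'name==version' pin, got {pin!r}")
    return normalize_name(name), version


def collect_pins(models: ModelPins) -> List[Tuple[str, str]]:
    """Collect pins from every model and dedupe them into one sorted list."""
    return sorted({parse_pin(pin) for pins in models.values() for pin in pins})


def build_plan(models: ModelPins, build_dir: str = "build") -> List[Tuple[List[str], str]]:
    """Return one (pip command, archive name) pair per unique package version."""
    plan = []
    for name, version in collect_pins(models):
        pin = f"{name}=={version}"
        # --no-deps installs only this package, so its dependencies need their own pins.
        cmd = ["pip", "install", "--no-deps", "--target", f"{build_dir}/{pin}", pin]
        # Assumed naming convention: the archive is named after the pin itself.
        plan.append((cmd, f"{pin}.zip"))
    return plan


if __name__ == "__main__":
    # Illustrative pins only; these are not the versions the repo actually uses.
    models = {
        "reporting.ratio_stats": ["numpy==1.26.4", "Pandas==2.1.0"],
        "hypothetical.other_model": ["numpy==1.26.4"],
    }
    for cmd, archive in build_plan(models):
        print(" ".join(cmd), "->", archive)
```

Because every package is installed with `--no-deps`, a model has to pin its transitive dependencies explicitly. The trade-off is that each archive maps to exactly one package version, so models can share archives. Once the zips are built, the upload is a single `aws s3 sync` of the local archive directory to the bucket.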

This PR refactors `deploy_dbt_model_dependencies.sh` and the `reporting.ratio_stats` Python model to implement this new design.
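On the model side, the new design means a Python model only needs to turn its pinned versions into S3 URIs in the shared bucket and register them with its session. The sketch below shows one way to do that. It is a hedged illustration, not the code in `reporting.ratio_stats`. The bucket comes from the design discussion, but the `<name>==<version>.zip` key layout is an assumption that must match whatever the deploy script uploads. The helpers `dependency_uris` and `attach_dependencies` are hypothetical. The registration function is passed in as a parameter so that the sketch does not depend on Spark.

```python
from typing import Callable, Iterable, List

# Bucket named in the design discussion; the "<name>==<version>.zip" key layout
# is an assumption and must match whatever the deploy script actually uploads.
DEPENDENCY_BUCKET = "s3://ccao-athena-dependencies-us-east-1"


def dependency_uris(pins: Iterable[str], bucket: str = DEPENDENCY_BUCKET) -> List[str]:
    """Map exact 'name==version' pins to the S3 URIs of their zipped archives."""
    uris = []
    for pin in pins:
        name, sep, version = pin.partition("==")
        if not sep or not name.strip() or not version.strip():
            raise ValueError(f"expected an exact 'name==version' pin, got {pin!r}")
        uris.append(f"{bucket.rstrip('/')}/{name.strip()}=={version.strip()}.zip")
    return uris


def attach_dependencies(
    pins: Iterable[str],
    add_py_file: Callable[[str], None],
    bucket: str = DEPENDENCY_BUCKET,
) -> List[str]:
    """Register each archive via the given callable and return the URIs used."""
    uris = dependency_uris(pins, bucket)
    for uri in uris:
        add_py_file(uri)
    return uris
```

Inside a Spark-backed dbt Python model, one might call `attach_dependencies(pins, spark_session.sparkContext.addPyFile)` before importing the packages. PySpark's `SparkContext.addPyFile` accepts `.zip` archives, and whether the `s3://` URIs resolve depends on the runtime.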