ccao-data / data-architecture

Codebase for CCAO data infrastructure construction and management
https://ccao-data.github.io/data-architecture/

Refactor `reporting.ratio_stats` to use new pattern for Python model dependencies #439

Closed jeancochrane closed 1 month ago

jeancochrane commented 1 month ago

Once https://github.com/ccao-data/data-architecture/issues/417 lands, we'll have a new system for deploying dependencies for Python models that use Athena PySpark. Once https://github.com/ccao-data/data-architecture/issues/438 lands with dbt 1.8 support, we'll be able to use the new `config.get()` method to pass dbt config variables into Python models. With both pieces in place, we'll be ready to update `reporting.ratio_stats` to use the new auto-deployed dependency bundle instead of pointing to a hardcoded location, as it currently does:

https://github.com/ccao-data/data-architecture/blob/9b70277cb4b91c20d5720ead08fc7989d2924a61/dbt/models/reporting/reporting.ratio_stats.py#L3-L5
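
For discussion, here is a rough sketch of what the refactored model could look like. It assumes the bundle location reaches the model through a config key, which is read with `dbt.config.get()` and passed to Spark's `addPyFile()`. The key name `dependency_bundle_path`, the helper `add_dependency_bundle`, and the input model name are all hypothetical placeholders, not the names the deployment system in #417 will actually use.

```python
def add_dependency_bundle(dbt, session):
    """Register the auto-deployed dependency bundle with the Spark context.

    Assumption: the deployment system exposes the bundle location under a
    config key. "dependency_bundle_path" is a hypothetical name.
    """
    bundle_path = dbt.config.get("dependency_bundle_path")
    if not bundle_path:
        raise ValueError(
            "dependency_bundle_path is not set in the model config; "
            "cannot load Python dependencies"
        )
    # addPyFile makes the archive importable by code that runs afterwards
    session.sparkContext.addPyFile(bundle_path)
    return bundle_path


def model(dbt, session):
    dbt.config(materialized="table")
    add_dependency_bundle(dbt, session)

    # Imports from the bundle would go here, after addPyFile has run.

    # Hypothetical input model name, used only to give the sketch a return value
    return dbt.ref("hypothetical_input_model")
```

Failing loudly when the key is missing should make a misconfigured model easier to debug than an `ImportError` raised later in the run.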

As part of this PR, let's also write good internal docs for adding and maintaining Python models, including an explanation of how the dependency deployment system works. The docs should also include a note on how to debug Python models, particularly how to find logs.