ccao-data / data-architecture

Codebase for CCAO data infrastructure construction and management
https://ccao-data.github.io/data-architecture/
5 stars 3 forks source link

Add `final_model` CSV seed #365

Closed dfsnow closed 2 months ago

dfsnow commented 3 months ago

This PR moves the final_model table into dbt as a seed. It also updates the table (and downstream dependencies) such that an individual year can have multiple final models. Finally, it adds the 2024 final model records in final_model.

I played around with using [ephemeral](https://docs.getdbt.com/docs/build/materializations#ephemeral models to accomplish this, but they ended up not being needed.

Testing

Here's proof that the updated table/join results in no changes.

vw_ratio_stats

SELECT old.year, old.cnt_old, new.cnt_new
FROM (
    SELECT year, COUNT(*) AS cnt_old
    FROM "reporting"."vw_ratio_stats"
    GROUP BY year
) old
LEFT JOIN (
    SELECT year, COUNT(*) AS cnt_new
    FROM "z_ci_dfsnow-add-final-model-seed_reporting"."vw_ratio_stats"
    GROUP BY year
) new
ON old.year = new.year
ORDER BY year
year cnt_old cnt_new
2021 205368 205368
2022 300219 300219
2023 231949 231949
2024 3448 57008

And EXCEPT shows no value differences:

SELECT COUNT(*)
FROM (
    SELECT *
    FROM "reporting"."vw_ratio_stats"
    WHERE year <= '2023'
    EXCEPT
    SELECT *
    FROM "z_ci_dfsnow-add-final-model-seed_reporting"."vw_ratio_stats"
    WHERE year <= '2023'
)
count
0

res_report_summary

SELECT old.year, old.cnt_old, new.cnt_new
FROM (
    SELECT year, COUNT(*) AS cnt_old
    FROM "reporting"."res_report_summary"
    GROUP BY year
) old
LEFT JOIN (
    SELECT year, COUNT(*) AS cnt_new
    FROM "z_ci_dfsnow-add-final-model-seed_reporting"."res_report_summary"
    GROUP BY year
) new
ON old.year = new.year
ORDER BY year
year cnt_old cnt_new
2021 8850 8850
2022 13003 13003
2023 10128 10128
2024 268 3220