This PR updates the `test_dbt_models` workflow to format run results and upload them to S3 for analysis. In the process, it refactors the `format_dbt_test_failures` script into a more general form (now called `transform_dbt_test_results`) that outputs test result metadata to Parquet files in addition to outputting the failures workbook.
## Schema
Here's the schema of the new generated parquet files:
- `metadata/test_run/run_id=*/*.parquet`: Metadata about the full test run, partitioned by run ID. Populates the `qc.test_run` table in Athena. Unique by `run_id`.
  - `run_id`: primary key string representing the unique ID of the test run (technically the invocation ID of the `dbt test` command)
  - `run_date`: string representing the UTC date that the test ran, in `YYYY-MM-DD` format
  - `elapsed_time`: decimal representing the total number of seconds that the tests took to execute
  - `run_by`: string representing the username of the person who ran the script. On a local machine, this will be the value of the Unix `$USER` variable set on login; on CI, this will be set to the `github.triggering_actor` variable.
  - `var_year_start`: string representing the value of `test_qc_year_start` for the test run, in `YYYY-MM-DD` format
  - `var_year_end`: string representing the value of `test_qc_year_end` for the test run, in `YYYY-MM-DD` format
  - `git_sha`: string representing the hash of the script's commit
  - `git_branch`: string representing the branch of the script's commit
  - `git_author`: string representing the name and email address of the author of the script's commit
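The `run_by` resolution described above can be sketched roughly as follows. This is a hypothetical helper, not the actual script's code; `GITHUB_ACTIONS` and `GITHUB_TRIGGERING_ACTOR` are the standard environment variables that GitHub Actions sets on runners.

```python
import getpass


def resolve_run_by(env: dict) -> str:
    """Resolve the username for the run_by field (illustrative sketch).

    On CI, GitHub Actions exposes the triggering actor via an environment
    variable; locally we fall back to the Unix login name.
    """
    if env.get("GITHUB_ACTIONS") == "true":
        return env["GITHUB_TRIGGERING_ACTOR"]
    # $USER is set on login; getpass.getuser() is a fallback for shells
    # where it is unset
    return env.get("USER") or getpass.getuser()


print(resolve_run_by({"GITHUB_ACTIONS": "true",
                      "GITHUB_TRIGGERING_ACTOR": "octocat"}))  # octocat
print(resolve_run_by({"USER": "jdoe"}))  # jdoe
```

Taking the environment as a parameter rather than reading `os.environ` directly keeps the resolution logic easy to unit test.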
- `metadata/test_run_result/run_id=*/*.parquet`: Metadata about individual test results (i.e. pass, fail, number of failing rows, etc.) in this run, partitioned by run ID. Populates the `qc.test_run_result` table in Athena. Unique by `run_id`, `test_name`, and `township_code`.
  - `run_id`: composite primary key string representing the unique ID of the test run. Foreign key to `qc.test_run.run_id`.
  - `test_name`: composite primary key string representing the name of the test, e.g. `iasworld_sales_cur_in_accepted_values`
  - `township_code`: composite primary key string representing the township of the test results; can be null if the failing rows can't be mapped to townships, or if the test passed
  - `table_name`: string representing the name of the iasWorld table that the test is defined on, e.g. `pardat`
  - `category`: string representing the category of the test, e.g. `class_mismatch_or_issue`
  - `description`: string representing the long description of the test, e.g. "at least one class should match pardat class"
  - `status`: string enum representing the status of the test, one of `pass` or `fail`
  - `elapsed_time`: decimal representing the number of seconds that the test took to execute
  - `num_failing_rows`: integer representing the number of rows that failed the test, or 0 if the test passed
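As a rough illustration of where these per-test fields come from, here is a hypothetical sketch of flattening dbt's `run_results.json` artifact into rows of this shape. The artifact keys used (`unique_id`, `status`, `execution_time`, `failures`) follow dbt's documented run-results format; the function itself is assumed, not the actual transform script, and omits fields like `township_code` that require joining against failing rows.

```python
def flatten_results(run_results: dict, run_id: str) -> list[dict]:
    """Flatten a dbt run_results.json dict into qc.test_run_result rows
    (illustrative sketch, not the real transform_dbt_test_results logic)."""
    rows = []
    for result in run_results["results"]:
        rows.append({
            "run_id": run_id,
            # unique_id looks like "test.<project>.<test_name>"
            "test_name": result["unique_id"].split(".")[-1],
            "status": result["status"],
            "elapsed_time": result["execution_time"],
            # "failures" may be absent or None for passing tests
            "num_failing_rows": result.get("failures") or 0,
        })
    return rows


raw = {"results": [
    {"unique_id": "test.dbt.iasworld_sales_cur_in_accepted_values",
     "status": "fail", "execution_time": 1.5, "failures": 12},
]}
rows = flatten_results(raw, "abc123")
print(rows[0]["num_failing_rows"])  # 12
```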
## Testing
See here for a successful workflow run, and check the `qc.test_run` and `qc.test_run_result` tables in Athena to browse the metadata for that run. An example of the types of queries we can run using this schema:
```sql
SELECT run_id, SUM(num_failing_rows)
FROM qc.test_run_result
WHERE status = 'fail' AND category = 'class_mismatch_or_issue'
GROUP BY run_id
```
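For context on the `run_id=*` path layout: the Hive-style `run_id=<value>` directory segment is what lets Athena expose `run_id` as a partition column on both tables. A minimal sketch of building such a key (the helper name and filename are illustrative assumptions):

```python
def parquet_key(prefix: str, run_id: str, filename: str) -> str:
    """Build a Hive-style partitioned object key (illustrative helper).

    Athena reads the run_id=<value> directory to populate the run_id
    partition column on qc.test_run and qc.test_run_result.
    """
    return f"{prefix}/run_id={run_id}/{filename}"


print(parquet_key("metadata/test_run", "abc123", "part-0.parquet"))
# metadata/test_run/run_id=abc123/part-0.parquet
```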
This is ready for another look @dfsnow! The tables in Athena should also be up to date from this latest run if you'd like to browse the updated schema.