ccao-data / data-architecture

Codebase for CCAO data infrastructure construction and management
https://ccao-data.github.io/data-architecture/
5 stars 3 forks source link

Allow configuring a custom `year_field` for open data asset tests #279

Closed jeancochrane closed 6 months ago

jeancochrane commented 6 months ago

Open data asset tests are currently failing because we changed the name of the year field in the Assessed Values asset from year to tax_year:

requests.exceptions.HTTPError: 400 Client Error:
Bad Request for url: https://datacatalog.cookcountyil.gov/resource/uzyt-m557.json?%24query=SELECT+COUNT%28%2A%29%2Cyear+GROUP+BY+year

Here's the API response that's causing this 400 error:

{"message":"Query coordinator error: query.soql.no-such-column; No such column: year;
position: Map(row -> 1, column -> 238, line -> \"SELECT `pin`, `tax_year`, `class`, `township_code`,
`township_name`, `mailed_bldg`, `mailed_land`, `mailed_tot`, `certified_bldg`, `certified_land`, `certified_tot`,
`board_bldg`, `board_land`, `board_tot` |> SELECT COUNT(*) AS `COUNT`, `year` GROUP BY `year`\\n^\")"}

This PR adds a new optional exposure metadata field, meta.year_field, that allows us to configure the name of the year field to use for testing an open data asset. In the process, we update the metadata for the Assessed Values asset to set meta.year_field = "tax_year", and we update the script to parse this value and use it for asset queries.

Sample output (seems like the test for Parcel Proximity is failing due to missing 2023 data, which is unrelated to this PR):

$ python3 ../.github/scripts/test_open_data_assets.py target/manifest.json
Comparing asset 'Appeals' to model 'default.vw_pin_appeal'
Comparing asset 'Assessed Values' to model 'default.vw_pin_history'
Comparing asset 'Parcel Addresses' to model 'default.vw_pin_address'
Comparing asset 'Parcel Sales' to model 'default.vw_pin_sale'
Comparing asset 'Parcel Universe' to model 'default.vw_pin_universe'
Comparing asset 'Property Tax-Exempt Parcels' to model 'default.vw_pin_exempt'
Comparing asset 'Residential Condominium Unit Characteristics' to model 'default.vw_pin_condo_char'
Comparing asset 'Single and Multi-Family Improvement Characteristics' to model 'default.vw_card_res_char'
Skipping row count test for exposure `commercial_valuation_data` because it does not have an enabled `meta.test_row_count` attribute
Comparing asset 'Parcel Proximity' to model 'proximity.vw_pin10_proximity'

The following view/asset pairs had mismatching row counts by year:

|  year | proximity.vw_pin10_proximity | Parcel Proximity |
| ----- | ---------------------------- | ---------------- |
| 2,000 |                    1,391,640 |        1,391,633 |
| 2,001 |                    1,393,200 |        1,393,192 |
| 2,002 |                    1,395,432 |        1,395,427 |
| 2,003 |                    1,398,150 |        1,398,148 |
| 2,004 |                    1,401,578 |        1,401,574 |
| 2,005 |                    1,405,153 |        1,405,147 |
| 2,006 |                    1,408,873 |        1,408,870 |
| 2,007 |                    1,410,752 |        1,410,750 |
| 2,008 |                    1,413,379 |        1,413,374 |
| 2,009 |                    1,413,923 |        1,413,915 |
| 2,010 |                    1,413,896 |        1,413,888 |
| 2,011 |                    1,413,893 |        1,413,887 |
| 2,012 |                    1,413,934 |        1,413,929 |
| 2,013 |                    1,413,969 |        1,413,960 |
| 2,014 |                    1,414,003 |        1,413,995 |
| 2,015 |                    1,414,377 |        1,414,369 |
| 2,016 |                    1,414,637 |        1,414,634 |
| 2,017 |                    1,415,087 |        1,415,083 |
| 2,018 |                    1,414,933 |        1,414,929 |
| 2,019 |                    1,415,409 |        1,415,410 |
| 2,020 |                    1,415,881 |        1,415,880 |
| 2,021 |                    1,415,666 |        1,415,663 |
| 2,022 |                    1,416,601 |        1,416,600 |
| 2,023 |                    1,417,374 |                  |

Traceback (most recent call last):
  File "/home/jecochr/code/data-architecture/dbt/../.github/scripts/test_open_data_assets.py", line 240, in <module>
    main()
  File "/home/jecochr/code/data-architecture/dbt/../.github/scripts/test_open_data_assets.py", line 137, in main
    raise ValueError("Open data asset test failed")
ValueError: Open data asset test failed