deployment-gap-model-education-fund / deployment-gap-model

ETL code for the Deployment Gap Model Education Fund
https://www.deploymentgap.fund/
MIT License

Add ballot ready to data warehouse #289

Closed bendnorman closed 10 months ago

bendnorman commented 11 months ago

This PR adds the ballot-ready data to the data warehouse, along with some information about the next election in each county.

bendnorman commented 11 months ago

Running into some weird errors:

  1. The ballot ready data has an election in Yakutat that is not present in the main counties_wide_format data frame.
  2. I'm getting this validation error because there are about a hundred new counties, mostly from Puerto Rico. Not sure how the changes in this PR would have introduced new counties:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/app/dbcp/cli.py", line 95, in <module>
    sys.exit(main())
  File "/app/dbcp/cli.py", line 91, in main
    dbcp.data_mart.create_data_marts(args)
  File "/app/dbcp/data_mart/__init__.py", line 70, in create_data_marts
    validate_data_mart(engine=engine)
  File "/app/dbcp/validation/tests.py", line 245, in validate_data_mart
    test_county_long_vs_wide(engine)
  File "/app/dbcp/validation/tests.py", line 205, in test_county_long_vs_wide
    n_counties_wide == n_counties_long
AssertionError: counties_wide_format and counties_long_format have different county coverage
make: *** [all_local] Error 1

TrentonBush commented 10 months ago

That test checks that county_wide and county_long have consistent spatial coverage.

  1. Yakutat's FIPS code (02261) last exists in the 2010 FIPS vintage. The data mart currently uses the 2020 vintage and drops anything that exists only in earlier or later vintages (very few items, but that is one of them).
  2. The ballot ready data was only integrated into county_wide. I moved it to the _get_county_properties() function, which is used by both the county_wide and county_long constructors (I also added the new columns to the county_long metadata). The test also relies on that function to identify and drop county-level columns, so that only the spatial coverage of the technical data is compared. Now the test passes.
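
For readers following along, the coverage check described above can be sketched roughly as follows. This is a simplified illustration, not the code in dbcp/validation/tests.py: the row layout, the column names, the constant COUNTY_PROPERTY_COLUMNS (a stand-in for what _get_county_properties() provides), and the helper counties_with_technical_data are all assumptions, and plain dictionaries stand in for the data mart tables.

```python
# Hypothetical county-level property columns shared by both tables.
# In the real pipeline these come from _get_county_properties().
COUNTY_PROPERTY_COLUMNS = {"county_name", "next_election_date"}
ID_COLUMN = "county_id_fips"


def counties_with_technical_data(rows):
    """Return FIPS codes of rows that have at least one non-null technical value.

    County-level property columns are ignored, so only the spatial
    coverage of the technical data is compared.
    """
    covered = set()
    for row in rows:
        technical_values = [
            value
            for key, value in row.items()
            if key != ID_COLUMN and key not in COUNTY_PROPERTY_COLUMNS
        ]
        if any(value is not None for value in technical_values):
            covered.add(row[ID_COLUMN])
    return covered


# Toy stand-ins for the wide and long tables (all values are made up).
wide_rows = [
    {"county_id_fips": "01001", "county_name": "A",
     "next_election_date": "2024-01-01", "solar_mw": 10.0},
    # This county has only election data, no technical data.
    {"county_id_fips": "72001", "county_name": "B",
     "next_election_date": "2024-01-01", "solar_mw": None},
]
long_rows = [
    {"county_id_fips": "01001", "county_name": "A",
     "next_election_date": "2024-01-01", "resource": "solar",
     "capacity_mw": 10.0},
]

wide_coverage = counties_with_technical_data(wide_rows)
long_coverage = counties_with_technical_data(long_rows)

# Symmetric difference lists the counties found in only one table.
mismatch = sorted(wide_coverage ^ long_coverage)
assert not mismatch, f"different county coverage: {mismatch}"
```

In this toy setup, removing "next_election_date" from COUNTY_PROPERTY_COLUMNS makes county 72001 count as covered in the wide table only, and the assertion fails. That is one plausible way for new county-level columns to surface as "new counties" in a check like this.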

bendnorman commented 10 months ago

OK, the gitbook has been updated. The data is in data_mart_dev and has been updated with the August version of the raw data.

bendnorman commented 10 months ago

Yeah, this could for sure be normalized into multiple tables. I didn't normalize to speed up integration. I'll create a data mart table with the normalized data. Once the data is in BQ, I'll go back and normalize the data warehouse table.
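
As a rough illustration of what normalizing could look like, here is a minimal sketch that splits a flat table into an elections table and a races table. The column names (election_id, election_day, race_id, position_name) and the sample rows are hypothetical and are not taken from the actual ballot ready schema.

```python
def normalize(flat_rows):
    """Split denormalized rows into (elections, races).

    Election-level fields are stored once per election_id; each race
    keeps only a foreign key back to its election.
    """
    elections = {}
    races = []
    for row in flat_rows:
        # Keep the first copy of each election's fields.
        elections.setdefault(
            row["election_id"],
            {
                "election_id": row["election_id"],
                "election_day": row["election_day"],
            },
        )
        races.append(
            {
                "race_id": row["race_id"],
                "election_id": row["election_id"],
                "position_name": row["position_name"],
                "county_id_fips": row["county_id_fips"],
            }
        )
    return list(elections.values()), races


# Hypothetical denormalized rows: election fields repeat for every race.
flat = [
    {"election_id": 1, "election_day": "2024-01-01", "race_id": 10,
     "position_name": "Position A", "county_id_fips": "01001"},
    {"election_id": 1, "election_day": "2024-01-01", "race_id": 11,
     "position_name": "Position B", "county_id_fips": "01001"},
    {"election_id": 2, "election_day": "2024-02-01", "race_id": 12,
     "position_name": "Position C", "county_id_fips": "01003"},
]

elections, races = normalize(flat)
print(len(elections), "elections,", len(races), "races")
```

The flat layout is quicker to integrate, which matches the trade-off described above. The split layout avoids repeating election fields on every race row.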

bendnorman commented 10 months ago

I made the requested changes: