NYCPlanning / data-engineering

Primary repository for NYC DCP's Data Engineering team

New gru-qaqc check - PAD BINs and BBLs #1033

Closed fvankrieken closed 4 weeks ago

fvankrieken commented 1 month ago

Awaiting final confirmation from RZ, but a relatively straightforward ask from GR

The output of this report should show Addresspoints whose underlying BIN and/or BBL do not match the BIN and/or BBL returned by Function 1A in Geosupport

My interpretation is that we should have a check where we

We recently added a new check, so there are two PRs that show the basic way to do this. For now, it's easiest to add boilerplate code to the db-gru-qaqc repo. That repo needs some attention and refactoring to align with our best practices, but that should happen separately. The db-gru-qaqc repo is where the logic for actually running these checks lives, and it runs very much like our old workflows: an action runs, spins up a db, runs the process in the runner in that repo against the temporary db, and spits out files to S3.

The two PRs for the last new check are

sf-dcp commented 4 weeks ago

Update

Per discussion with GRU, the request for the new report is slightly different:

They would actually like us to compare dcp_addresspoints against PAD for BIN diffs. So the report is created by running dcp_addresspoints through Geosupport and filtering the geocoded records for BIN diffs. We already have a similar check for address-point rejects, so we will just add an additional output file when running it.
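
For illustration only, here is a rough sketch of what the filtering step could look like once the address points have been geocoded. It is not the code in db-gru-qaqc. The column names (`addresspoint_id`, `bin`, `geo_bin`) and the helper functions are hypothetical, and it assumes a row with an empty `geo_bin` is a reject that the existing rejects output already covers.

```python
import csv
import io


def normalize_bin(value):
    """Return the BIN as a stripped string, or None when it is missing/blank."""
    if value is None:
        return None
    text = str(value).strip()
    return text or None


def find_bin_diffs(geocoded_rows):
    """Keep geocoded rows whose source BIN differs from the BIN Geosupport returned.

    Assumed (hypothetical) keys per row:
      "bin"     - BIN carried on the dcp_addresspoints record
      "geo_bin" - BIN returned when the address was run through Geosupport
    """
    diffs = []
    for row in geocoded_rows:
        source_bin = normalize_bin(row.get("bin"))
        geo_bin = normalize_bin(row.get("geo_bin"))
        if geo_bin is None:
            # Assumption: no returned BIN means the record was rejected,
            # so it belongs in the rejects file, not the BIN-diff file.
            continue
        if source_bin != geo_bin:
            diffs.append(row)
    return diffs


def write_csv(rows, fieldnames):
    """Serialize rows to CSV text; a real run would write this to a file instead."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample rows, purely to exercise the filter.
sample = [
    {"addresspoint_id": "1", "bin": "1000001", "geo_bin": "1000001"},  # match
    {"addresspoint_id": "2", "bin": "1000002", "geo_bin": "1000099"},  # BIN diff
    {"addresspoint_id": "3", "bin": "1000003", "geo_bin": ""},         # reject
]
bin_diffs = find_bin_diffs(sample)
report = write_csv(bin_diffs, ["addresspoint_id", "bin", "geo_bin"])
```

BINs are compared as stripped strings so that an integer BIN in one source and a text BIN in the other do not show up as a false diff. Whether that normalization matches how the two datasets actually store BINs would need to be confirmed.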