COLP QAQC: Tickets

Oysters1874 commented 2 years ago

Proposed Works

Graphs -- App Side

Display number of records by agency/usetype.
Tasks:

[x] Colp Side: https://github.com/NYCPlanning/db-colp/issues/212
[x] App side: https://github.com/NYCPlanning/data-engineering-qaqc/issues/174
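As a rough illustration of the counts behind these graphs, here is a minimal sketch in plain Python. The sample rows and the column names `agency` and `usetype` are assumptions for illustration, not the actual COLP schema, and `count_by` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical sample rows; the real COLP column names may differ.
records = [
    {"agency": "DCAS", "usetype": "OFFICE"},
    {"agency": "DCAS", "usetype": "OFFICE"},
    {"agency": "PARKS", "usetype": "PARK"},
    {"agency": "DCAS", "usetype": "GARAGE"},
]


def count_by(rows, *keys):
    """Count rows per unique combination of the given keys."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


by_agency = count_by(records, "agency")
by_agency_usetype = count_by(records, "agency", "usetype")
```

The resulting counters could feed a bar chart per agency or per agency/use type pair on the app side.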

Version-to-version comparison -- App Side (and Maybe COLP Side)

We can display version-to-version changes in the number of records per use type. Since the table already exists, only app-side work should be needed.

[x] App side: https://github.com/NYCPlanning/data-engineering-qaqc/issues/175

We can follow the design of the CPDB page for this section.
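For reference, a comparison of this kind could be computed along the following lines. This is only a sketch: the function name `compare_usetype_counts` and the counts are made up, and the existing `usetype_changes` table may be structured differently.

```python
def compare_usetype_counts(previous, current):
    """Return {usetype: (previous count, current count, difference)}.

    Use types missing from one version are treated as having zero records.
    """
    changes = {}
    for usetype in sorted(set(previous) | set(current)):
        prev_n = previous.get(usetype, 0)
        curr_n = current.get(usetype, 0)
        changes[usetype] = (prev_n, curr_n, curr_n - prev_n)
    return changes


# Hypothetical record counts per use type for two versions.
prev_counts = {"OFFICE": 120, "PARK": 300, "GARAGE": 15}
curr_counts = {"OFFICE": 125, "PARK": 298, "LIBRARY": 4}

changes = compare_usetype_counts(prev_counts, curr_counts)
```

Taking the union of use types from both versions means that types added or dropped between versions show up in the comparison instead of silently disappearing.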

Outlier Report -- App Side

Using two existing QAQC tables, ipis_modified_hnums and ipis_modified_names, we can display the records whose house numbers or parcel names were modified, along with the relevant fields.

[x] App Side: https://github.com/NYCPlanning/data-engineering-qaqc/issues/176

Geospatial Check -- Both COLP and App Side

Check whether all properties are within NYC borough boundaries.

[x] COLP side: https://github.com/NYCPlanning/db-colp/issues/211
[x] App Side: https://github.com/NYCPlanning/data-engineering-qaqc/issues/177
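To make the idea concrete, here is a toy sketch of a point-in-polygon check using ray casting. The square "boroughs", the point ids, and the helper names are all hypothetical. Real borough boundaries are multipolygons, so an actual check would more likely rely on a spatial library or a spatial database query; that is an assumption, not something decided in this ticket.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    `polygon` is a list of (x, y) vertices; the last vertex connects
    back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y matter.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def outside_all(points, boundaries):
    """Return the points that fall inside none of the boundary polygons."""
    return [
        p for p in points
        if not any(point_in_polygon(p["x"], p["y"], poly)
                   for poly in boundaries.values())
    ]


# Toy boundaries and points, purely for illustration.
boundaries = {
    "boro_a": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "boro_b": [(4, 0), (8, 0), (8, 4), (4, 4)],
}
points = [
    {"id": 1, "x": 1, "y": 1},
    {"id": 2, "x": 6, "y": 2},
    {"id": 3, "x": 10, "y": 10},
]

flagged = outside_all(points, boundaries)
```

Records returned by `outside_all` are the ones a QAQC report would surface for review.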

Manual Corrections Check - App Side

We can display graphs and dataframes of manual corrections applied and not applied, by field, as the PLUTO page does.

[x] App side: https://github.com/NYCPlanning/data-engineering-qaqc/issues/178

Current QAQC tables:

- Identifying invalid data in IPIS:

ipis_unmapped: unmappable input records
ipis_modified_hnums: records with modified house numbers
ipis_modified_names: records with modified parcel names
ipis_colp_geoerrors: addresses that return errors from 1B
ipis_sname_errors: addresses that return streetname errors from 1B
ipis_hnum_errors: addresses that return address errors from 1B
ipis_bbl_errors: records where address isn't valid for BBL
ipis_cd_errors: mismatch between IPIS community district and PLUTO

- Version-to-version comparison for COLP review:

usetype_changes: version-to-version changes in the number of records per use type

abrieff commented 2 years ago

Do you have a sense of which parts of this work should be done on the pipeline side vs. the app side?

Oysters1874 commented 2 years ago

> Do you have a sense of which parts of this work should be done on the pipeline side vs. the app side?

Yeah, I can mark that as well. So far, I think all of the existing QAQC tables are uploaded to DO, so for invalid records, mismatches, and the version-to-version comparison we can display them directly on the app side.

abrieff commented 2 years ago

👍

AmandaDoyle commented 2 years ago

The reports that you have outlined are very useful:

- Existing QAQC reports
- Displaying number of records by agency/usetype
- Version-to-version changes in the number of records per use type and/or agency
- Outlier report, geospatial check, and manual corrections report that you have outlined

The following may not be so useful:

- Adding comparisons between versions based on Ownership and Category

This is more general, but for COLP and other data products we like to check that all of the geospatial values are in sync. For example, does the first digit of the BIN, BBL, and CD match the boro code? Perhaps there is a way to incorporate this type of check into COLP QAQC, designed so that it is easy to replicate across data products.
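A check like this could look roughly as follows. It is a sketch only: the field names `borocode`, `bin`, `bbl`, and `cd` and the sample records are assumptions, and `boro_mismatches` is a hypothetical helper. Keeping the list of fields as a parameter is one way to make it reusable across data products.

```python
def boro_mismatches(record, fields=("bin", "bbl", "cd")):
    """Return the fields whose first digit differs from the boro code.

    Missing or empty values are skipped; they belong to a separate check.
    """
    boro = str(record["borocode"])
    mismatched = []
    for field in fields:
        value = record.get(field)
        if value is None or str(value) == "":
            continue
        if str(value)[0] != boro:
            mismatched.append(field)
    return mismatched


# Hypothetical records; field names are assumptions, not the COLP schema.
consistent = {"borocode": 1, "bin": "1001234", "bbl": "1000010001", "cd": "101"}
inconsistent = {"borocode": 3, "bin": "3001234", "bbl": "4000010001", "cd": "301"}

ok_result = boro_mismatches(consistent)
bad_result = boro_mismatches(inconsistent)
```

Here `bad_result` lists `bbl`, because that value starts with 4 while the boro code is 3.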

I'm happy to meet to talk anything through if helpful. Looking forward to seeing this.

NYCPlanning / data-engineering-qaqc

COLP QAQC: Tickets #164