Closed brandonfcohen1 closed 3 months ago
Likely Brandon to do - TBD
@brandonfcohen1 is this a diff of the data in postgres or in the static tile file? How should I approach this? Thanks. You can assign to me.
Hey @zigouras- would definitely be easier in postgres
I did an audit of the tables in the postgres db to see if I could determine what the primary key columns should be. I need to enforce primary keys on any tables analyzed by the data-diff program. I would like to diff as many tables as possible.
The grid below lists the tables, my best guess at the pk columns, the number of rows with unique values in those columns, the total number of rows if different from the unique rows, any condition applied to get the row counts and whether it is included in the diff report currently.
Table | pk columns | Unique rows | Total rows | Condition | Included |
---|---|---|---|---|---|
drug_crimes | x, y, dispatch_date | 2596 | 2691 | ||
gun_crimes | x, y, dispatch_date | 5120 | 5136 | ||
imminently_dangerous_buildings | opa_account_num | 281 | 296 | ||
li_complaints | service_request_id | 246309 | Yes | ||
li_violations | violationnumber, opa_account_num | 95920 | Yes | ||
opa_properties | parcel_number | 582970 | Yes | ||
phs_properties | brt_id | 9044 | 9051 | ||
property_tax_delinquencies | opa_number | 54398 | opa_number <> 0 | Yes | |
rcos | OBJECTID | 251 | |||
spatial_ref_sys | srid | 8500 | |||
unsafe_buildings | opa_account_num | 3684 | 3791 | opa_account_num is not null | |
vacant_properties | opa_id, parcel_type | 35289 | opa_id is not null | Yes | |
vacant_properties_end | opa_id, parcel_type | 35289 | 35523 |
Some tables, like li_complaints, are simple and clean, others need to be cleaned up and still there are some discrepancies. @brandonfcohen1 can you take a look and comment on this metadata? Can I apply the cleanup steps, e.g. deleting the rows in property_tax_delinquencies where opa_number <> 0, after the data has been loaded in postgres, or do those rows have to be removed from the Geopandas object first so the map features do not get generated?
pr created, please review: https://github.com/CodeForPhilly/clean-and-green-philly/pull/632
Is your feature request related to a problem? Please describe. When the data is refreshed, we should track changes and make sure to archive old data.
Describe the solution you'd like Add to the data refresh the following: