CodeForPhilly / clean-and-green-philly

Dashboard to target Philly vacant properties for anti-gun violence interventions
https://www.cleanandgreenphilly.org/
MIT License
32 stars 58 forks source link

Feature request: changelog for Data Refresh #520

Closed brandonfcohen1 closed 3 months ago

brandonfcohen1 commented 5 months ago

Is your feature request related to a problem? Please describe. When the data is refreshed, we should track changes and make sure to archive old data.

Describe the solution you'd like Add to the data refresh the following:

Moylena commented 5 months ago

Likely Brandon to do - TBD

zigouras commented 4 months ago

@brandonfcohen1 is this a diff of the data in postgres or in the static tile file? How should I approach this? Thanks. You can assign to me.

brandonfcohen1 commented 4 months ago

Hey @zigouras- would definitely be easier in postgres

zigouras commented 4 months ago

I did an audit of the tables in the postgres db to see if I could determine what the primary key columns should be. I need to enforce primary keys on any tables analyzed by the data-diff program. I would like to diff as many tables as possible.

The grid below lists the tables, my best guess at the pk columns, the number of rows with unique values in those columns, the total number of rows if different from the unique rows, any condition applied to get the row counts and whether it is included in the diff report currently.

Table pk columns Unique rows Total rows Condition Included
drug_crimes x, y, dispatch_date 2596 2691
gun_crimes x, y, dispatch_date 5120 5136
imminently_dangerous_buildings opa_account_num 281 296
li_complaints service_request_id 246309 Yes
li_violations violationnumber, opa_account_num 95920 Yes
opa_properties parcel_number 582970 Yes
phs_properties brt_id 9044 9051
property_tax_delinquencies opa_number 54398 opa_number <> 0 Yes
rcos OBJECTID 251
spatial_ref_sys srid 8500
unsafe_buildings opa_account_num 3684 3791 opa_account_num is not null
vacant_properties opa_id, parcel_type 35289 opa_id is not null Yes
vacant_properties_end opa_id, parcel_type 35289 35523

Some tables, like li_complaints, are simple and clean, others need to be cleaned up and still there are some discrepancies. @brandonfcohen1 can you take a look and comment on this metadata? Can I apply the cleanup steps, e.g. deleting the rows in property_tax_delinquencies where opa_number <> 0, after the data has been loaded in postgres, or do those rows have to be removed from the Geopandas object first so the map features do not get generated?

zigouras commented 4 months ago

pr created, please review: https://github.com/CodeForPhilly/clean-and-green-philly/pull/632