Extract, Transform, and Load script for fetching new data from the NYC Open Data Portal's vehicle collision data and loading into the NYC Crash Mapper table on CARTO.
3
stars
0
forks
source link
How to deal with historic updates to NYC Open Data? #1
It appears that the vehicle collision data on the NYC Open Data Portal is edited after it is published. For example, previously published rows may be removed and values for columns in other rows may be updated, even for rows that date back over a year from the current date.
How do we know when previously published data is updated?
What is the best way to keep the table in CARTO in sync with the updated data?
Updates aren't as big of a deal as an UPSERT query can be used
Removing rows gets trickier, we can use the unique_key from Socrata, but will need to get values for every row in Socrata each time the database is updated. Could then do a DELETE with a WHERE unique_key NOT IN socrata_id.
What if a mistake is made by someone at Socrata (such as incorrect rows being deleted) and it should be avoided synced with the table at CARTO?
It appears that the vehicle collision data on the NYC Open Data Portal is edited after it is published. For example, previously published rows may be removed and values for columns in other rows may be updated, even for rows that date back over a year from the current date.
How do we know when previously published data is updated?
What is the best way to keep the table in CARTO in sync with the updated data?
Updates aren't as big of a deal as an
UPSERT
query can be usedRemoving rows gets trickier, we can use the
unique_key
from Socrata, but will need to get values for every row in Socrata each time the database is updated. Could then do a DELETE with a WHEREunique_key
NOT INsocrata_id
.What if a mistake is made by someone at Socrata (such as incorrect rows being deleted) and it should be avoided synced with the table at CARTO?