NYCPlanning / db-developments

🏠 🏘️ 🏗️ Developments Database
https://nycplanning.github.io/db-developments

Geocoding results from data sync have two library archive functions #506

Open td928 opened 2 years ago

td928 commented 2 years ago

data_sync.yml relies on two functions from devdb.sh, library_archive and library_archive_version, to manage the geocoded results for HNY and DOB. This is confusing at best. At worst, it can make data ingestion and the tracking of source data versions problematic, because the main difference between the two functions is how they get the version for each dataset: one takes the version date from Open Data with the get_version function, and the other takes the version from the user's GitHub workflow input.
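To make the distinction concrete, here is a minimal sketch in Python rather than the bash of devdb.sh. It assumes simplified, hypothetical signatures. The two function names mirror the ones above, but the bodies are not the actual devdb.sh code. `get_version` is stubbed with a made-up date, and `example_dataset` and the `<dataset>/<version>/` key layout are illustrative only. The point is that both functions archive the same kind of output and differ only in where the version string comes from.

```python
from dataclasses import dataclass


# Hypothetical stand-in for devdb.sh's get_version, which (per this issue)
# reads the version date from Open Data. Stubbed with a made-up date here.
def get_version(dataset: str) -> str:
    published = {"example_dataset": "2021-06-01"}
    return published[dataset]


@dataclass(frozen=True)
class Archive:
    dataset: str
    version: str

    @property
    def path(self) -> str:
        # Illustrative key layout on DO: <dataset>/<version>/
        return f"{self.dataset}/{self.version}/"


def library_archive(dataset: str) -> Archive:
    # Version is derived from the source's published date.
    return Archive(dataset, get_version(dataset))


def library_archive_version(dataset: str, version: str) -> Archive:
    # Version is whatever the user entered as the workflow_dispatch input.
    return Archive(dataset, version)


if __name__ == "__main__":
    a = library_archive("example_dataset")
    b = library_archive_version("example_dataset", "2021-06-15")
    # Same dataset, two different version folders, depending on which
    # function the workflow step happened to call.
    print(a.path)  # example_dataset/2021-06-01/
    print(b.path)  # example_dataset/2021-06-15/
```

Under these assumptions, nothing in the archived path records which rule produced the version. That is the bookkeeping problem described above.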

td928 commented 1 year ago

To further explain the confusion with library_archive_version: the version passed in through ${{ github.event.inputs.version }} somehow does not line up with the GitHub Action date. The results somehow get saved in DO under a date earlier than the input date.

td928 commented 1 year ago

It might also be worth considering a larger overhaul of the data sync process. Currently, if the workflow dispatch is given a specific date, dob_geocoded_results is not updated with the data for that date, even though the process seems to suggest it is. Instead, the workflow takes the most recent dob_jobapplication and dob_permittance data, geocodes it, and sends the results to the version folder on DO named after the workflow dispatch date. One improvement would be to separate the data sync action (which just pulls from Open Data and updates DO) from the geocoding step (which pulls from DO and geocodes the DOB data). Imo that would make the process more manageable.
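The proposed split can be sketched as two independent stages. This is a Python sketch with an in-memory dict standing in for DO. The function names `sync` and `geocode_dob`, the `<dataset>/<version>` key layout, and the injected `fetch`/`geocode` callables are all hypothetical, not existing db-developments code. The idea it illustrates is that the geocoding stage reads its inputs from the same version it writes to, instead of grabbing the most recent data.

```python
from typing import Callable, Dict, List

# In-memory stand-in for the DO bucket (hypothetical): key -> records,
# where key is "<dataset>/<version>".
Store = Dict[str, List[dict]]

DOB_INPUTS = ("dob_jobapplication", "dob_permittance")


def sync(store: Store, dataset: str, version: str,
         fetch: Callable[[str, str], List[dict]]) -> str:
    """Stage 1: pull one dataset from Open Data and write it to DO as-is."""
    key = f"{dataset}/{version}"
    store[key] = fetch(dataset, version)
    return key


def geocode_dob(store: Store, version: str,
                geocode: Callable[[dict], dict]) -> str:
    """Stage 2: read the DOB inputs for `version` from DO, geocode, write back.

    The output key uses the same version as the inputs, so the results
    folder always matches the data it was built from.
    """
    records: List[dict] = []
    for dataset in DOB_INPUTS:
        key = f"{dataset}/{version}"
        if key not in store:
            # Fail loudly rather than silently falling back to newer data.
            raise KeyError(f"{key} has not been synced yet")
        records.extend(geocode(r) for r in store[key])
    out_key = f"dob_geocoded_results/{version}"
    store[out_key] = records
    return out_key
```

With this design, a dispatch for a date that was never synced raises an error instead of geocoding whatever data is newest and filing it under the requested date.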

SashaWeinstein commented 1 year ago

Is this issue open and ready to be addressed? My experience working on the publishing workflow might help me figure it out.