NYCPlanning / db-developments

🏠 🏘️ 🏗️ Developments Database
https://nycplanning.github.io/db-developments

Use Google BigQuery for source data that gets updated weekly #578

Open SashaWeinstein opened 2 years ago

SashaWeinstein commented 2 years ago

The data sync cron job currently writes all records to a new folder on every run. This creates a lot of redundancy: an entire copy of the data is uploaded even though only a handful of records (relatively speaking) are new. I think Google BigQuery would be a good alternative, since it makes it easy to add new records to an existing table.
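
To make the append-only idea concrete, here is a rough sketch rather than a worked-out design. It assumes each record carries a unique key (the `job_number` column name is a guess for illustration) and that the set of keys already loaded is available. The helper names `new_records` and `append_to_bigquery` and the table ID passed in are hypothetical. `insert_rows_json` is the streaming-insert call from the `google-cloud-bigquery` client, imported lazily so the filtering step can be tried without the package installed.

```python
from typing import Dict, Iterable, List


def new_records(incoming: Iterable[Dict], existing_ids: Iterable[str],
                key: str = "job_number") -> List[Dict]:
    """Return only the rows whose key is not already in the table."""
    seen = set(existing_ids)
    fresh = []
    for row in incoming:
        if row[key] not in seen:
            fresh.append(row)
            seen.add(row[key])  # also drops duplicates within this batch
    return fresh


def append_to_bigquery(rows: List[Dict], table_id: str) -> None:
    """Append rows to an existing BigQuery table (table_id is an assumption)."""
    if not rows:
        return  # nothing new this week, so nothing to upload
    from google.cloud import bigquery  # requires google-cloud-bigquery

    client = bigquery.Client()
    errors = client.insert_rows_json(table_id, rows)
    if errors:
        raise RuntimeError(f"BigQuery insert failed: {errors}")


if __name__ == "__main__":
    weekly_pull = [
        {"job_number": "1", "job_type": "A1"},
        {"job_number": "2", "job_type": "A2"},
    ]
    # Only job 2 is new, so only one row would be uploaded.
    print(new_records(weekly_pull, existing_ids={"1"}))
```

With something like this, a weekly run would upload only the delta instead of a full copy of the dataset.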

SashaWeinstein commented 2 years ago

As I develop more on devDB, the shortcomings of the existing process become clearer to me. After I added the A2 jobs to the 20220918 folder for issue #549, subsequent scheduled runs of the data sync job will send a version of the dataset without A2 jobs to latest. This is the reason that filters in the data library template are too far upstream. We need a better process here.