NYCPlanning / db-facilities

🏭 🏢 🏬 🏣 🏤 🏥 🏦 🏨 🏪 🏫 🏩
https://nycplanning.github.io/db-facilities

2022 Q2 Facilities DB Dataloading #526

Closed SashaWeinstein closed 1 year ago

SashaWeinstein commented 2 years ago

FacDB Source Data Updates

Like most of our data products, source data must be updated in data library before FacDB is run. As there are many source datasets with varied update processes, this issue template should be opened to track progress towards updating all source data.

All source data is to be uploaded as .sql files

Scraped by data library

Source data from OpenData

To see if a dataset needs to be uploaded, check date last updated in open data against version in data library
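A minimal sketch of that date check, assuming the data library version is a date string in YYYYMMDD form (an assumption, not confirmed in this issue). The function name `needs_update` is hypothetical, and the "last updated" date is still read off the Open Data page by hand.

```python
from datetime import date, datetime


def needs_update(open_data_last_updated: date, library_version: str) -> bool:
    """Return True when Open Data is newer than the copy in data library.

    Assumes library_version is a YYYYMMDD string (hypothetical format).
    """
    library_date = datetime.strptime(library_version, "%Y%m%d").date()
    return open_data_last_updated > library_date


if __name__ == "__main__":
    # Made-up dates, for illustration only.
    print(needs_update(date(2022, 6, 1), "20220315"))
```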

Manually check data for updates

These don't report the date updated as neatly as the Open Data datasets, so you have to look at the data itself

Manual download

Will receive via email or FTP

Unresolved process

Still waiting to figure out best way to upload these data

Last step

AmandaDoyle commented 2 years ago

hra_centers as a source was replaced with the following:

AmandaDoyle commented 2 years ago

doe_universalprek is still a data source, and there are 1,597 records in FacDB where it is the datasource. SELECT * from facdb.facdb where datasource LIKE 'doe_universalprek'; The old path to get these data has been deprecated. Next step: can what's available on OpenData be used? Is it the same as what's currently in data libraries? What's different between the two datasets, if anything?
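One way to answer the "what's different" question is to compare field names and record counts between the two versions. The sketch below assumes both versions have already been loaded as lists of dicts (for example with `csv.DictReader`). The function name `compare_versions` and the sample field names are hypothetical.

```python
def compare_versions(old_rows, new_rows):
    """Summarize how two versions of a dataset differ in size and fields."""
    # Collect every field name that appears in each version.
    old_fields = set().union(*(row.keys() for row in old_rows))
    new_fields = set().union(*(row.keys() for row in new_rows))
    return {
        "old_count": len(old_rows),
        "new_count": len(new_rows),
        "shared_fields": sorted(old_fields & new_fields),
        "only_in_old": sorted(old_fields - new_fields),
        "only_in_new": sorted(new_fields - old_fields),
    }


if __name__ == "__main__":
    # Made-up rows, for illustration only.
    old = [{"name": "School A", "address": "1 Main St"}]
    new = [{"name": "School A", "address": "1 Main St", "latitude": "40.7"}]
    print(compare_versions(old, new))
```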

mbh329 commented 2 years ago

I'll take a look at the hra data, update it in data-library, and run it through facdb

AmandaDoyle commented 2 years ago

doe_lcgms is the source data for 1,841 records in FacDB. SELECT * from facdb.facdb where datasource LIKE 'doe_lcgms'; Here is what is in data libraries. Based on the yml we don't get this data from OpenData, but from here. Next step is to confirm that what you download from here matches what we have in data libraries and, if so, update the data in libraries.

td928 commented 2 years ago

I've taken a look at doe_universalprek.

Some quick feedback: the Open Data version is pretty different from what we had last year, but in terms of how facdb uses this source, the most crucial fields still seem to be available. Our last version has about 1,500+ records and the Open Data version has 1,800+.

Another relevant enhancement in the Open Data version is the inclusion of lat/long for each school. My thinking is we can turn these pairs into a wkb_geometry in the sql step, which was null in the previous version. We can also switch from geocoding by address with the 1B function to geocoding by BIN or BBL, which are also available on Open Data.
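To illustrate what turning a lat/long pair into a WKB geometry involves, here is a minimal sketch that encodes a point as plain little-endian WKB using only the standard library. It is not the FacDB sql step: in PostGIS the usual route would be `ST_SetSRID` with `ST_MakePoint`. The function name `point_wkb_hex` and the coordinates are hypothetical, and no SRID is embedded in the output.

```python
import struct


def point_wkb_hex(longitude: float, latitude: float) -> str:
    """Encode a lon/lat pair as hex-encoded little-endian WKB (Point, no SRID)."""
    # Layout: 1 byte for byte order (1 = little-endian),
    # 4 bytes for geometry type (1 = Point), then x and y as 8-byte doubles.
    # WKB stores x (longitude) before y (latitude).
    return struct.pack("<BIdd", 1, 1, longitude, latitude).hex().upper()


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    print(point_wkb_hex(-73.98, 40.75))
```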

I can open up an issue and start working on this if no one has already started. @AmandaDoyle @SashaWeinstein @mbh329

mbh329 commented 2 years ago

@td928 I have not started working on it, so feel free to work on doe_universalprek